Hi everyone, have you heard about ARKit3?

For those wondering what ARKit is, here's a quick overview: it is Apple's framework for building AR apps that run on iPhone and iPad, including, of course, models with a built-in LiDAR sensor.

Apple provides the commonly used building blocks for AR, so you don't have to implement everything from scratch. In other words, it makes AR apps much easier to develop.

This time, we compared Apple’s ARKit3 with our motion capture app Michicon-Plus. Here’s a summary of our findings.


What can you do with ARKit3?

ARKit3 has the following capabilities:

  • People Occlusion
  • Motion Capture
  • Simultaneous Front and Back Camera
  • RealityKit Integration
  • Multiple-face Tracking
  • Collaborative Session (multiple people can build a shared AR space at the same time)
  • More Robust 3D Object Detection
  • Improved space recognition speed

This time, we will particularly focus on Motion Capture.
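
As a quick reference for what Motion Capture looks like on the ARKit3 side, here is a minimal sketch of how a body-tracking session is typically set up with the public ARBodyTrackingConfiguration and ARBodyAnchor APIs. This is only an illustrative outline (the class name is ours), not the code used in our verification.

import ARKit

// Minimal sketch: an ARKit3 body-tracking (Motion Capture) session.
// Requires an A12-or-later device running iOS 13 or later.
final class BodyTrackingSession: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        guard ARBodyTrackingConfiguration.isSupported else {
            print("Body tracking is not supported on this device.")
            return
        }
        session.delegate = self
        session.run(ARBodyTrackingConfiguration())
    }

    // ARKit delivers an ARBodyAnchor whenever the tracked person is updated.
    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for case let bodyAnchor as ARBodyAnchor in anchors {
            let transforms = bodyAnchor.skeleton.jointModelTransforms
            print("Joints tracked this frame: \(transforms.count)")
        }
    }
}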

In fact, our company developed an app called Michicon-Plus in 2019. It achieves motion capture on ordinary smartphones and tablets using our proprietary AI engine rather than ARKit.

Introducing Michicon-Plus

Michicon-Plus, developed by our company NEXT-SYSTEM, offers the same functionality as ARKit3’s Motion Capture.
Simply put, it is an iOS app that captures your full-body movement using only the camera of an iPhone or iPad and reflects it onto a 3D character in real time.


Furthermore, Michicon-Plus is built on our own VisionPose Single3D, an AI skeleton detection system that extracts human skeletal information from nothing more than a camera image.

In this article, we compare Michicon-Plus and ARKit3 from several perspectives.


Verification

Before presenting the results, here is a brief explanation of the verification method. It was conducted as follows:

1. Record a video of random movements.
2. Play the recorded video back on a screen.
3. Point the iPad’s camera at that screen.
4. Launch ARKit3 and Michicon-Plus and record with each of them.

The equipment used was an iPad mini (5th generation) running iOS 13.1.

Verification environment
This is what the verification environment looks like. It’s kind of surreal.

Note that this comparison focuses on how the results look when 3D models are applied to ARKit3 and to VisionPose Single3D.

The appearance can vary significantly depending on the 3D model used, so please keep that in mind while viewing the comparison.

Accuracy and Speed Comparison

For the comparison of accuracy and speed, we have prepared two scenarios:

  1. A comparison using the official ARKit3 sample app
  2. A comparison with both 3D models unified to our company’s virtual character, Michico

For the second verification, we used a framework called AR Foundation to display the VRM character. Note that the mechanism AR Foundation uses to animate the robot differs from the one Michicon uses.

Comparison using the official ARKit3 sample app

English version will be available shortly.

▼ To download the Michicon-Plus app, please click here.

https://apps.apple.com/us/app/michicon-plus/id1468862870


Comparing 3D models

English version will be available shortly.

Here is the feedback from our engineers after watching the video.

  • Both show some latency, but the difference between ARKit and Michicon is hardly noticeable visually.
  • Michicon’s foot angles appear more natural. Since Michicon doesn’t track foot movements, this may be an effect of forward kinematics.
  • Michicon seems to track the facial direction better.
  • In ARKit3, crossed legs tend to break down when the subject faces forward.
  • In Michicon, when the subject faces backward and raises their hands, the face tilts downward.
  • ARKit seems to track the upper arms better, while Michicon shows minimal movement there.
  • ARKit3 conveys a stronger sense of jumping and overall dynamism, but rotational movements look smoother in Michicon.


Device temperature comparison

Since both ARKit3 and Michicon-Plus perform computationally demanding processing, the rise in device temperature is a natural concern.

So I ran each app for 10 minutes and measured how much the temperature increased.

Device temperature verification

I measured the temperature around the rear camera area, which tends to get the hottest.

Temperature measurement setup

I compared the temperature increase in two scenarios: one starting with a low battery, and one with the battery kept at 100% while plugged in.

Here’s the result:

Comparison of Device Temperature when using ARKit and Michicon-Plus

Looking at the results, both Michicon-Plus and ARKit3 showed a temperature increase over time. The maximum difference between them was about 2°C, which over a 10-minute run doesn’t seem significant.


Insights

We asked our engineers to analyze these verification results.

If you’re interested in learning more about ARKit3 technology, you might find this official video helpful. Please click the link below.

https://developer.apple.com/videos/play/wwdc2019/607


Difference in Analysis speed

Regarding the difference in analysis speed, ARKit3 is faster in both detection and analysis, allowing higher frame rates. While the details aren’t public, the processing flow might look something like this.

Please note that this is only our guess.

VisionPose, on the other hand, tends to have a lower frame rate because detecting 2D coordinates from the image takes time.

While waiting for an analysis result to come back, VisionPose keeps receiving camera input every frame; any frames it cannot accept are discarded.
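
As a rough illustration only (the type and function names below are made up for the sketch and are not the actual VisionPose implementation), such a “drop frames while the engine is busy” pattern might look like this:

import Foundation
import CoreVideo

// Hypothetical sketch of the frame-dropping pattern described above.
// runInference stands in for an arbitrary pose-estimation call.
final class FrameGate {
    private let inferenceQueue = DispatchQueue(label: "pose.inference")
    private var isBusy = false
    private let lock = NSLock()

    func handle(frame: CVPixelBuffer, runInference: @escaping (CVPixelBuffer) -> Void) {
        lock.lock()
        guard !isBusy else {
            lock.unlock()
            return                  // Previous analysis still running: discard this frame.
        }
        isBusy = true
        lock.unlock()

        inferenceQueue.async {
            runInference(frame)     // Slow 2D/3D pose estimation.
            self.lock.lock()
            self.isBusy = false     // Ready to accept the next frame.
            self.lock.unlock()
        }
    }
}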

Given the official statement that processing runs in real time on every frame, ARKit3 is likely faster.

However, upon reviewing the verification video, there seems to be little difference in speed, so the truth remains uncertain.


Differences in Accuracy

ARKit3 tends to distort the 3D model when parts of the body are occluded during rotation. Michicon-Plus keeps the model undistorted even when partially occluded, although the head jitters slightly when the subject faces backward.

When the subject faces backward, facial features are unavailable, which probably makes head orientation hard to calculate. The stability under partial occlusion is likely an inherent strength of VisionPose.

One concern is that the ARKit3 model never seems to face directly forward.

Since accuracy likely varies with posture, it would be interesting to try other positions such as sitting, lying down, or playing sports.


Differences in functionality

The differences in functionality can be summarized as follows:

  • ARKit3 provides anchor information (e.g., floors, tables), which Michicon lacks.
  • ARKit3 is compatible with iOS 13 and later, while VisionPose can run on devices with iOS 12 and later.
  • ARKit3 cannot perform full-body tracking using the front-facing camera, whereas VisionPose supports this functionality.
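
For reference, an app that wants to choose between the two approaches at runtime can check ARKit3’s real ARBodyTrackingConfiguration.isSupported flag and fall back to a camera-only engine otherwise. A rough sketch (the function and the fallback description are only illustrative):

import ARKit

// Decide at runtime whether ARKit3 body tracking is available,
// or whether a camera-only pose estimator should be used instead.
func selectTrackingBackend() -> String {
    if #available(iOS 13.0, *), ARBodyTrackingConfiguration.isSupported {
        return "ARKit3 body tracking (rear camera, A12 or later)"
    }
    // e.g. a VisionPose-style engine that only needs camera frames.
    return "Camera-only pose estimation fallback"
}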


Differences in the joints obtained

A comparison of the internal skeleton detection engines is as follows:

ARKit3: 8 named joints (91 joints tracked in total)
VisionPose Single3D: 17 points (expected to increase to 30 in the future)

ARKit3 provides joint information for 91 points. Of these, eight are defined as named joints, making the key points easy to access.

ARKit3’s named joint definitions
ARSkeleton.JointName.root
ARSkeleton.JointName.head
ARSkeleton.JointName.leftFoot
ARSkeleton.JointName.leftHand
ARSkeleton.JointName.leftShoulder
ARSkeleton.JointName.rightFoot
ARSkeleton.JointName.rightHand
ARSkeleton.JointName.rightShoulder

With ARKit3, you can pick whichever key points you need from the large pre-defined set.
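
For example, the model-space transform of a named joint can be read from a tracked body like this (a short sketch using the public ARSkeleton3D API; the function name is ours):

import ARKit
import simd

// Read the model-space positions of a few named joints from a tracked body.
func logKeyJoints(of bodyAnchor: ARBodyAnchor) {
    let skeleton = bodyAnchor.skeleton
    let joints: [ARSkeleton.JointName] = [.root, .head, .leftHand, .rightHand]

    for joint in joints {
        // modelTransform(for:) returns nil if the joint is not tracked this frame.
        if let transform = skeleton.modelTransform(for: joint) {
            let position = simd_make_float3(transform.columns.3)
            print("\(joint.rawValue): \(position)")
        }
    }
}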

The joint definitions for VisionPose Single3D, used in Michicon-Plus, are as follows. Since VisionPose is developed in-house, the number of joints can be customized in the future.

VisionPose Single3D’s keypoint definitions
VisionPose.Keypoint.spineBase
VisionPose.Keypoint.hipRight
VisionPose.Keypoint.kneeRight
VisionPose.Keypoint.ankleRight
VisionPose.Keypoint.hipLeft
VisionPose.Keypoint.kneeLeft
VisionPose.Keypoint.ankleLeft
VisionPose.Keypoint.spineMid
VisionPose.Keypoint.neck
VisionPose.Keypoint.nose
VisionPose.Keypoint.head
VisionPose.Keypoint.shoulderLeft
VisionPose.Keypoint.elbowLeft
VisionPose.Keypoint.wristLeft
VisionPose.Keypoint.shoulderRight
VisionPose.Keypoint.elbowRight
VisionPose.Keypoint.wristRight
VisionPose.Keypoint.spineShoulder
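
To illustrate how such keypoints are typically consumed when animating a character, here is a purely hypothetical sketch: the enum mirrors the list above, but the surrounding types and the retargeting helper are ours, not the actual VisionPose SDK.

import simd

// Hypothetical: one frame of VisionPose-style output, keypoint -> 3D position.
// The enum mirrors the keypoint list above; everything else is illustrative.
enum Keypoint: String {
    case spineBase, spineMid, spineShoulder, neck, nose, head
    case shoulderLeft, elbowLeft, wristLeft
    case shoulderRight, elbowRight, wristRight
    case hipLeft, kneeLeft, ankleLeft
    case hipRight, kneeRight, ankleRight
}

// One frame of detection: keypoint -> position in camera space.
typealias Pose = [Keypoint: SIMD3<Float>]

// Sketch of a retargeting step: derive a bone direction from two keypoints
// so it can be applied to the corresponding bone of a 3D character.
func boneDirection(from parent: Keypoint, to child: Keypoint, in pose: Pose) -> SIMD3<Float>? {
    guard let start = pose[parent], let end = pose[child] else { return nil }
    return simd_normalize(end - start)
}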

▼Reference

https://developer.apple.com/documentation/arkit/arskeleton/jointname

Conclusion

Based on our engineers’ opinions, and beyond the differences in functionality, ARKit3 has the advantage in overall processing speed, while Michicon-Plus has the advantage in accuracy, handling a wider range of postures, and in the variety of OSes and SDKs it supports, including Windows and Unity.

▼ Summary of the comparison verification results between ARKit3 and Michicon-Plus


Given the strengths of each, we would like to make good use of both in future development.

VisionPose’s website
