High-precision AI skeleton detection system that needs no depth sensor

We have developed “VisionPose”, a high-precision AI skeleton detection system that needs no depth sensor

We are developing “VisionPose”, a skeleton detection system that applies joint detection and grouping based on heat map analysis.
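As an illustration of the heat map idea, the sketch below picks joint candidates as local maxima of a single-joint heat map. It is a minimal example under stated assumptions, not VisionPose's actual implementation: the function name find_joint_peaks, the 0.5 threshold, and the toy heat map are hypothetical, and the grouping step that assigns joints to individual people is left out.

```python
from typing import List, Tuple


def find_joint_peaks(
    heatmap: List[List[float]], threshold: float = 0.5
) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for each local maximum at or above threshold."""
    peaks = []
    rows, cols = len(heatmap), len(heatmap[0])
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Compare against the (up to) 8 surrounding cells.
            neighbours = [
                heatmap[nr][nc]
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
                if (nr, nc) != (r, c)
            ]
            if all(score > n for n in neighbours):
                peaks.append((r, c, score))
    return peaks


# Toy heat map for one joint type with two people in view (hypothetical values).
heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.2, 0.1],
    [0.0, 0.0, 0.2, 0.8, 0.2],
]
peaks = find_joint_peaks(heatmap)
```

Each returned peak is one candidate joint. A full multi-person system would then group candidates of different joint types into one skeleton per person.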

By using deep learning, VisionPose can detect the human skeleton and depth information with only a webcam (RGB camera), without relying on Kinect or another camera equipped with a depth sensor.
It can be expected to serve as an alternative to Kinect, which has been discontinued.
We are currently working on making the models lighter (knowledge distillation) so that they can run on smartphones (Core ML, etc.), aiming for both small size and high speed. In the future, we are considering offering it as a cloud service so that general users can use it as well.
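For readers unfamiliar with knowledge distillation, the sketch below shows the general idea: a small "student" model is trained to match the softened outputs of a large "teacher" model. This is the generic soft-target formulation, given here only as an assumption. The source does not say which distillation variant VisionPose uses, and a pose model may distil heat maps rather than class scores. The function names and the temperature value are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax with temperature; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL divergence from the teacher's soft targets to the student's, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# The loss is zero when the student reproduces the teacher and grows as they diverge.
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_loss([0.0, 1.0, 5.0], [5.0, 1.0, 0.0])
```

Minimising this loss during training pushes the small model toward the large model's behaviour, which is what allows a lighter model to keep much of the accuracy.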

What can be done when skeleton detection accuracy is low? (Annotation tool)

One problem is that the joint positions detected by VisionPose can be misaligned when the default training data alone does not cover a motion well. Preparing training data for every possible motion, however, would take a tremendous amount of time and effort, so it is not efficient.

For this reason, we are developing a tool (annotation tool) for creating training data for specific motions in parallel with VisionPose itself. By reinforcing training only on the specific motions that need to be detected, accurate training data can be created efficiently.
The only material needed is a video of the motion whose accuracy you want to raise. Detect the skeleton once with VisionPose, then manually correct the joints that are misaligned. Training VisionPose on the corrected data is expected to improve accuracy for the motion in question.

Faster, more efficient training (Multi-node system)

In general, the more training data you have, the higher the accuracy. However, training on a large amount of data takes a long time, so an efficient training system is essential. To address training speed, we are building a multi-GPU, multi-machine setup (multi-node system) on AWS. For customers who do not want to put their data on the cloud, we are also considering building a multi-GPU, multi-machine setup on their premises.
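One common way a multi-GPU, multi-machine setup speeds up training is synchronous data parallelism: each worker computes gradients on its own share of the data, and the gradients are averaged before the shared weights are updated. The sketch below illustrates only that averaging step. It is an assumption for illustration, since the source does not describe the parallelisation strategy actually used, and the function names are hypothetical.

```python
from typing import List


def average_gradients(worker_grads: List[List[float]]) -> List[float]:
    """Average per-parameter gradients computed independently by each worker."""
    n = len(worker_grads)
    return [sum(per_param) / n for per_param in zip(*worker_grads)]


def sgd_step(
    weights: List[float], worker_grads: List[List[float]], lr: float = 0.1
) -> List[float]:
    """Apply one SGD update using the averaged gradient."""
    avg = average_gradients(worker_grads)
    return [w - lr * g for w, g in zip(weights, avg)]


# Two workers, two parameters (toy values).
grads = [[1.0, 2.0], [3.0, 4.0]]
averaged = average_gradients(grads)
updated = sgd_step([1.0, 1.0], grads, lr=0.5)
```

In a real system the averaging is done by a collective operation across GPUs and machines rather than in a Python loop.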

Demo videos

Detecting the depth of multiple people (distance from the camera)

This demo video shows depth detection with multiple people. We are continuing development to improve speed and accuracy further.
* The numbers in the video show the distance from the camera in meters.

Supports a wide range of sports movements, such as baseball, golf, tennis, and boxing

This demo video shows VisionPose, still under development, tracking various sports movements.

In figure skating and gymnastics competitions, scoring relies on the eyes of the judges. Because standards differ slightly from one judge to another, even a rigorous evaluation can contain mistakes, as any human judgment can.

Introducing VisionPose in such settings can help prevent human error and support fairer judging. In sports where form matters, such as baseball, golf, and tennis, comparing the skeleton data of a professional player's form with skeleton data captured during practice makes the differences visible. This information can guide training decisions, such as which part of the body to strengthen, and so help improve technique.
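One simple way to compare two forms is to compute the angle at a joint in each pose and look at the difference. The sketch below does this for an elbow using 2D joint coordinates. It is an illustrative assumption rather than VisionPose's own comparison method, and the function name, joint names, and coordinates are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at joint b, formed by the segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("adjacent joints must not coincide")
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos))


# Hypothetical (shoulder, elbow, wrist) coordinates for one frame.
pro = {"shoulder": (0.0, 0.0), "elbow": (1.0, 0.0), "wrist": (1.0, 1.0)}
practice = {"shoulder": (0.0, 0.0), "elbow": (1.0, 0.0), "wrist": (2.0, 0.0)}

pro_elbow = joint_angle(pro["shoulder"], pro["elbow"], pro["wrist"])
practice_elbow = joint_angle(
    practice["shoulder"], practice["elbow"], practice["wrist"]
)
elbow_gap = practice_elbow - pro_elbow
```

Here the professional's elbow is bent at a right angle while the practice pose has a straight arm, so the gap is 90 degrees. Comparing angles rather than raw coordinates keeps the result independent of where the person stands in the frame.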

Supports a wide range of tasks on the factory floor

This demo video shows VisionPose, still under development, applied to various simulated factory tasks.

Factory sites where many machines are in use always carry some danger. It is also hard to eliminate human error completely in monotonous work. By using VisionPose to check the flow and movement of people and letting it learn these situations, it is possible to improve quality and risk management.

Supports a wide range of movements during meals, including at medical sites and nursing care facilities

This demo video shows VisionPose, still under development, applied to various simulated meal scenes.

VisionPose can detect the skeleton even when a person is seated, as in a meal scene. For example, it can be used in mini games that use the body for rehabilitation and health maintenance. Using it with patients who are working on rehabilitation and presenting the data in an easy-to-understand way can inform future care plans and is expected to increase the patient's motivation.

Supports a wide range of wheelchair movements

This demo video shows VisionPose, still under development, applied to various simulated wheelchair movements.

VisionPose can detect the skeleton even when a person is lying on a bed or sitting in a wheelchair, so data can be acquired without burdening the patient. Detecting motion and posture and accumulating the data can provide important reference material and hints for improvement in future development and research in fields such as medical equipment and wheelchairs.

What are the features of VisionPose?

  1. High accuracy, with less skeleton jitter than similar products.
    Because the skeleton is estimated with deep learning, it can be measured more accurately than with skeleton detection on conventional sensor-equipped cameras.
  2. Depth measurement is also supported.
    Skeleton information can be extracted with only a webcam, so no depth sensor is required. Depth can be measured with a stereo camera (two webcams).
  3. Multiple skeletons can be detected with no limit on the number of people.
    Because all people are processed together in a single pass, processing speed stays constant as the number of people increases. Multi-person detection is faster than with similar products.
  4. Planned to be offered at a lower price than similar products, with no restrictions on usage.
    We plan to offer several licenses to suit different uses.
  5. Excellent usability, with a Kinect-like SDK.
    It will be provided with an interface modeled on the Kinect SDK, so it will be easy to program for those who already use Kinect.
  6. It does not use infrared, so it can be used outdoors.
    Outdoor use, which was difficult with Kinect, is possible. It is not affected by sunlight.
  7. Cloud and smartphone availability is planned for the future.
    We are making the model lighter and faster so that it can run even on smartphones.
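As background for feature 2 above, the standard relation for recovering depth from a rectified stereo pair is Z = f * B / d, where f is the focal length in pixels, B is the distance between the two cameras, and d is the horizontal disparity of the same point in the two images. The sketch below applies it to one joint. It is a textbook formula shown under assumptions, not a description of how VisionPose computes depth, and the function name and all numeric values are hypothetical.

```python
def depth_from_disparity(
    x_left: float, x_right: float, focal_px: float, baseline_m: float
) -> float:
    """Depth in meters of a point seen at x_left / x_right (pixel columns)
    in a rectified stereo pair."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Hypothetical setup: 800 px focal length, two webcams 0.1 m apart,
# and the same joint detected at column 420 (left) and 380 (right).
joint_depth_m = depth_from_disparity(420.0, 380.0, focal_px=800.0, baseline_m=0.1)
```

With these values the disparity is 40 pixels, which places the joint 2 meters from the cameras. A smaller disparity means the joint is farther away, so accuracy at long range depends on how precisely the joint is located in each image.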

The only domestically made product with excellent accuracy

It is more accurate than conventional products and excels at detecting multiple skeletons.
It can be used outdoors, so it works in a wide range of settings. Because it is a domestic product, we can respond flexibly to a variety of inquiries.