Computer vision is one of the hottest topics in software development right now, not only because of its wide range of applications, but also because of its ambitious goal: to process images by imitating the abilities of the human eye, and to surpass them. One of the most anticipated areas of computer vision is human pose estimation. But what exactly is human pose estimation, and why should you care?

The reason is that this technology will most likely have a major impact on many areas of your everyday life: imagine virtual personal sports coaches, real-time avatar chats, improved safety of self-driving vehicles, and tracking of human movement in public spaces or stores. These are only a few examples of technologies already making their way into our everyday lives, and they can all be realized with human pose estimation.

So let me give you a quick overview of pose estimation in general, specifics on the technology itself, information on how it has already been applied, future prospects for its use, and a specific example of a pose estimation engine.

So, What Is Human Pose Estimation?

Simply put, human pose estimation is a computer vision task in which keypoints of the human body are detected and associated with each other to describe the pose of a person. These keypoints can be joints, such as the shoulder, elbow, hip or knee, as well as facial points, such as the nose or eyes. The keypoints are then connected to form the human skeleton or body, and by tracking their motion, computers can develop an understanding of human body language and behavior. This pose information can be extracted from diverse forms of input, like images, videos or real-time video streams.

There are different types of approaches (skeleton-based, contour-based, volume-based), but for the sake of simplicity, I will focus on the skeleton-based approach in this article. In this approach, pose estimation is performed by detecting keypoints in the human skeleton.

Example image of the keypoints detected with the skeleton-based pose estimation approach
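
To make the idea of keypoints and a skeleton a bit more concrete, here is a minimal Python sketch. The keypoint names, the small edge list and the 0.5 confidence threshold are my own assumptions for illustration; they are not the output format of any particular pose estimation library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keypoint:
    name: str
    x: float      # horizontal pixel coordinate
    y: float      # vertical pixel coordinate (grows downward in an image)
    score: float  # detector confidence between 0 and 1


# Hypothetical subset of skeleton edges: pairs of keypoints joined by a "bone".
SKELETON_EDGES = [
    ("left_shoulder", "left_elbow"),
    ("left_elbow", "left_wrist"),
    ("left_shoulder", "left_hip"),
    ("left_hip", "left_knee"),
]


def build_skeleton(keypoints, min_score=0.5):
    """Return the line segments whose two endpoints were both detected confidently."""
    by_name = {kp.name: kp for kp in keypoints if kp.score >= min_score}
    limbs = []
    for start, end in SKELETON_EDGES:
        if start in by_name and end in by_name:
            a, b = by_name[start], by_name[end]
            limbs.append(((a.x, a.y), (b.x, b.y)))
    return limbs


pose = [
    Keypoint("left_shoulder", 100.0, 50.0, 0.9),
    Keypoint("left_elbow", 110.0, 90.0, 0.8),
    Keypoint("left_wrist", 115.0, 130.0, 0.3),  # low confidence, e.g. occluded
    Keypoint("left_hip", 100.0, 140.0, 0.95),
]
limbs = build_skeleton(pose)  # shoulder-elbow and shoulder-hip segments only
```

Tracking how these segments move from frame to frame is what lets a system reason about posture and motion.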

Human pose estimation can be performed in either 2D or 3D: 2D human pose estimation would, for example, be used for tracking a person's pose in images or videos, whereas 3D estimation has been a popular approach for retrieving structural information about the human body in order to build a representation of it, for example for virtual or augmented reality settings.

It is a fascinating and rapidly growing field, but due to the high computational resources needed to track keypoints in live video footage, it long lacked the accuracy and precision needed for practical applications. The biggest progress came with deep learning, which improved both accuracy and efficiency by miles. One of the first examples of a deep learning-based approach to pose estimation was DeepPose in 2013.

Top-Down vs. Bottom-Up

Human pose estimation by itself is already quite a challenging task, since the appearance of keypoints changes with clothing, occlusion, motion blur, image defocus, changing backgrounds, viewing angles and so on, but it gets even more complex when multiple people are detected at the same time. To tackle the computational complexity of multi-person estimation, two approaches have emerged:

Bottom-up methods first estimate all body keypoints in the image or video, and then group them together to form the pose of each person. This method was pioneered by DeepCut, but its most prominent example is OpenPose:

Multi-Person Face/Body/Hand Keypoint Detection with OpenPose

Top-down methods first detect the people in an image or video, and then estimate body keypoints within each detected “person box” to calculate the pose for each person. A famous example of this approach is AlphaPose:

Multi-person pose estimation with AlphaPose (project page: Publications | SJTU Machine Vision and Intelligence Group)

Implementing the top-down approach is actually easier than implementing the bottom-up approach, since adding a person detection system is simpler than adding grouping algorithms. It is difficult to say which of the two performs better overall, however, since that depends on the quality and precision of the implemented technology.
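
The difference between the two approaches can be sketched on toy data. Everything in the following Python example is an assumption for illustration: the keypoints are hard-coded, the person boxes stand in for the output of a person detector, the top-down step merely filters keypoints by box instead of running a real single-person estimator on each crop, and the bottom-up grouping uses plain distance to the nearest nose. Real bottom-up systems use learned association cues instead (OpenPose, for instance, uses part affinity fields).

```python
import math

# Toy "detections": (keypoint name, x, y) for two people standing side by side.
DETECTIONS = [
    ("nose", 50, 20), ("left_shoulder", 40, 50), ("right_shoulder", 60, 50),
    ("nose", 150, 25), ("left_shoulder", 140, 55), ("right_shoulder", 160, 55),
]


def top_down(person_boxes, detections):
    """One pose per person box (x_min, y_min, x_max, y_max): keep the keypoints inside it."""
    poses = []
    for x0, y0, x1, y1 in person_boxes:
        pose = {name: (x, y) for name, x, y in detections
                if x0 <= x <= x1 and y0 <= y <= y1}
        poses.append(pose)
    return poses


def bottom_up(detections):
    """Start one person per nose, then attach every other keypoint to the nearest nose."""
    poses = [{"nose": (x, y)} for name, x, y in detections if name == "nose"]
    for name, x, y in detections:
        if name == "nose":
            continue
        nearest = min(poses, key=lambda p: math.dist(p["nose"], (x, y)))
        nearest[name] = (x, y)
    return poses


boxes = [(30, 10, 70, 60), (130, 15, 170, 65)]  # hypothetical person-detector output
top_down_poses = top_down(boxes, DETECTIONS)
bottom_up_poses = bottom_up(DETECTIONS)
```

On this simple scene both routes produce the same two poses. The grouping step is where bottom-up methods get hard: once people overlap, the nearest nose is no longer a reliable hint about which person a keypoint belongs to.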

How Has It Been Used in the Past?

So now that we have covered the technical basics, let’s take a closer look at what you can actually do with human pose estimation. The technology has already been applied in a great variety of fields, including augmented reality, motion analysis, robotics, gaming, and sports. Here are a few examples:

Gaming and Augmented Reality: Human pose estimation has been applied to AR applications so that players can interact with gaming content without needing any expensive equipment or putting sensors on their body. In this case, a camera or webcam connected to the TV or computer detects the keypoints in the body of the player to track their movements and bring them into the gameplay. Microsoft’s Kinect, for example, used 3D human pose estimation to track the motion of players and transfer their actions into the virtual environment for interactive gaming.

Example image of a woman playing an AR game

Sports and Dancing: In sports applications, the technology has been used to analyze the pose of, for example, pro baseball or golf players in order to improve performance or teach other players. One example is the AI sports performance platform “TeamSportz.Pro”, which uses TensorFlow.js pose estimation. The technology has even been used to detect movements that might cause injuries or pain during sports.

In another example, pose estimation has been used in the field of dance, namely in the “Avex Street Dance Certification Test” app published by Avex Management Inc., which uses human pose estimation to evaluate users' dance skills by automatically comparing the movements in videos they take of themselves to the original dance video they are trying to copy.

Avex Street Dance Certification Test App Introduction Video
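
How two poses might be compared can be sketched in a few lines of Python. This is not how the Avex app works internally (I do not know its method); it is just one common idea, under the assumption that both poses list the same keypoints in the same order: remove position and size differences, then measure how well the remaining shapes line up with cosine similarity.

```python
import math


def normalize(pose):
    """Center the pose on its mean point and scale it to unit length."""
    n = len(pose)
    cx = sum(x for x, _ in pose) / n
    cy = sum(y for _, y in pose) / n
    centered = [(x - cx, y - cy) for x, y in pose]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    if norm == 0:
        raise ValueError("pose has no spatial extent")
    return [(x / norm, y / norm) for x, y in centered]


def pose_similarity(pose_a, pose_b):
    """Cosine similarity of two normalized poses; values near 1.0 mean the same shape."""
    a, b = normalize(pose_a), normalize(pose_b)
    return sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))


# Hypothetical four-keypoint poses: (x, y) pixel coordinates.
teacher = [(0, 0), (0, 10), (5, 15), (-5, 15)]
student_same = [(100 + 2 * x, 200 + 2 * y) for x, y in teacher]  # shifted and larger
student_off = [(0, 0), (0, 10), (5, 5), (-5, 5)]                 # different arm position

score_same = pose_similarity(teacher, student_same)
score_off = pose_similarity(teacher, student_off)
```

Because of the normalization, a dancer standing elsewhere in the frame or closer to the camera still scores close to 1.0, while a different arm position lowers the score. A full dance evaluation would also need to align the two videos in time, which this sketch leaves out.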

Human Behavior Recognition: Another application of human pose estimation is tracking and measuring human movement and behavior. For this, detected poses are labeled with specific descriptions, e.g. walking, running, sitting or lying, so that the system is able to identify them. The Intelligent Sensing Laboratory of Newcastle University, for example, used human pose estimation in this way to detect anomalies in behavior. This type of system has been used in security to identify when a person has fallen down or is sick, and in surveillance to identify suspicious behavior.
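
As a rough illustration of how poses can be turned into labels, here is a small rule-based Python sketch. Real systems usually learn such labels from data; the two labels, the torso-angle rule and the 60-degree threshold are all assumptions I made up for this example.

```python
import math


def torso_angle_deg(shoulder, hip):
    """Angle between the shoulder-to-hip line and the vertical image axis, in degrees."""
    dx = hip[0] - shoulder[0]
    dy = hip[1] - shoulder[1]
    return math.degrees(math.atan2(abs(dx), abs(dy)))


def label_pose(shoulder, hip, lying_threshold_deg=60.0):
    """Label a pose 'lying' if the torso is closer to horizontal than the threshold."""
    if torso_angle_deg(shoulder, hip) > lying_threshold_deg:
        return "lying"
    return "upright"


def detect_fall(labels):
    """Flag a possible fall when an 'upright' frame is directly followed by a 'lying' frame."""
    return any(a == "upright" and b == "lying" for a, b in zip(labels, labels[1:]))


# Hypothetical (shoulder, hip) pixel coordinates for three consecutive frames.
frames = [
    ((100, 50), (102, 150)),   # torso almost vertical
    ((100, 60), (105, 155)),   # still upright
    ((50, 200), (150, 205)),   # torso almost horizontal
]
labels = [label_pose(shoulder, hip) for shoulder, hip in frames]
fall_detected = detect_fall(labels)
```

A rule this simple would also fire when someone lies down on purpose, which is one reason practical systems look at longer motion sequences rather than single frames.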

These are only a few examples of how the technology has been applied up until now, and I believe you can see my point: human pose estimation has high potential because it can be applied in so many ways, and its outlook is promising. So now, let’s take a look at what this technology will be used for in the future.

What Will the Future Bring?

There are myriad possibilities: think, for example, of any situation where 24/7 human surveillance is needed but hard to implement, such as security guards in shopping malls, accident prevention in factories, self-checkout stores, or traffic safety for autonomous driving. Or think about the latest talk of the metaverse, and how simplified detection of the human pose could help get us all into virtual space. These are all areas where human pose estimation can be used to make everyday life safer, more convenient and more fun. In the following, I will list a few examples of where the groundwork has already been laid, and what we can expect in the future.

Security and Surveillance Systems: Can security staff or store managers have their eyes everywhere? They cannot. And even commonly used security camera systems are still supervised by people, and are thus prone to mistakes or oversights. A computer is not limited in this way: if a human pose estimation system is trained to detect suspicious and unusual behavior, or the movements of someone falling down or hurting themselves, it can automatically notify security staff so that they can step in. Steps have already been taken to create such systems for factory settings, to prevent accidents and provide staff with a safe work environment.

Example image of a human pose estimation security system tracking a person working in a factory

Autonomous Driving: This has been a popular topic across industries for years, and you have probably heard of self-driving cars and buses being tested in pilot projects in several cities worldwide. But you might not have heard that driverless taxis are already being tested on the streets in China and the US, in projects such as AutoX and Cruise. At the moment, these robotaxis have to be called or reserved via smartphone app, but by integrating pose estimation into the car's system, customers could hail driverless taxis more easily and spontaneously with hand signals or their posture.

Image of a person hailing a taxi by hand signal while their pose is being estimated
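
The hand-signal idea described above could be approximated with a very simple rule: a wrist held above its shoulder for a few consecutive frames. The following Python sketch is purely hypothetical; the keypoint names, the confidence threshold and the three-frame requirement are assumptions, and a real vehicle would need far more robust logic.

```python
def hand_raised(keypoints, min_score=0.5):
    """keypoints maps a name to (x, y, score). True if either wrist is above its shoulder."""
    for side in ("left", "right"):
        wrist = keypoints.get(f"{side}_wrist")
        shoulder = keypoints.get(f"{side}_shoulder")
        if wrist is None or shoulder is None:
            continue
        confident = wrist[2] >= min_score and shoulder[2] >= min_score
        # Image y coordinates grow downward, so "above" means a smaller y value.
        if confident and wrist[1] < shoulder[1]:
            return True
    return False


def is_hailing(frames, min_consecutive=3):
    """True if a hand stays raised for at least min_consecutive frames in a row."""
    streak = 0
    for keypoints in frames:
        streak = streak + 1 if hand_raised(keypoints) else 0
        if streak >= min_consecutive:
            return True
    return False


# Hypothetical per-frame detections for one pedestrian.
arm_down = {"right_wrist": (120, 180, 0.9), "right_shoulder": (110, 100, 0.9)}
arm_up = {"right_wrist": (125, 40, 0.9), "right_shoulder": (110, 100, 0.9)}

waving_briefly = [arm_down, arm_up, arm_down, arm_up, arm_down]
holding_arm_up = [arm_down, arm_up, arm_up, arm_up, arm_down]
```

Requiring several consecutive frames filters out brief gestures such as a quick wave, at the cost of a short delay before the signal is recognized.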

Motion Capture and Virtual Reality: Motion capture and the creation of avatars have been widely discussed lately, and human pose estimation is at the core of this hot topic. With this technology, the human pose can be tracked via camera, and the movements transferred onto a 3DCG character that mimics the user's motion in real time. The free iOS smartphone application “MICHICON-Plus”, for example, is a first approach to this.

AI-Powered Full-Body Motion Capture App “MICHICON-Plus”

This is only a small selection of examples, but as you can see, human pose estimation has had, and will continue to have, a big impact on a wide range of fields. Detecting and tracking human poses in real time will enable computers to develop a more natural understanding of human behavior, and help them to help us.

To realize these visions of the future, outstanding technology is needed. I have already mentioned a few examples of pose estimation technologies in the technical explanations earlier, but I would now like to give a short introduction to NEXT-SYSTEM’s pose estimation engine “VisionPose”, which has already been in wide use and can be applied to any of the fields I mentioned above.

About AI Pose Estimation Engine VisionPose

This human pose estimation engine uses AI technology to detect and analyze up to 30 keypoints of multiple human skeletons in real time, without using any markers, sensors, or other special equipment, only commonly used cameras or webcams. VisionPose comes in the form of an SDK, so it can be freely integrated into applications to perform pose estimation on still images, videos and live camera images, for whatever goal the developer has, in any of the fields I mentioned above, and more.

You can test it out directly on NEXT-SYSTEM’s web demo by uploading an image or taking a picture and letting the pose estimation AI do its thing: Free VisionPose AI Pose Estimation Demo

And for developers, NEXT-SYSTEM provides a 30-day Free Trial of the SDK, with all functions included: Free Trial Application (Orderless Free Trial) | VisionPose | NEXT-SYSTEM Co.,Ltd.

For more detailed information on VisionPose, see VisionPose | NEXT-SYSTEM Co., Ltd.

Pose Estimation AI Engine “VisionPose” Image Video