Northwestern University engineers have developed a new system for full-body motion capture — and it doesn’t require specialized rooms, expensive equipment, bulky cameras or an array of sensors.
Instead, it requires a simple mobile device.
Called MobilePoser, the new system leverages sensors already embedded within consumer mobile devices, including smartphones, smart watches and wireless earbuds. Using a combination of sensor data, machine learning and physics, MobilePoser accurately tracks a person’s full-body pose and global translation in space in real time.
“Running in real time on mobile devices, MobilePoser achieves state-of-the-art accuracy through advanced machine learning and physics-based optimization, unlocking new possibilities in gaming, fitness and indoor navigation without needing specialized equipment,” said Northwestern’s Karan Ahuja, who led the study. “This technology marks a significant leap toward mobile motion capture, making immersive experiences more accessible and opening doors for innovative applications across various industries.”
Ahuja’s team will unveil MobilePoser on Oct. 15, at the 2024 ACM Symposium on User Interface Software and Technology in Pittsburgh. “MobilePoser: Real-time full-body pose estimation and 3D human translation from IMUs in mobile consumer devices” will take place as a part of a session on “Poses as Input.”
An expert in human-computer interaction, Ahuja is the Lisa Wissner-Slivka and Benjamin Slivka Assistant Professor of Computer Science at Northwestern’s McCormick School of Engineering, where he directs the Sensing, Perception, Interactive Computing and Experience (SPICE) Lab.
Limitations of current systems
Most movie buffs are familiar with motion-capture techniques, which are often revealed in behind-the-scenes footage. To create CGI characters — like Gollum in “Lord of the Rings” or the Na’vi in “Avatar” — actors wear form-fitting suits covered in sensors as they prowl around specialized rooms. A computer captures the sensor data and then displays the actor’s movements and subtle expressions.
“This is the gold standard of motion capture, but it costs upward of $100,000 to run that setup,” Ahuja said. “We wanted to develop an accessible, democratized version that basically anyone can use with equipment they already have.”
Other motion-sensing systems, such as Microsoft Kinect, rely on stationary cameras to track body movements. These systems work well as long as a person stays within the camera’s field of view, but they are impractical for mobile or on-the-go applications.
Predicting poses
To overcome these limitations, Ahuja’s team turned to inertial measurement units (IMUs), devices that combine multiple sensors — accelerometers, gyroscopes and magnetometers — to measure a body’s movement and orientation. These sensors already reside within smartphones and other devices, but their fidelity is too low for accurate motion-capture applications. To enhance their performance, Ahuja’s team added a custom-built, multi-stage artificial intelligence (AI) algorithm, which they trained using a publicly available, large dataset of synthesized IMU measurements generated from high-quality motion-capture data.
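To give a flavor of how IMU channels complement each other, the sketch below shows a classic complementary filter: the gyroscope integrates smoothly but drifts, while the accelerometer is noisy but drift-free, so blending the two yields a usable orientation estimate. This is purely illustrative — MobilePoser itself learns the fusion with a neural network rather than using a hand-tuned filter.

```python
def complementary_filter(pitch_prev, gyro_rate, accel_pitch, dt, alpha=0.98):
    """Fuse one gyroscope and one accelerometer reading into a pitch estimate.

    pitch_prev  -- previous pitch estimate (radians)
    gyro_rate   -- angular velocity from the gyroscope (radians/second)
    accel_pitch -- pitch implied by the accelerometer's gravity vector
    dt          -- time since the last sample (seconds)
    alpha       -- trust placed in the drift-prone but smooth gyro path

    (Illustrative textbook technique, not the method from the paper.)
    """
    gyro_estimate = pitch_prev + gyro_rate * dt  # integrate angular rate
    return alpha * gyro_estimate + (1 - alpha) * accel_pitch


# One update step: previous pitch 0, gyro says 1 rad/s for 0.1 s,
# accelerometer independently suggests 0.5 rad.
fused = complementary_filter(0.0, 1.0, 0.5, 0.1)
```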
From the sensor data, MobilePoser gains information about acceleration and body orientation. It then feeds this data through its AI algorithm, which estimates joint positions and rotations, walking speed and direction, and contact between the user’s feet and the ground.
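The interface of that estimation stage can be sketched as a function from raw IMU streams to the quantities the article lists. All names and the 24-joint count (common in SMPL-style body models) are assumptions for illustration; the stub below returns a neutral pose where a real system would run the trained network.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PoseEstimate:
    # Hypothetical container mirroring the quantities the model predicts.
    joint_positions: List[Tuple[float, float, float]]  # meters, per joint
    joint_rotations: List[Tuple[float, float, float]]  # Euler angles, per joint
    walking_velocity: Tuple[float, float]              # speed (m/s) and heading (rad)
    foot_contact: Tuple[bool, bool]                    # left foot, right foot


def estimate_pose(accel_window, orient_window) -> PoseEstimate:
    """Stand-in for the learned model: maps windows of acceleration and
    orientation samples to a full-body estimate. A real implementation
    would run a trained neural network here; this stub just returns a
    neutral pose to show the shape of the interface."""
    n_joints = 24  # assumption: SMPL-style skeleton
    return PoseEstimate(
        joint_positions=[(0.0, 0.0, 0.0)] * n_joints,
        joint_rotations=[(0.0, 0.0, 0.0)] * n_joints,
        walking_velocity=(0.0, 0.0),
        foot_contact=(True, True),
    )
```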
Finally, MobilePoser uses a physics-based optimizer to refine the predicted movements so they obey real-life constraints: joints cannot bend backward, for example, and a head cannot rotate 360 degrees. This step ensures the captured motion never moves in physically impossible ways.
The resulting system has a tracking error of just 8 to 10 centimeters. For comparison, the Microsoft Kinect has a tracking error of 4 to 5 centimeters, assuming the user stays within the camera’s field of view. With MobilePoser, the user has freedom to roam.
“The accuracy is better when a person is wearing more than one device, such as a smartwatch on their wrist plus a smartphone in their pocket,” Ahuja said. “But a key part of the system is that it’s adaptive. Even if you don’t have your watch one day and only have your phone, it can adapt to figure out your full-body pose.”
Potential use cases
While MobilePoser could give gamers more immersive experiences, the app also opens possibilities for health and fitness. It goes beyond simply counting steps, letting users view their full-body posture so they can check their form while exercising. It also could help physicians analyze patients’ mobility, activity level and gait. Ahuja further imagines the technology being used for indoor navigation — a setting where GPS is unreliable.
“Right now, physicians track patient mobility with a step counter,” Ahuja said. “That’s kind of sad, right? Our phones can calculate the temperature in Rome. They know more about the outside world than about our own bodies. We would like phones to become more than just intelligent step counters. A phone should be able to detect different activities, determine your poses and be a more proactive assistant.”
To encourage other researchers to build upon this work, Ahuja’s team has released its pre-trained models, data pre-processing scripts and model training code as open-source software. Ahuja also says the app will soon be available for iPhone, AirPods and Apple Watch.