
4DHumans tracks human poses in video and reconstructs people's body shapes in 3D. The team behind it sees many applications and plans to publish the models.

At the heart of 4DHumans is HMR 2.0, an evolution of an older method (HMR / Human Mesh Recovery) that follows the trend of using transformer architectures for computer vision. HMR 2.0 uses vision transformers and MLPs to track human poses in images, forming the basis of the entire 4DHumans system, which uses this information for 3D human pose and shape reconstruction.

Video: Goel et al.

According to the Berkeley team, the method outperforms older approaches in video tracking and is particularly strong at reconstructing unusual poses that were previously difficult to handle, such as those seen in sports. 4DHumans can also track multiple people, even when they overlap, as in Olympic wrestling.


4DHumans has applications in robotics and biomechanics

The team trained two variants of HMR 2.0; the second, HMR 2.0b, was trained longer and on more data. This variant produced the best results, and the team plans to release the models soon.

" There is an emerging trend, in computer vision as in natural language processing, of large pretrained models (sometimes also called “foundation models”) which find widespread downstream applications and thus justify the scaling effort. HMR 2.0 is such a large pre-trained model,"

From the paper.

In addition to tracking people in video, the team cites action recognition as a potential application, as well as applications in robotics, computer graphics, biomechanics, and other fields where "analysis of the human figure and its movement from images or videos is needed."

Details on model size or compute used are not yet available. Part of the funding for the project came from Stability AI, the company behind Stable Diffusion.

More details are available on the 4DHumans project page. The code is available on GitHub, and the models will be published there as soon as they are released.

Summary
  • 4DHumans can track human bodies in video and render them in 3D, achieving best-in-class performance for unusual poses and multiple people.
  • Potential applications range from robotics and biomechanics to computer graphics.
  • The team plans to release the models soon.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.