4DHumans tracks people in videos and can reconstruct their pose and shape in 3D. The team behind it sees many potential applications and plans to release the model.
At the heart of 4DHumans is HMR 2.0, an evolution of an older method (HMR, short for Human Mesh Recovery) that follows the trend of using transformer architectures for computer vision. HMR 2.0 uses vision transformers and MLPs to estimate human poses in individual images. It forms the basis of the entire 4DHumans system, which uses this information for 3D human pose and shape reconstruction.
According to the Berkeley team, the method sets a new state of the art in video tracking compared to older approaches. It performs particularly well on unusual poses that were previously difficult to reconstruct, such as those seen in sports. 4DHumans can also track multiple people, even when they overlap, as in Olympic wrestling.
4DHumans has applications in robotics and biomechanics
The team trained two variants of HMR 2.0, with HMR 2.0b trained for longer and on more data. This variant produced the best results, and the team plans to release the models soon.
"There is an emerging trend, in computer vision as in natural language processing, of large pretrained models (sometimes also called “foundation models”) which find widespread downstream applications and thus justify the scaling effort. HMR 2.0 is such a large pre-trained model."
From the paper.
Beyond tracking people in video, the team names action recognition as a potential application, along with uses in robotics, computer graphics, biomechanics, and other fields where "analysis of the human figure and its movement from images or videos is needed."
Details on model size and compute are not yet available. Part of the funding for the project came from Stability AI, the company behind Stable Diffusion.
More details are available on the 4DHumans project page. The code is available on GitHub, and the models will be published there once they are released.