
Meta has released a new benchmark dataset called HOT3D to advance AI research in the field of 3D hand-object interactions. The dataset contains over one million frames from multiple perspectives.


The HOT3D dataset from Meta aims to improve the understanding of how humans use their hands to manipulate objects. According to Meta, this remains a key challenge for computer vision research.

The dataset comprises over 800 minutes of egocentric video, synchronized across multiple perspectives, together with high-quality 3D pose annotations of hands and objects. It also includes 3D object models with PBR materials, 2D bounding boxes, gaze signals, and 3D scene point clouds from SLAM.



The recordings show 19 subjects interacting with 33 different everyday objects. In addition to simple scenarios where objects are picked up, examined, and set down, the dataset also includes typical actions in kitchen, office, and living room environments.

Two Meta devices were used for data capture: the Project Aria research glasses and the Quest 3 VR headset. Project Aria contributes one RGB and two monochrome image streams per frame, while Quest 3 contributes two monochrome streams.


HOT3D could enable better robots and XR interactions

A core element of the dataset is its precise 3D annotations for hands and objects, captured with a marker-based motion capture system. Hand poses are provided in the UmeTrack and MANO formats, while object poses are represented as rigid 3D transformations.
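The article does not spell out the exact serialization, but 6DoF object poses of this kind are conventionally stored as 4x4 rigid transformation matrices that map model coordinates into world coordinates. As a minimal sketch (the function name and example pose below are illustrative, not part of the HOT3D tooling), applying such a pose to an object model's vertices looks like this:

```python
import numpy as np

def apply_object_pose(vertices: np.ndarray, pose: np.ndarray) -> np.ndarray:
    """Map object-model vertices (N, 3) into world space with a 4x4
    rigid transformation (rotation + translation). Illustrative only;
    not the actual HOT3D loader API."""
    assert pose.shape == (4, 4)
    # Add a homogeneous coordinate, apply the transform, then drop it.
    homogeneous = np.hstack([vertices, np.ones((len(vertices), 1))])
    return (homogeneous @ pose.T)[:, :3]

# Hypothetical pose: a 90-degree rotation around Z plus a 0.5 m shift along X.
pose = np.array([
    [0.0, -1.0, 0.0, 0.5],
    [1.0,  0.0, 0.0, 0.0],
    [0.0,  0.0, 1.0, 0.0],
    [0.0,  0.0, 0.0, 1.0],
])
point = np.array([[0.1, 0.2, 0.3]])
print(apply_object_pose(point, pose))  # [[0.3, 0.1, 0.3]]
```

One matrix per object per frame is all that is needed to place the scanned 3D models into a scene, which is what makes such annotations directly usable for rendering and evaluation.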

HOT3D also includes gaze direction and head position data. | Image: Meta

Additionally, the dataset includes high-quality 3D models of the 33 objects used. These were created with an in-house 3D scanner from Meta and feature detailed geometry as well as PBR materials that allow for photorealistic rendering.


Meta sees potential for a range of applications: "The HOT3D dataset and benchmark will unlock new opportunities within this research area, such as transferring manual skills from experts to less experienced users or robots, helping an AI assistant to understand user's actions, or enabling new input capabilities for AR/VR users, such as turning any physical surface to a virtual keyboard or any pencil to a multi-functional magic wand."


The dataset is available on Meta's HOT3D project page.

Summary
  • Meta has released a new benchmark dataset called HOT3D, which contains over one million frames from different perspectives and aims to improve the understanding of how people use their hands to manipulate objects.
  • The dataset includes RGB and monochrome images, 3D pose annotations of hands and objects, 3D object models with PBR materials, 2D bounding boxes, gaze signals, and 3D scene point clouds from SLAM, recorded from 19 subjects interacting with 33 everyday objects.
  • Meta sees potential for several applications, such as transferring manual skills to robots, helping AI assistants understand user actions, and providing new input options for AR/VR users. The dataset is available on Meta's HOT3D project page.