
Nvidia is making strides in closing the "sim-to-real gap" for humanoid robots by using Apple's Vision Pro headset to gather more realistic training data.


The company recently unveiled Project GR00T, its AI platform for developing humanoid robots. A key challenge in building robots for everyday tasks has been the lack of high-quality training data. Nvidia believes it has found a solution by combining human-generated and synthetic data.

Jim Fan, Senior Research Manager and Head of Embodied AI at Nvidia, explained on LinkedIn that the company is using Apple Vision Pro to record sample actions for robots. Humans wearing the headset control robots from a first-person perspective, performing tasks like making toast or retrieving a glass from a cupboard.

Image: Nvidia

"Vision Pro parses human hand pose and retargets the motion to the robot hand, all in real time. From the human’s point of view, they are immersed in another body like the Avatar. Teleoperation is slow and time-consuming, but we can afford to collect a small amount of data," Fan writes.

Image: Nvidia

Nvidia then uses its RoboCasa simulation framework to multiply this data by a factor of 1,000 or more. The company's MimicGen system further expands the dataset by generating new actions based on the original human data, filtering out unsuccessful attempts.
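Conceptually, that pipeline resembles the following sketch: a handful of human demonstrations are perturbed into many simulated variants, and only rollouts the simulator judges successful are kept. The trajectory format, success rate, and function names are simplified stand-ins, not the actual RoboCasa or MimicGen API:

```python
import random

def perturb(demo: list[float], noise: float = 0.05) -> list[float]:
    """Create a variant of a demonstration, e.g. for a randomized object pose."""
    return [wp + random.uniform(-noise, noise) for wp in demo]

def simulate_rollout(traj: list[float]) -> bool:
    """Placeholder for a GPU-accelerated simulation that reports task success."""
    return random.random() < 0.4  # assume roughly 40% of generated attempts succeed

def expand_dataset(human_demos: list[list[float]], variants_per_demo: int) -> list[list[float]]:
    dataset = []
    for demo in human_demos:
        for _ in range(variants_per_demo):
            candidate = perturb(demo)
            if simulate_rollout(candidate):  # filter out unsuccessful attempts
                dataset.append(candidate)
    return dataset

if __name__ == "__main__":
    demos = [[0.0, 0.2, 0.5, 0.9]]           # one tiny stand-in "trajectory"
    synthetic = expand_dataset(demos, 1000)   # ~1,000 attempts per human demo
    print(f"kept {len(synthetic)} successful synthetic demos")
```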

Image: Nvidia

"This is the way to trade compute for expensive human data by GPU-accelerated simulation. A while ago, I mentioned that teleoperation is fundamentally not scalable because we are always limited by 24 hrs/robot/day in the world of atoms. Our new GR00T synthetic data pipeline breaks this barrier in the world of bits," Fan writes.

Video: Nvidia

Using real-world data and scaling it up could help close the so-called reality gap, or sim-to-real gap: the difficulty of transferring robotic systems trained solely in simulation to the far more complex real world.

Jensen Huang has a three-computer problem

At this year's Siggraph conference, Nvidia CEO Jensen Huang explained to Wired reporter Lauren Goode what he termed the "three-computer problem" in robotics: development requires one computer to train the AI, one to simulate it, and one to run it on the actual robot. He emphasized that this multi-stage approach ensures AI models are thoroughly designed, tested, and optimized before real-world deployment.


RoboCasa is now fully open-source and available at robocasa.ai. MimicGen is also open-source for robotic arms, with a version for humanoids and five-fingered hands in development.

Summary
  • Nvidia is using Apple's Vision Pro headset to record human hand movements for real-time robot control, providing realistic training data for everyday robotics tasks.
  • Because teleoperation data alone doesn't scale, Nvidia's Project GR00T combines it with synthetic data generated by the RoboCasa simulation framework and the MimicGen system, multiplying the training data by a factor of 1,000 or more.
  • By combining human-generated and synthetic data, Nvidia aims to close the sim-to-real gap, so that robots trained in simulation transfer reliably to the far more complex real world.