Nvidia is making strides in closing the "simulation gap" for humanoid robots by leveraging Apple's Vision Pro headset to gather more realistic training data.
The company recently unveiled Project GR00T, its AI platform for developing humanoid robots. A key challenge in creating robots suitable for everyday tasks has been the lack of high-quality training data. Nvidia believes it has found a solution by combining human-generated and synthetic data.
Jim Fan, Senior Research Manager and Head of Embodied AI at Nvidia, explained on LinkedIn that the company is using Apple Vision Pro to record sample actions for robots. Humans wearing the headset control robots from a first-person perspective, performing tasks like making toast or retrieving a glass from a cupboard.
"Vision Pro parses human hand pose and retargets the motion to the robot hand, all in real time. From the human’s point of view, they are immersed in another body like the Avatar. Teleoperation is slow and time-consuming, but we can afford to collect a small amount of data," Fan writes.
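The retargeting step Fan describes can be pictured as a simple mapping from tracked human joint angles to robot joint targets, applied every frame. The sketch below is purely illustrative; the joint names, indexing scheme, and per-joint scaling are assumptions, not Nvidia's actual pipeline, which must also handle kinematic differences between human and robot hands.

```python
def retarget(human_joint_angles, joint_map, scale=None):
    """Map tracked human finger-joint angles (radians) to robot joint targets.

    human_joint_angles: flat list of angles from the headset's hand tracker.
    joint_map: robot joint name -> index into human_joint_angles (hypothetical names).
    scale: optional per-joint gain to compensate for differing joint ranges.
    """
    scale = scale or {}
    return {
        robot_joint: scale.get(robot_joint, 1.0) * human_joint_angles[idx]
        for robot_joint, idx in joint_map.items()
    }

# Called once per tracking frame, so the operator sees the robot
# hand follow their own in real time.
targets = retarget([0.5, 1.0], {"thumb_flex": 0, "index_flex": 1},
                   scale={"index_flex": 0.8})
```

In a real teleoperation loop this runs at the tracker's frame rate, with the resulting targets streamed to the robot's joint controllers.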
Nvidia then uses its RoboCasa simulation framework to multiply this data by a factor of 1,000 or more. The company's MimicGen system further expands the dataset by generating new actions based on the original human data, filtering out unsuccessful attempts.
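The multiply-and-filter idea behind this pipeline can be sketched as perturb-and-replay: take each human demonstration, generate many noisy variants in simulation, and keep only the ones that still succeed. The function below is a minimal illustration under those assumptions; the noise model and success check are placeholders, not RoboCasa or MimicGen APIs.

```python
import random

def augment_demos(demos, factor=1000, noise=0.05, success_check=None):
    """Expand a small set of demonstrations by perturb-and-replay.

    demos: list of trajectories, each a flat list of action values.
    factor: how many perturbed variants to attempt per demonstration.
    success_check: callable run on each variant (stand-in for replaying
    it in simulation); variants that fail are discarded, mirroring the
    filtering of unsuccessful attempts described above.
    """
    augmented = []
    for demo in demos:
        for _ in range(factor):
            variant = [a + random.uniform(-noise, noise) for a in demo]
            if success_check is None or success_check(variant):
                augmented.append(variant)
    return augmented
```

The trade Fan describes is visible in the signature: `factor` is bounded by GPU simulation throughput rather than by hours of human teleoperation.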
"This is the way to trade compute for expensive human data by GPU-accelerated simulation. A while ago, I mentioned that teleoperation is fundamentally not scalable because we are always limited by 24 hrs/robot/day in the world of atoms. Our new GR00T synthetic data pipeline breaks this barrier in the world of bits," Fan writes.
Using real-world data and scaling it up could help close the so-called sim-to-real (or reality) gap: the difficulty of transferring robotic systems trained solely in simulation to a usually far more complex reality.
Jensen Huang has a three-computer problem
At this year's Siggraph conference, Nvidia CEO Jensen Huang explained to Wired reporter Lauren Goode what he termed the "three-computer problem" in robotics development. Huang outlined that the process requires separate computers for creating the AI, simulating it, and running it in the actual robot. He emphasized that this multi-stage approach ensures AI models are thoroughly designed, tested, and optimized before real-world deployment.
RoboCasa is now fully open-source and available at robocasa.ai. MimicGen is also open-source for robotic arms, with a version for humanoids and five-fingered hands in development.