
Nvidia is making strides in closing the "simulation gap" for humanoid robots by leveraging Apple's Vision Pro headset to gather more realistic training data.


The company recently unveiled Project GR00T, its AI platform for developing humanoid robots. A key challenge in building robots that can handle everyday tasks has been the lack of high-quality training data. Nvidia believes it has found a solution by combining human-generated and synthetic data.

Jim Fan, Senior Research Manager and Head of Embodied AI at Nvidia, explained on LinkedIn that the company is using Apple Vision Pro to record sample actions for robots. Humans wearing the headset control robots from a first-person perspective, performing tasks like making toast or retrieving a glass from a cupboard.

Image: Nvidia

"Vision Pro parses human hand pose and retargets the motion to the robot hand, all in real time. From the human’s point of view, they are immersed in another body like the Avatar. Teleoperation is slow and time-consuming, but we can afford to collect a small amount of data," Fan writes.

Image: Nvidia

Nvidia then uses its RoboCasa simulation framework to multiply this data by a factor of 1,000 or more. The company's MimicGen system further expands the dataset by generating new actions based on the original human data, filtering out unsuccessful attempts.
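Nvidia hasn't published the exact GR00T data pipeline, but the loop described above, generating many randomized variants of a few human demonstrations in simulation and keeping only those that still succeed, can be sketched schematically. The toy below substitutes 2D trajectories and an invented perturbation model for a full physics simulator; none of the names or numbers come from Nvidia's code.

```python
import random

def make_variant(demo, rng):
    """Shift a demonstration to a randomized goal position, a toy
    stand-in for re-simulating the trajectory in a randomized scene."""
    dx, dy = rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2)
    goal = (demo["goal"][0] + dx, demo["goal"][1] + dy)
    path = [(x + dx, y + dy) for x, y in demo["path"]]
    # Inject occasional execution noise so that some variants fail.
    if rng.random() < 0.3:
        path[-1] = (path[-1][0] + rng.uniform(0.1, 0.3), path[-1][1])
    return {"goal": goal, "path": path}

def succeeded(variant, tol=0.05):
    """Success filter: the trajectory must end close to the goal."""
    (gx, gy), (x, y) = variant["goal"], variant["path"][-1]
    return abs(x - gx) < tol and abs(y - gy) < tol

def augment(demos, variants_per_demo=1000, seed=0):
    """Multiply a few demonstrations into many, keeping successes only."""
    rng = random.Random(seed)
    kept = []
    for demo in demos:
        for _ in range(variants_per_demo):
            v = make_variant(demo, rng)
            if succeeded(v):  # discard unsuccessful attempts
                kept.append(v)
    return kept

human_demo = {"goal": (1.0, 1.0), "path": [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]}
synthetic = augment([human_demo])
print(f"{len(synthetic)} successful variants from 1 human demo")
```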

Image: Nvidia

"This is the way to trade compute for expensive human data by GPU-accelerated simulation. A while ago, I mentioned that teleoperation is fundamentally not scalable because we are always limited by 24 hrs/robot/day in the world of atoms. Our new GR00T synthetic data pipeline breaks this barrier in the world of bits," Fan writes.

Video: Nvidia

Using real-world data and scaling it up could help close the so-called sim-to-real or reality gap: the difficulty of transferring robotic systems trained purely in simulation to the usually far more complex real world.
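The article doesn't say how Nvidia's pipeline addresses this gap beyond scaling up data, but a standard, generic technique in this area is domain randomization: training a policy across many randomly perturbed copies of the simulator so that the real world ends up looking like just another variation. A minimal sketch, with invented parameter names and ranges:

```python
import random

def sample_sim_params(rng):
    """Sample one simulator configuration per episode. All parameters
    and ranges here are illustrative, not taken from Nvidia's stack."""
    return {
        "friction":   rng.uniform(0.5, 1.5),    # surface friction coefficient
        "mass_scale": rng.uniform(0.8, 1.2),    # multiplier on object masses
        "latency_ms": rng.uniform(0.0, 40.0),   # simulated control-loop delay
        "light_temp": rng.uniform(3000, 6500),  # rendering light temperature (K)
    }

rng = random.Random(42)
for episode in range(3):
    params = sample_sim_params(rng)
    # In a real setup the simulator would be rebuilt with these
    # parameters and one training episode run inside it, e.g.:
    #   env = make_env(**params)   # hypothetical constructor
    #   run_episode(policy, env)   # hypothetical training step
    print(f"episode {episode}: {params}")
```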

Jensen Huang has a three-computer problem

At this year's SIGGRAPH conference, Nvidia CEO Jensen Huang described to Wired reporter Lauren Goode what he called the "three-computer problem" in robotics development: the process requires separate computers for creating the AI, simulating it, and running it on the actual robot. He emphasized that this multi-stage approach ensures AI models are thoroughly designed, tested, and optimized before real-world deployment.


RoboCasa is now fully open-source and available at robocasa.ai. MimicGen is also open-source for robotic arms, with a version for humanoids and five-fingered hands in development.

 

Summary
  • Nvidia is using Apple's Vision Pro headset to record human hand movements for real-time robot teleoperation, yielding a small but realistic set of training data for humanoid robots.
  • Because teleoperation alone doesn't scale, Project GR00T multiplies this human data with synthetic data from Nvidia's RoboCasa simulation framework and MimicGen system, expanding the dataset by a factor of 1,000 or more.
  • By combining human-generated and synthetic data, Nvidia aims to close the sim-to-real gap, the difficulty of transferring simulation-trained robots to a more complex reality, enabling more capable and reliable humanoid robots.