
Nvidia researchers have built a small neural network that controls humanoid robots more effectively than specialized systems, despite using far fewer computational resources. The system works with multiple input methods, from VR headsets to motion capture.


The new system, called HOVER, needs only 1.5 million parameters to handle complex robot movements. For context, typical large language models use hundreds of billions of parameters.

The team trained HOVER in Nvidia's Isaac simulation environment, which runs robot movements 10,000 times faster than real time. According to Nvidia researcher Jim Fan, this means a full year of virtual training takes just around 50 minutes of actual computing time on a single GPU.
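That claim is easy to sanity-check with back-of-the-envelope arithmetic. The only input below is the 10,000x real-time factor cited above; the rest is our own calculation:

```python
# Back-of-the-envelope check of the Isaac speedup claim.
# The only assumed input is the 10,000x real-time factor cited in the article.

SPEEDUP = 10_000                       # simulation speed relative to real time
minutes_per_year = 365.25 * 24 * 60    # ~525,960 minutes of simulated experience

wall_clock_minutes = minutes_per_year / SPEEDUP
print(f"One simulated year ~= {wall_clock_minutes:.1f} minutes of GPU time")
# -> One simulated year ~= 52.6 minutes of GPU time, roughly the 50 minutes cited
```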

Small and versatile

HOVER transfers zero-shot from simulation to physical robots without the need for fine-tuning, says Fan. The system accepts input from multiple sources, including head and hand tracking from XR devices such as the Apple Vision Pro, full-body poses from motion capture or RGB cameras, joint angles from exoskeletons, and standard joystick controls.
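The article doesn't detail how a single network ingests such different signals, but the general idea can be sketched as a masked command space: each control mode fills in the slots it provides, and a binary mask tells the one policy which slots to track. Here is a minimal sketch, where all names, dimensions, and the packing scheme are illustrative assumptions rather than Nvidia's actual interface:

```python
import numpy as np

# Per-slot dimensionality (illustrative values, not HOVER's real layout):
# head/hand targets, full-body pose, joint angles.
DIMS = {"head_hands": 9, "body_pose": 24, "joint_angles": 19}

def make_command(**inputs):
    """Pack whichever control inputs are present into a fixed-size vector plus mask.

    Unused slots are zero-filled; the binary mask tells the single policy
    which parts of the command to track and which to ignore.
    """
    parts, mask = [], []
    for name, dim in DIMS.items():
        value = inputs.get(name)
        parts.append(np.asarray(value, dtype=np.float32) if value is not None
                     else np.zeros(dim, dtype=np.float32))
        mask.append(1.0 if value is not None else 0.0)
    return np.concatenate(parts), np.array(mask, dtype=np.float32)

# VR teleoperation supplies only head and hand targets ...
cmd, mask = make_command(head_hands=np.zeros(9))
# ... while motion capture could fill the body_pose slot instead, with the
# same policy consuming both. (Hypothetical usage, for illustration only.)
```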


The HOVER model allows a robot to be remotely controlled via a VR headset without any specific fine-tuning. | Video: Nvidia

On each control method, the system outperforms controllers built specifically for that single input type. Lead author Tairan He speculates that this may be due to the system's broad understanding of physical concepts such as balance and precise limb control, which it applies across all control types.

The system builds on the open-source H2O and OmniH2O projects and works with any humanoid robot that can run in the Isaac simulator. Nvidia has posted examples and code on GitHub.

Summary
  • Nvidia researchers have developed HOVER, a compact neural network with only 1.5 million parameters that can control complex movements of humanoid robots.
  • The system supports multiple control modes, including head and hand tracking from XR devices, full-body poses from motion capture or cameras, and joint angles from exoskeletons.
  • HOVER was trained in Nvidia's Isaac GPU-accelerated simulation environment, where one year of intensive training equates to about 50 minutes of real time on a GPU, and can be applied directly to real robots without fine-tuning.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.