
Nvidia Research has developed Eureka, an AI agent that can teach robots complex skills. Eureka combines a large language model, the latest in generative AI, with reinforcement learning, a classic AI technique, and runs training in GPU-accelerated simulation.

Eureka has trained ten different types of robots to perform 29 different tasks in simulations accelerated by a factor of 1,000. The robots learned to open drawers and cabinets, throw and catch balls, and use scissors. One of the more impressive skills Eureka taught a robot is spinning a pen rapidly in its hand, a trick only some particularly dexterous humans can pull off.

The pen-spinning animation was rendered with Nvidia Omniverse. According to the researchers involved, an animation this complex would be very difficult for a human CGI artist to create.

Generative AI writes better reward algorithms than human experts

Eureka autonomously writes the reward programs used to train robots. According to the study published by Nvidia, Eureka-generated reward programs outperform those written by human experts on 83 percent of tasks, yielding an average normalized improvement in robot performance of 52 percent.


Ten different types of robots learned 29 different tasks in the simulation using reward programs from an AI agent. | Video: Nvidia

Eureka uses OpenAI's GPT-4 to write the reward programs that robots use to learn by trial and error. The system does not rely on task-specific prompts from humans or on predefined reward templates.
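To make "reward program" more concrete, here is a minimal sketch of the kind of Python reward function an LLM might write for a drawer-opening task. The function name `compute_reward`, its inputs, and the weights and temperature are illustrative assumptions, not code taken from Eureka or Isaac Gym.

```python
import math
from typing import Dict, Tuple


def compute_reward(
    hand_pos: Tuple[float, float, float],
    handle_pos: Tuple[float, float, float],
    drawer_opening: float,
) -> Tuple[float, Dict[str, float]]:
    """Hypothetical drawer-opening reward in the style an LLM might write.

    Returns the total reward plus its individual components, so that
    training statistics can be reported per term.
    """
    # Shaping term: encourage the hand to approach the drawer handle.
    dist = math.dist(hand_pos, handle_pos)
    reach_reward = math.exp(-5.0 * dist)  # temperature 5.0 is an arbitrary choice

    # Task term: reward how far the drawer has been pulled open (in metres).
    open_reward = 2.0 * drawer_opening  # weight 2.0 is also an assumption

    components = {"reach": reach_reward, "open": open_reward}
    return sum(components.values()), components


total, parts = compute_reward((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 0.1)
print(total, parts)
```

Returning the individual terms alongside the total is a design choice in this sketch: per-term values are exactly what a training summary can report back to the language model.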

Using GPU-accelerated simulation in Isaac Gym, Eureka can quickly evaluate large batches of candidate reward functions, which makes training more efficient. Eureka then summarizes key statistics from the training results and feeds this summary back to the LLM, instructing it to improve its next round of reward functions. In this way, the AI agent iteratively refines the robots' rewards on its own.
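The loop described above is essentially an evolutionary search: sample candidate rewards, evaluate them, keep the best, summarize the results, repeat. The sketch below shows only that control flow, under heavy simplifying assumptions: a reward is reduced to a dict of term weights, `stub_llm` replaces GPT-4 with random mutation, and `toy_train_and_evaluate` replaces an Isaac Gym training run with a made-up scoring function. All of these names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Toy "reward program": weights for named reward terms (hypothetical representation).
Reward = Dict[str, float]


def toy_train_and_evaluate(reward: Reward) -> Tuple[float, Dict[str, float]]:
    """Stand-in for one RL training run in simulation (hypothetical).

    Returns a task fitness score plus per-term statistics. The score is a
    made-up function that peaks at reach=0.5, open=2.0.
    """
    score = 1.0 - (reward["reach"] - 0.5) ** 2 - 0.1 * (reward["open"] - 2.0) ** 2
    stats = {term: round(weight, 3) for term, weight in reward.items()}
    return score, stats


def reward_reflection(score: float, stats: Dict[str, float]) -> str:
    """Summarise training results as text that would be fed back to the LLM."""
    terms = ", ".join(f"{name}={value}" for name, value in stats.items())
    return f"Best task score {score:.3f}; reward term weights: {terms}."


def stub_llm(best: Reward, feedback: str, n: int, rng: random.Random) -> List[Reward]:
    """Stand-in for GPT-4: proposes n mutated variants of the current best reward.

    A real LLM would read `feedback` and rewrite reward code; this stub
    ignores the text and simply perturbs the weights at random.
    """
    return [{k: v + rng.gauss(0.0, 0.3) for k, v in best.items()} for _ in range(n)]


def eureka_style_search(iterations: int = 5, samples: int = 16, seed: int = 0):
    rng = random.Random(seed)
    best: Reward = {"reach": 0.0, "open": 0.0}
    best_score, stats = toy_train_and_evaluate(best)
    feedback = reward_reflection(best_score, stats)
    history = [best_score]
    for _ in range(iterations):
        # 1. Sample a batch of candidate rewards from the (stub) LLM.
        candidates = stub_llm(best, feedback, samples, rng)
        # 2. Evaluate every candidate (Eureka does this in GPU-accelerated simulation).
        results = [(toy_train_and_evaluate(c), c) for c in candidates]
        (score, cand_stats), cand = max(results, key=lambda r: r[0][0])
        # 3. Keep the best reward seen so far and rebuild the reflection text.
        if score > best_score:
            best, best_score, stats = cand, score, cand_stats
        feedback = reward_reflection(best_score, stats)
        history.append(best_score)
    return best, best_score, history


best, best_score, history = eureka_style_search()
print(history)
```

Because the best reward is only replaced when a candidate scores higher, the recorded best score in this sketch never decreases from one iteration to the next.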

The structure of Eureka. | Image: Nvidia

Nvidia found that the more complex the task, the larger the margin by which GPT-4's reward functions outperformed those written by human experts, so-called "reward engineers." Nvidia researcher Jim Fan, who worked on the study, calls Eureka a "superhuman reward engineer." Fan believes that Eureka will open up new possibilities for controlling robots and for creating realistic animations for artists.

Eureka bridges the gap between high-level reasoning (coding) and low-level motor control. It is a “hybrid-gradient architecture”: a black box, inference-only LLM instructs a white box, learnable neural network. The outer loop runs GPT-4 to refine the reward function (gradient-free), while the inner loop runs reinforcement learning to train a robot controller (gradient-based).

Linxi "Jim" Fan, senior research scientist at NVIDIA
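Fan's "hybrid-gradient" description can be illustrated with a toy problem. In this sketch, which is an assumption rather than Eureka's code, the inner loop trains a one-parameter "controller" `theta` by gradient ascent on a reward that depends on a parameter `c`, while the outer loop searches over `c` without using any gradients, judging each trained controller only by a black-box task score. A simple pattern search is swapped in for GPT-4 in the outer loop so that the example runs without an LLM.

```python
from typing import Tuple


def inner_loop_train(c: float, steps: int = 200, lr: float = 0.05) -> float:
    """Gradient-based inner loop: gradient ascent on the reward r_c(theta) = -(theta - c)**2.

    theta stands in for the robot controller's parameters; c parameterises the reward.
    """
    theta = 0.0
    for _ in range(steps):
        grad = -2.0 * (theta - c)  # analytic gradient dr_c/dtheta
        theta += lr * grad         # ascend the reward
    return theta


def task_fitness(theta: float) -> float:
    """Black-box task metric: the outer loop only ever reads its value."""
    return -abs(theta - 3.0)


def outer_loop_search(rounds: int = 6) -> Tuple[float, float]:
    """Gradient-free outer loop: a simple pattern search over the reward parameter c.

    In Eureka this role is played by GPT-4 rewriting reward code; pattern search
    is used here only as a runnable stand-in.
    """
    best_c, step = 0.0, 2.0
    best_fit = task_fitness(inner_loop_train(best_c))
    for _ in range(rounds):
        for cand in (best_c - step, best_c + step):
            fit = task_fitness(inner_loop_train(cand))  # train, then score
            if fit > best_fit:
                best_c, best_fit = cand, fit
        step /= 2.0  # shrink the search radius each round
    return best_c, best_fit


best_c, best_fit = outer_loop_search()
print(best_c, best_fit)
```

The split mirrors the quote: only the inner loop needs a differentiable objective, so the outer loop is free to use any proposal mechanism, including a language model that can only be queried, not trained.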

In addition, Nvidia says Eureka can incorporate human feedback to better tailor rewards to the developer's vision. Nvidia calls this process "in-context RLHF": reinforcement learning from human feedback in which the feedback is supplied to the LLM as part of its context, without updating the model's weights. The system could act as a kind of co-pilot for robot developers, Fan writes.
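What makes this RLHF "in-context" is that the feedback enters through the prompt rather than through training. Below is a minimal sketch, assuming a hypothetical `build_prompt` helper and a made-up prompt layout (not Eureka's actual prompts), of how a developer's written feedback could be appended alongside the current reward code and the training summary.

```python
from typing import List


def build_prompt(task: str, env_code: str, best_reward_code: str,
                 reflection: str, human_feedback: List[str]) -> str:
    """Assemble a (hypothetical) prompt for the next reward-generation round.

    Human feedback is simply added to the LLM's context as text;
    no model weights are updated anywhere in this process.
    """
    sections = [
        f"Task: {task}",
        "Environment source code:\n" + env_code,
        "Current best reward function:\n" + best_reward_code,
        "Training summary:\n" + reflection,
    ]
    if human_feedback:
        notes = "\n".join(f"- {note}" for note in human_feedback)
        sections.append("Human feedback on the resulting behaviour:\n" + notes)
    sections.append("Write an improved reward function.")
    return "\n\n".join(sections)


prompt = build_prompt(
    "open the drawer",
    "class DrawerEnv: ...",
    "def compute_reward(...): ...",
    "Best task score 0.412",
    ["The hand bumps the cabinet door before reaching the handle."],
)
print(prompt)
```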


"The versatility and substantial performance gains of EUREKA suggest that the simple principle of combining large language models with evolutionary algorithms is a general and scalable approach to reward design, an insight that may be generally applicable to difficult, open-ended search problems," the research team writes.

Nvidia is releasing all elements of the Eureka research as open source on GitHub.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
  • Nvidia Research has developed an AI agent called Eureka that helps robots learn complex skills such as spinning a pen in their hand, opening drawers and cabinets, or throwing and catching balls.
  • Eureka uses OpenAI's GPT-4 to write reward programs that let robots learn these skills through reinforcement learning.
  • Eureka's autonomously generated reward programs outperform expert-written ones on 83 percent of tasks, for an average performance improvement of 52 percent.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.