Nvidia researcher Jim Fan expects "GPT-3 moment" for robotics in the next few years

Sep 18, 2024

Sequoia Capital

Key Points

Jim Fan, senior researcher at Nvidia, expects significant advances in robot foundation models in the next two to three years. He compares the expected leap to GPT-3's impact on language processing.
Fan sees great potential for humanoid robots in everyday life, because the world is built around the human form. However, he emphasizes that beyond the technical challenges, mass production, safety, and regulation must also be addressed.
Nvidia's research group combines data from the internet, simulations, and real robots. It is developing techniques such as "Eureka" to automate parts of robot training and, in the long term, aims for a single model that can control both virtual and physical agents.

Jim Fan, senior researcher at Nvidia, predicts a breakthrough in robot foundation models in the near future and sees great potential for humanoid robots in everyday life.

Jim Fan, senior research scientist at Nvidia, expects major advances in robotics over the next two to three years. In an interview with Sequoia Capital, Fan said he hopes for a "GPT-3 moment for robotics": a breakthrough in robot foundation models comparable to GPT-3's impact on language processing.

Fan leads embodied AI research at Nvidia, where his team is developing Project GR00T, the company's effort to create foundation models for humanoid robots.

Research breakthrough in the next two to three years

"I hope that we can see a research breakthrough in robot foundation models maybe in the next two to three years," Fan said. However, he stressed that widespread adoption of robots in daily life will take longer: "To have the robots enter daily lives of people, there are a lot more things than just the technical side. The robots need to be affordable and mass-produced, and we also need safety for the hardware and also privacy and regulations."

Fan sees great potential in humanoid robots: "The world is built around the human embodiment, the human form factor, right? All our restaurants, factories, hospitals, and all equipments and tools - they're designed for the human form and also the human hands."

He believes a capable humanoid robot could, in theory, perform any task a human can do. Fan predicts the ecosystem for humanoid hardware will be ready within two to three years.

Nvidia's approach to developing robot AI combines three data types: internet data, simulation data, and real-world robot data. Fan points out that each source has its own strengths and weaknesses and sees combining them as the key to success.

The researcher compares the current state of robotics to natural language processing before GPT-3's breakthrough. He anticipates a similar evolution: from specialized models to a general approach that can later be fine-tuned for specific tasks.

Fan currently sees data acquisition, not model architecture, as the biggest challenge. "I feel that we have not pushed the limit of Transformers yet," he says. In his view, once the data pipeline is fully developed, the models can be scaled up.

Robot agents train in physical and virtual worlds

Nvidia is working on techniques like "Eureka," which uses a large language model to write reward functions for training robots with reinforcement learning, automating what was previously a manual design process.

Beyond the physical world, Fan's team is also researching AI agents for virtual environments like video games. He sees parallels between these domains and aims for a single model that can control both virtual and physical agents in the long term.

"As many intelligent robots as iPhones"

Fan quotes Nvidia CEO Jensen Huang: "Everything that moves will eventually be autonomous." He adds, "If we believe that there will be as many intelligent robots as iPhones then we'd better start building that today."

Despite his optimistic outlook, Fan acknowledges that challenges remain. One is integrating fast, unconscious motor control with slower, conscious planning and reasoning in a single model.

Before joining Nvidia, Fan interned at OpenAI and completed his PhD under renowned AI researcher Fei-Fei Li at Stanford University.
