Jim Fan, a senior research scientist at Nvidia, predicts a breakthrough in robot foundation models in the near future. He sees great potential in humanoid robots for everyday life.
Jim Fan, senior research scientist at Nvidia, expects major advances in robotics over the next two to three years. In an interview with Sequoia Capital, Fan said he hopes for a "GPT-3 moment for robotics" - a breakthrough in robot foundation models comparable to GPT-3's impact on language processing.
Fan leads embodied AI research at Nvidia, where his team is developing Project GR00T, the company's effort to create foundation models for humanoid robots.
Research breakthrough in the next two to three years
"I hope that we can see a research breakthrough in robot foundation models maybe in the next two to three years," Fan said. However, he stressed that widespread adoption of robots in daily life will take longer: "To have the robots enter daily lives of people, there are a lot more things than just the technical side. The robots need to be affordable and mass-produced, and we also need safety for the hardware and also privacy and regulations."
Fan sees great potential in humanoid robots: "The world is built around the human embodiment, the human form factor, right? All our restaurants, factories, hospitals, and all equipments and tools - they're designed for the human form and also the human hands."
He believes a capable humanoid robot could, in theory, perform any task a human can do. Fan predicts the ecosystem for humanoid hardware will be ready within two to three years.
Nvidia's approach to developing robot AI combines three data types: internet data, simulation data, and real-world robot data. Fan highlights the strengths and weaknesses of each method, seeing their combination as key to success.
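The idea of blending these sources can be pictured as weighted sampling from three pools. The sketch below is purely illustrative (not Nvidia's actual pipeline); the data items and mixing ratios are assumptions:

```python
import random

# Toy stand-ins for the three data sources Fan describes.
internet_data   = ["video_clip_1", "video_clip_2"]   # broad coverage, no robot actions
simulation_data = ["sim_traj_1", "sim_traj_2"]       # cheap and scalable, imperfect physics
real_robot_data = ["teleop_traj_1"]                  # physically accurate, but scarce

sources = [internet_data, simulation_data, real_robot_data]
weights = [0.5, 0.4, 0.1]   # assumed mixing ratio, chosen only for illustration

def sample_batch(n: int, rng: random.Random) -> list:
    """Draw a mixed minibatch, favoring the abundant sources."""
    batch = []
    for _ in range(n):
        source = rng.choices(sources, weights=weights)[0]
        batch.append(rng.choice(source))
    return batch

batch = sample_batch(8, random.Random(0))
```

The design point is that each source compensates for another's weakness: internet video supplies diversity, simulation supplies action labels at scale, and real robot data grounds the model in true physics.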
The researcher compares the current state of robotics to natural language processing before GPT-3's breakthrough. He anticipates a similar evolution: from specialized models to a general approach that can later be fine-tuned for specific tasks.
Fan currently views data acquisition as the biggest challenge. "I feel that we have not pushed the limit of Transformers yet," he says. Once the data pipeline is fully developed, the models can be scaled up.
Robot agents train in all worlds
Nvidia is working on techniques like "Eureka," which uses a language model to generate reward functions for robot training, automating a previously manual process.
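The core loop of this idea can be sketched in a few lines: a language model emits a reward function as source code, which is compiled into a callable and used to score robot states. The prompt, the canned "LLM response," and all names below are illustrative assumptions, not Eureka's actual prompts or API:

```python
TASK_PROMPT = """Write a Python function reward(state) for a cube-lifting task.
state is a dict with 'cube_height' (m) and 'gripper_dist' (m).
A higher cube and a closer gripper should yield a higher reward."""

def mock_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; Eureka queries a large language model here."""
    return (
        "def reward(state):\n"
        "    # encourage lifting the cube and keeping the gripper close\n"
        "    return 10.0 * state['cube_height'] - state['gripper_dist']\n"
    )

def compile_reward(source: str):
    """Turn the generated source code into a callable reward function."""
    namespace = {}
    exec(source, namespace)
    return namespace["reward"]

reward_fn = compile_reward(mock_llm(TASK_PROMPT))

# Score some sample states; Eureka feeds training statistics like these
# back to the language model so it can iteratively refine the reward.
high = reward_fn({"cube_height": 0.2, "gripper_dist": 0.05})
low  = reward_fn({"cube_height": 0.0, "gripper_dist": 0.30})
```

The automation win is in the middle step: reward shaping, normally hand-tuned by an engineer per task, becomes text the model can generate and revise.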
Beyond the physical world, Fan's team is also researching AI agents for virtual environments like video games. He sees parallels between these domains and aims for a single model that can control both virtual and physical agents in the long term.
"As many intelligent robots as iPhones"
Fan quotes Nvidia CEO Jensen Huang: "Everything that moves will eventually be autonomous." He adds, "If we believe that there will be as many intelligent robots as iPhones, then we'd better start building that today."
Despite his optimistic outlook, Fan acknowledges challenges remain. These include integrating fast, unconscious motor control with slower, conscious planning and reasoning processes in a single model.
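The fast/slow split Fan describes resembles a hierarchical control loop: a deliberate layer that re-plans occasionally and a reactive layer that runs every tick. The toy dynamics and loop rates below are assumptions made for illustration, not a real robot controller:

```python
def slow_planner(position: float) -> float:
    """Deliberate layer: choose the next waypoint (runs rarely)."""
    return position + 1.0   # toy plan: advance one unit per planning cycle

def fast_controller(position: float, waypoint: float) -> float:
    """Reactive layer: proportional step toward the waypoint (runs every tick)."""
    return position + 0.5 * (waypoint - position)

position = 0.0
waypoint = position
for step in range(20):
    if step % 10 == 0:            # re-plan once every 10 control ticks
        waypoint = slow_planner(position)
    position = fast_controller(position, waypoint)
```

The open question Fan points at is whether both layers can live inside one model rather than being hand-wired as separate modules like this.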
Before joining Nvidia, Fan interned at OpenAI and completed his PhD under renowned AI researcher Fei-Fei Li at Stanford University.