Norwegian startup 1X Technologies claims to have made major progress in developing AI-based world models for robots.
According to 1X, these models act as virtual simulators, allowing robot capabilities to be tested and improved across various scenarios without real-world trials.
1X sees this as a potential solution to the "robotics problem": the challenge of reliably evaluating robots trained for diverse tasks in ever-changing environments.
1X says even identical robot models can show large performance fluctuations within days as environments change, making rigorous real-world evaluation frustratingly difficult.
To train its world models, 1X collected thousands of hours of video footage showing its humanoid EVE robots performing various tasks in homes and offices.
Using machine learning on this data, the models can now plausibly predict how objects and environments will respond to robot actions.
Even for actions not explicitly programmed, the model generates believable video output. For instance, it learns that people and objects should be avoided.
Robots can fold T-shirts - most of the time
1X claims the models can already handle complex physical interactions such as gripping and lifting objects, opening doors and drawers, and manipulating deformable materials like clothing, for example to fold T-shirts.
"The main value of the world model comes from simulating object interactions. In the following generations, we provide the model the same initial frames and three different sets of actions to grasp boxes. In each scenario, the box(es) grasped are lifted and moved in accordance with the motion of the gripper, while the other boxes remain undisturbed," 1X Technologies writes.
1X admits some limitations. The models sometimes struggle to maintain consistent object colors and shapes, or to accurately model physics. Self-recognition in mirrors also remains unreliable.
Still, 1X views these world models as a milestone for developing and training universal robots. To accelerate progress, the startup is offering datasets, pre-trained models and prize money through its "1X World Model Challenge."
World models promise greater efficiency in training
1X's long-term vision is to use world models directly for robot training, potentially enabling huge efficiency gains compared to real-world testing. The company is actively recruiting AI experts to pursue these goals.
Earlier this year, 1X raised $100 million to advance its humanoid household robot Neo toward market launch. The funding, backed by industry leaders like OpenAI, underscores high expectations for 1X's technology.
Besides 1X, Nvidia is heavily invested in humanoid robot development. The company recently unveiled a training approach using Apple Vision Pro. Nvidia researcher Jim Fan, who works on foundation models for robots, expects a "GPT-3 moment" for robotics in the coming years.