
In collaboration with OpenAI, robotics company Figure has developed a robot that can hold full conversations and plan and execute its actions.


This is achieved by connecting the robot to a multimodal model trained by OpenAI that understands images and text.

By connecting to the model, Figure's robot, called "Figure 01," is able to describe its environment, interpret everyday situations, and perform actions based on highly ambiguous, context-dependent requests.

All actions in the video are learned, not remote-controlled, and executed at normal speed.


Corey Lynch, robotics and AI engineer at Figure, is excited about the progress: "Even just a few years ago, I would have thought having a full conversation with a humanoid robot while it plans and carries out its own fully learned behaviors would be something we would have to wait decades to see. Obviously, a lot has changed."

Similar robotics research has already been demonstrated by Google with its RT models, which allow a robot to navigate an everyday environment and plan and execute complex actions based on the input and output of language and image models. But Google's demo robots weren't so chatty.

A robot that listens, plans, thinks, reasons and acts

Lynch describes the robot's capabilities in detail. These include describing its visual experiences, planning future actions, reflecting on its memories, and verbally explaining its conclusions that lead to actions.

To accomplish this, OpenAI's multimodal model processes the entire conversation history, including past images, to generate spoken responses that a human can respond to. The same model also decides which learned behavior the robot should perform to execute a given command.
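
Figure has not published how the model is wired into the robot, but the loop described here, full conversation history plus the latest camera image in, a spoken reply plus a choice of learned behavior out, can be sketched roughly. The sketch below assumes OpenAI's public chat completions API; the model name, the behavior list, and the JSON contract are illustrative assumptions, not Figure's actual interface.

```python
import base64
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical names for the robot's learned behaviors (not Figure's real catalog).
BEHAVIORS = ["pick_up_apple", "place_in_dish_rack", "hand_over_object"]


def respond_and_choose_behavior(history: list[dict], camera_jpeg: bytes) -> dict:
    """Send the running conversation plus the latest camera frame to a
    vision-language model; get back a spoken reply and a behavior to run."""
    image_b64 = base64.b64encode(camera_jpeg).decode()
    messages = [
        {
            "role": "system",
            "content": (
                "You control a humanoid robot. Answer in JSON with two keys: "
                '"speech" (what the robot says aloud) and "behavior" '
                f"(one of {BEHAVIORS})."
            ),
        },
        *history,  # earlier turns, including any previously sent images
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Here is what you currently see."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        },
    ]
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; Figure has not named the model it uses
        messages=messages,
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```

The spoken part of the reply would go to a text-to-speech system, while the selected behavior name would be handed to the low-level policy that actually moves the robot.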

For example, the robot can correctly answer the question "Can you put that there?" by referring to previous parts of the conversation to determine what is meant by "that" and "there." In one example, it understands that the dishes lying around should probably go in the dish rack, something I couldn't figure out until I was over 40.


The robot's actions are controlled by what are called visuomotor transformers, which translate images directly into actions. They process the images from the robot's cameras at a frequency of 10 Hz and generate actions with 24 degrees of freedom (wrist positions and finger angles) at a frequency of 200 Hz.
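
As a rough illustration of those two update rates, the sketch below runs a placeholder policy: the visual latent is refreshed at 10 Hz while 24-dimensional actions (wrist poses plus finger angles) are emitted at 200 Hz. The class and function names are hypothetical; the real transformer and its interface are not public.

```python
import numpy as np

PERCEPTION_HZ = 10   # camera frames fed to the visuomotor transformer
CONTROL_HZ = 200     # action outputs per second
ACTION_DIM = 24      # wrist positions + finger angles


class VisuomotorPolicy:
    """Stand-in for the learned visuomotor transformer (hypothetical interface)."""

    def encode(self, image: np.ndarray) -> np.ndarray:
        # The real system would run a transformer over the camera image;
        # here we just return a dummy latent vector.
        return np.zeros(256)

    def act(self, latent: np.ndarray) -> np.ndarray:
        # Decode the most recent visual latent into a 24-DoF action.
        return np.zeros(ACTION_DIM)


def control_loop(policy, get_frame, send_action, steps: int = CONTROL_HZ):
    """Run one second of control: 10 image updates, 200 action updates."""
    latent = policy.encode(get_frame())
    for step in range(steps):
        # Refresh the visual latent every 20 control ticks (200 Hz / 10 Hz).
        if step % (CONTROL_HZ // PERCEPTION_HZ) == 0:
            latent = policy.encode(get_frame())
        send_action(policy.act(latent))  # 24 values sent to the actuators
```

The key design point the article describes is this split: slow visual processing feeding a fast action decoder, so the hands keep moving smoothly between camera updates.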

Summary
  • In partnership with OpenAI, the robotics company Figure AI has developed a robot called "Figure 01" that can hold complex conversations and plan and execute independent actions thanks to a multimodal model trained by OpenAI that interprets images and text.
  • The robot can describe its visual experiences, plan future actions, reflect on its memories, and verbally explain its conclusions. OpenAI's multimodal model processes the entire conversation history, including past images, to generate verbal responses and decide which learned behavior the robot should perform.
  • The robot can access and analyze its entire conversation history to respond to complex queries. For example, it can correctly answer the question "Can you put that there?" by referring to previous parts of the conversation and understanding what "that" and "there" mean.