Google researchers show real-time robot control via interactive language

A Google research team demonstrates that up to four robotic arms can be controlled precisely and in real time using natural language.

Advances in large language models (LLMs) have recently produced powerful text generators. But text generation is just one of many use cases for natural language processing: combined with other data in multimodal architectures, language understanding lets people direct machines without writing code. Current text-to-x generators illustrate this, and Google is now applying the same idea to complex voice control of a robotic arm equipped with a video camera.

Interactive language for real-time commands to real-world robots

In the research paper "Interactive Language: Talking to Robots in Real Time," Google's research team presents a framework for building interactive robots that can be instructed in real time and in natural language. The robot acts solely on speech input combined with an RGB image (640 x 360 pixels) from the camera embedded in the arm.

The team uses a Transformer-based architecture for language-conditioned visuomotor control, which it trained with imitation learning on a dataset of hundreds of thousands of annotated motion sequences.
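The idea behind imitation learning can be illustrated with a deliberately tiny sketch. It is not the team's Transformer model: it swaps in a two-weight linear policy and a made-up "expert", and the names make_demos and train_bc are hypothetical. It only shows the principle that a policy is fitted to recorded (observation, command, action) examples by supervised learning.

```python
import random


def make_demos(n, seed=0):
    """Generate toy demonstrations from a made-up expert (an assumption)."""
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        pos = rng.uniform(-1.0, 1.0)     # stand-in for the camera observation
        target = rng.uniform(-1.0, 1.0)  # stand-in for the language command
        action = 0.5 * (target - pos)    # what the hypothetical expert did
        demos.append(((pos, target), action))
    return demos


def train_bc(demos, epochs=200, lr=0.5):
    """Fit a linear policy a = w0*pos + w1*target to the demonstrations.

    Plain gradient descent on the mean squared error between the
    policy's action and the demonstrated action.
    """
    w = [0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for (pos, target), action in demos:
            err = w[0] * pos + w[1] * target - action
            grad[0] += err * pos
            grad[1] += err * target
        w = [w[i] - lr * grad[i] / len(demos) for i in range(2)]
    return w


if __name__ == "__main__":
    weights = train_bc(make_demos(200))
    print(weights)  # should approach the expert's rule, roughly [-0.5, 0.5]
```

The real system replaces the two numbers with camera images and text, and the linear rule with a large neural network, but the training signal is the same kind of comparison against demonstrated actions.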

According to the researchers, the system can translate more than 87,000 natural language strings into robotic actions in real time with a success rate of about 93.5 percent. This includes complex commands such as "make a smiley face out of blocks" as well as sorting by color and shape.

Interactive human guidance allows the arm to achieve "complex goals with long horizons," the team writes. The human operator gives sequential commands until the robotic arm reaches the target. Commands can be given in different orders and with extensive vocabulary.
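The interaction pattern described here can be sketched as a simple control loop. This is a minimal illustration under stated assumptions, not the team's code: toy_policy is a hypothetical stand-in for the learned model, the "robot" is a single number, and the command strings are invented. The point is only that the loop keeps running at a fixed rate while the newest spoken command replaces the previous one.

```python
from collections import deque


def toy_policy(position, command):
    """Hypothetical stand-in for the learned policy: one small step per tick.

    A real policy would also use the observation; this toy ignores position.
    """
    return {"move left": -1, "move right": 1}.get(command, 0)


def run_session(timed_commands, ticks, start=0):
    """Run a fixed-rate loop in which the latest command conditions the policy.

    timed_commands: list of (tick, text) pairs, sorted by tick.
    Returns the position after each tick.
    """
    pending = deque(timed_commands)
    position, command = start, None
    trace = []
    for t in range(ticks):
        # Pick up any command spoken at or before this tick; the newest wins.
        while pending and pending[0][0] <= t:
            command = pending.popleft()[1]
        position += toy_policy(position, command)
        trace.append(position)
    return trace


if __name__ == "__main__":
    spoken = [(0, "move right"), (3, "move left"), (5, "stop")]
    print(run_session(spoken, ticks=7))  # [1, 2, 3, 2, 1, 1, 1]
```

Because the operator can change the command at any tick, a long task becomes a sequence of short corrections rather than one instruction that has to be right from the start.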

The robotic arm can follow a complex series of human instructions until the task is completed. | Image: Lynch et al.

In experiments, the research team also succeeded in controlling four robot arms simultaneously by speech. According to the team, this shows that the earlier assumption that an operator must give a robot undivided attention in order to correct its behavior online can be relaxed.

A step towards more useful everyday robots

In particular, the research team sees the open-source Language-Table dataset, which comes with a benchmark for simulated multitask imitation learning, as a contribution to human-robot interaction research. According to the researchers, the dataset includes nearly 600,000 simulated and real-world robot motion sequences described in natural language. It is significantly larger than previously available datasets.

However, the researchers write that numerous challenges in human-robot collaboration remain open, such as intention recognition, nonverbal communication, and joint physical execution of tasks by humans and robots. Future research could, for example, extend the interactive language approach to useful real-time assistive robots.

"We hope that our work can be useful as a basis for future research in capable, helpful robots with visuo-linguo-motor control," the team writes.

Author Pete Florence raised the prospect on Twitter that Google's robotics division will soon share the data, models and simulation environments used with the research community.
