Google Deepmind's new AI agent plays games using only natural language

Maximilian Schreiner

Google Deepmind

Deepmind's SIMA can perform tasks in different video game worlds, such as Valheim or No Man's Sky, using only text prompts.

Google Deepmind researchers introduce SIMA (Scalable Instructable Multiworld Agent), an AI agent for 3D video game environments that can translate natural language instructions into actions.

SIMA was trained and tested in collaboration with eight game studios and across nine different video games, including No Man's Sky, Valheim, and Teardown.

Video: Google Deepmind

The Deepmind team trained SIMA on game recordings in which one player either gave instructions to another player or described their own gameplay. The team then linked these instructions to the corresponding game actions.

The agent is trained primarily through behavioral cloning: it learns to reproduce the actions that human players performed in the collected data while following the language instructions.

In this way, the agent learns to make connections between the language descriptions, visual impressions, and corresponding actions.
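In code, behavioral cloning reduces to supervised learning over (observation, instruction, action) triples. The sketch below is a minimal illustration, not Deepmind's actual setup: the toy agent, the random stand-in data, and the discrete action vocabulary are all assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins: pre-encoded video frames, instructions as token
# IDs, and expert actions as indices into a made-up discrete vocabulary
# of keyboard/mouse commands.
NUM_ACTIONS = 128
frames = torch.randn(256, 512)                     # visual feature vectors
instructions = torch.randint(0, 1000, (256, 16))   # instruction token IDs
actions = torch.randint(0, NUM_ACTIONS, (256,))    # human action labels

class TinyAgent(nn.Module):
    """Toy policy: fuses visual and language features, predicts an action."""
    def __init__(self):
        super().__init__()
        self.text_embed = nn.EmbeddingBag(1000, 512)
        self.policy = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, NUM_ACTIONS))

    def forward(self, frames, instructions):
        fused = torch.cat([frames, self.text_embed(instructions)], dim=-1)
        return self.policy(fused)   # logits over the action vocabulary

agent = TinyAgent()
optimizer = torch.optim.Adam(agent.parameters(), lr=1e-4)
loader = DataLoader(TensorDataset(frames, instructions, actions), batch_size=32)

# Behavioral cloning: maximize the likelihood of the human's action,
# given what the human saw and the instruction they were following.
for obs, instr, act in loader:
    logits = agent(obs, instr)
    loss = nn.functional.cross_entropy(logits, act)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```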

Google Deepmind's SIMA uses pre-trained models and learns from humans

The core of the SIMA agent consists of several components that work together to convert visual input (what the agent "sees") and language input (the instructions it receives) into actions (keyboard and mouse commands).

Image: Google Deepmind
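Put together, one perception-to-action step looks roughly like the following sketch. Every component here is a trivial placeholder standing in for a trained model; only the dataflow mirrors the description above.

```python
import numpy as np

# Illustrative control loop; all components are placeholders.
def image_encoder(screen: np.ndarray) -> np.ndarray:
    return screen.mean(axis=(0, 1))               # toy "visual features"

def text_encoder(instruction: str) -> np.ndarray:
    return np.full(3, float(len(instruction)))    # toy "language features"

def policy(state: np.ndarray):
    return "w", (0, 0)                            # key press + mouse delta

def agent_step(screen, instruction, memory):
    """One perception-to-action step: encode, remember, act."""
    state = np.concatenate([image_encoder(screen), text_encoder(instruction)])
    memory.append(state)                          # kept for multi-step tasks
    return policy(state), memory

frame = np.zeros((720, 1280, 3))                  # a captured RGB frame
(key, mouse), memory = agent_step(frame, "chop down the tree", [])
```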

Image and text encoders are responsible for translating the visual and language input into a form that the agent can process. This is done using pre-trained models that already have a comprehensive understanding of images and text.
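As an illustration of this pattern, and not of Deepmind's actual model choices, off-the-shelf pre-trained encoders from torchvision and Hugging Face can turn a game frame and an instruction into feature vectors:

```python
import torch
from torchvision.models import resnet18, ResNet18_Weights
from transformers import AutoTokenizer, AutoModel

# Stand-in encoders; SIMA's real pre-trained models are not specified here.
weights = ResNet18_Weights.DEFAULT
image_encoder = resnet18(weights=weights)
image_encoder.fc = torch.nn.Identity()       # keep features, drop classifier
preprocess = weights.transforms()

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
text_encoder = AutoModel.from_pretrained("distilbert-base-uncased")

frame = torch.rand(3, 224, 224)               # a captured game frame
with torch.no_grad():
    visual = image_encoder(preprocess(frame).unsqueeze(0))         # (1, 512)
    tokens = tokenizer("collect wood", return_tensors="pt")
    language = text_encoder(**tokens).last_hidden_state.mean(dim=1)  # (1, 768)
```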

A transformer model integrates the information from the encoders and past actions to form an internal representation of the current state. A special memory mechanism helps the agent to remember previous actions and their results, which is crucial for understanding multi-step tasks.
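One simple way to approximate this, assuming a sliding window over past embeddings (the window length and dimensions below are made up), is a standard transformer encoder that attends over its recent history:

```python
import torch
import torch.nn as nn
from collections import deque

D_MODEL, MEMORY_LEN = 256, 32

encoder_layer = nn.TransformerEncoderLayer(
    d_model=D_MODEL, nhead=4, batch_first=True)
state_model = nn.TransformerEncoder(encoder_layer, num_layers=2)

# Sliding window over past fused (observation + action) embeddings,
# a crude stand-in for the agent's memory mechanism.
memory = deque(maxlen=MEMORY_LEN)

def update_state(fused_obs: torch.Tensor) -> torch.Tensor:
    """Append the newest embedding and attend over the whole window."""
    memory.append(fused_obs)
    window = torch.stack(list(memory)).unsqueeze(0)  # (1, t, D_MODEL)
    out = state_model(window)
    return out[:, -1]                                # current state repr.

for _ in range(5):                                   # five fake timesteps
    state = update_state(torch.randn(D_MODEL))
```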

Finally, the agent uses this state representation to decide which actions to perform next. These actions are keyboard and mouse commands executed in the virtual environment.
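A toy version of such an action decoder might pair a discrete key choice with a continuous mouse movement; the key set and dimensions below are invented for illustration:

```python
import torch
import torch.nn as nn

KEYS = ["w", "a", "s", "d", "space", "e", "none"]  # made-up key set
D_MODEL = 256

class ActionHead(nn.Module):
    """Maps the state representation to a key press and a mouse delta."""
    def __init__(self):
        super().__init__()
        self.key_logits = nn.Linear(D_MODEL, len(KEYS))  # discrete choice
        self.mouse_delta = nn.Linear(D_MODEL, 2)         # continuous (dx, dy)

    def forward(self, state):
        key = KEYS[self.key_logits(state).argmax(dim=-1).item()]
        dx, dy = self.mouse_delta(state).squeeze(0).tolist()
        return key, (dx, dy)

head = ActionHead()
key, (dx, dy) = head(torch.randn(1, D_MODEL))
```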

SIMA does not require access to the game's source code, only screen images and natural language instructions. The agent interacts with the virtual environment via keyboard and mouse and is therefore potentially compatible with any virtual environment.
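In practice, such an interface can be assembled from generic screen-capture and input-injection libraries. The sketch below uses the third-party packages mss and pyautogui as stand-ins; the paper does not name Deepmind's actual tooling:

```python
import numpy as np
import mss           # pip install mss
import pyautogui     # pip install pyautogui

def observe() -> np.ndarray:
    """Grab the primary monitor as a pixel array, no game API needed."""
    with mss.mss() as sct:
        shot = sct.grab(sct.monitors[1])   # monitor 1 = primary screen
        return np.asarray(shot)[:, :, :3]  # BGRA pixels; keep color channels

def act(key: str, mouse_delta: tuple) -> None:
    """Execute one predicted action as real keyboard/mouse input."""
    pyautogui.press(key)                   # e.g. "w" to move forward
    pyautogui.moveRel(*mouse_delta)        # turn the camera

frame = observe()          # screen pixels in, ...
act("w", (10, 0))          # ... keyboard/mouse commands out
```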

SIMA masters 600 skills

In tests, SIMA mastered 600 basic skills such as navigation, object interaction, and menu use. The team expects future agents to be able to perform complex strategic planning and multifaceted tasks.

Image: Google Deepmind

SIMA differs from other AI systems for video games in that it takes a broad approach, learning in a variety of environments rather than focusing on one or a few specific tasks.

Research shows that an agent trained in many games performs better than an agent specialized in a single game. In addition, SIMA integrates pre-trained models to take advantage of existing knowledge about language and visual perception, and combines this with specific training data from the 3D environments.

The team hopes that this research will contribute to the development of a new generation of general-purpose, language-driven AI agents. With more sophisticated models, projects like SIMA could one day achieve complex goals and become useful on the Internet and in the real world.