SIMA 2 can understand, plan, perform, and learn from tasks in 3D worlds. The AI is designed to improve without human input and transfer what it learns to new games.

SIMA 2 is Deepmind's latest AI agent for 3D virtual environments. Unlike its predecessor, SIMA 1, which could only follow simple language instructions, SIMA 2 is built to understand tasks, reason about them, and make its own decisions. The upgrade comes from integrating Deepmind's Gemini model, an approach similar to Nvidia's Voyager, a Minecraft bot that used GPT-4 to learn from gameplay. Gemini, however, is far more capable and multimodal.

The agent navigates complex 3D worlds by analyzing on-screen visuals and simulating keyboard and mouse inputs - all without direct access to internal game data. This makes SIMA 2 an "embodied agent" that interacts with virtual environments much like a human player would.
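The "pixels in, inputs out" idea can be illustrated with a toy loop. This is only a sketch under stated assumptions, not Deepmind's implementation: `ToyGame`, `Action`, `policy`, and `run_episode` are hypothetical names, the "screen" is a single row of numbers, and a hand-written rule stands in for the Gemini-based model. The point is the interface: the agent reads only what is rendered and acts only through simulated key presses.

```python
from dataclasses import dataclass
from typing import List, Optional

Frame = List[List[int]]  # toy stand-in for screen pixels


@dataclass
class Action:
    keys: List[str]  # keys "pressed" during this step


class ToyGame:
    """Hypothetical game: exposes only rendered frames and accepts only inputs."""

    def __init__(self, goal_x: int = 3):
        self._x = 0            # internal state, never read by the agent
        self._goal_x = goal_x

    def render(self) -> Frame:
        # One-row frame: 1 marks the player, 2 marks the goal.
        row = [0] * (self._goal_x + 1)
        row[self._goal_x] = 2
        row[self._x] = 1       # the player covers the goal once it arrives
        return [row]

    def send_input(self, action: Action) -> None:
        if "d" in action.keys:
            self._x = min(self._x + 1, self._goal_x)
        if "a" in action.keys:
            self._x = max(self._x - 1, 0)


def policy(frame: Frame, instruction: str) -> Optional[Action]:
    """Hand-written stand-in for the model: decides from the frame alone."""
    row = frame[0]
    if 2 not in row:           # goal no longer visible: player is standing on it
        return None
    player, goal = row.index(1), row.index(2)
    return Action(keys=["d"] if player < goal else ["a"])


def run_episode(game: ToyGame, instruction: str, max_steps: int = 20) -> int:
    """Observe, decide, act; returns the number of inputs sent."""
    for step in range(max_steps):
        action = policy(game.render(), instruction)
        if action is None:
            return step
        game.send_input(action)
    return max_steps


steps = run_episode(ToyGame(goal_x=3), "walk to the goal")
```

Because the policy never touches `ToyGame`'s private fields, the same loop would work with any game that can be rendered and controlled, which is the property the article describes.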

According to Deepmind, the system can explain its intentions, describe intermediate steps, and respond to follow-up questions - not perfectly, but much more effectively than SIMA 1. The result is a more cooperative and natural interaction that feels less like issuing commands and more like working with a digital partner.

How SIMA 2 performs in unfamiliar games

A key goal for SIMA 2 is solving tasks in games it has never encountered before. In tests using the Minecraft-based MineDojo and the recently released game ASKA, SIMA 2 completed 45 to 75 percent of tasks, while SIMA 1 managed only 15 to 30 percent.

The system can also generalize abstract concepts - for example, taking what it learned as "harvesting" in one game and applying it as "mining" in another. This level of transfer learning is key for AI systems meant to adapt to new and unfamiliar conditions.

SIMA 2 processes multimodal inputs - such as language, images, and emojis - and can handle more complex, multi-step instructions. The improved architecture also enables longer, real-time interactions at higher resolutions than before.

Learning through experimentation, not human data

One of the biggest upgrades is SIMA 2's ability to improve itself. It can learn new tasks through trial and error without relying on human training data. The process begins with examples and feedback generated by Gemini. Once that foundation is set, SIMA 2 creates its own training data, evaluates its own performance, and uses that feedback to guide further learning - all autonomously.
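The loop described above can be sketched in a few lines. This is a toy illustration based only on the description in this article, not Deepmind's training code: `ToyAgent`, `propose_task`, `score`, and `self_improve` are hypothetical names, the task list is made up, and a single "skill" number stands in for model weights. The proposer and scorer play the role the article assigns to Gemini-generated tasks and feedback.

```python
import random
from dataclasses import dataclass
from typing import List

TASKS = ["chop tree", "light campfire", "open chest"]  # made-up examples


@dataclass
class ToyAgent:
    skill: float = 0.2  # stand-in for model weights: chance of success

    def attempt(self, task: str, rng: random.Random) -> List[str]:
        """Try a task and return a toy trajectory ending in an outcome."""
        outcome = "done" if rng.random() < self.skill else "failed"
        return [task, outcome]

    def train(self, experience: List[List[str]]) -> None:
        """Toy update: each kept trajectory nudges skill upward."""
        self.skill = min(1.0, self.skill + 0.05 * len(experience))


def propose_task(rng: random.Random) -> str:
    """Stand-in for a model proposing what to practice next."""
    return rng.choice(TASKS)


def score(trajectory: List[str]) -> float:
    """Stand-in for an automatic reward estimate of an attempt."""
    return 1.0 if trajectory[-1] == "done" else 0.0


def self_improve(agent: ToyAgent, rounds: int, attempts: int, seed: int = 0) -> List[float]:
    """Attempt tasks, keep well-scored trajectories, train on them, repeat."""
    rng = random.Random(seed)
    history = [agent.skill]
    for _ in range(rounds):
        kept = []
        for _ in range(attempts):
            trajectory = agent.attempt(propose_task(rng), rng)
            if score(trajectory) > 0.5:   # keep only successful attempts
                kept.append(trajectory)
        agent.train(kept)                 # self-generated data, no human labels
        history.append(agent.skill)
    return history


history = self_improve(ToyAgent(), rounds=5, attempts=10)
```

In this sketch, `history` records the agent's skill after each round. It can only stay flat or rise, because failed attempts are discarded rather than trained on; a real system would need a far more careful way to score and filter its own data.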

This self-improvement ability was also tested alongside Deepmind's Genie 3, a project that generates new 3D worlds from text or image prompts. Even in these newly generated, previously unseen environments, SIMA 2 was able to adapt and complete tasks.

What still limits SIMA 2

Despite the progress, Deepmind acknowledges that SIMA 2 has limits. The agent still struggles with tasks that require long-term planning or long chains of sequential steps. Its memory is also restricted - it can only process a limited amount of contextual information at once.

Simulating mouse and keyboard input remains unreliable for precise control, and visual understanding in complex 3D scenes is still a major challenge. These gaps show just how far current systems remain from truly general-purpose intelligence.

Toward robotics - but not yet practical

Deepmind views SIMA 2 as a step toward physical AI systems capable of operating in the real world. Skills like navigation, tool use, and simple collaboration are seen as core building blocks for future robot assistants. For now, though, SIMA 2 remains a research-only project without any direct commercial applications.

Access to SIMA 2 is limited to a small group of academic and game-industry partners. The goal is to better understand its technical weaknesses and potential risks before wider testing begins.

Summary
  • Deepmind's SIMA 2 is an advanced AI agent that can understand, plan, execute, and learn tasks within 3D virtual environments, improving itself through trial and error without human input and transferring skills to new, unfamiliar games.
  • In tests with games like MineDojo and ASKA, SIMA 2 completed significantly more tasks than its predecessor and demonstrated the ability to generalize concepts across different environments, processing multimodal inputs and handling complex, multi-step instructions.
  • Despite these advances, SIMA 2 still faces challenges with long-term planning, limited memory, and precise control in complex scenes, and remains a research project available only to select partners as Deepmind works to address its limitations before broader use.
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.