DeepMind has introduced Genie 2, an AI system that transforms single images into interactive 3D environments that users can explore for up to one minute.
The company calls Genie 2 a "Foundation World Model" and says it maintains consistency across the generated spaces. The system, trained on an extensive video dataset, produces environments in 720p resolution that users can navigate with a keyboard and mouse in both first- and third-person views.
According to DeepMind's researchers, Genie 2 simulates core physics, including gravity, collisions, and water movement. The system also manages complex lighting, reflections, and smoke effects.
DeepMind released a video showing these capabilities in action, using an undistilled base model at maximum quality. A distilled version allows real-time interaction, but runs at reduced visual quality.
Video: Google DeepMind
More consistent world generation
A key advancement in Genie 2 is its spatial memory: the system keeps track of areas that aren't currently visible to the user. When players return to previously visited locations, the environment remains consistent rather than being regenerated, addressing a common limitation of earlier 3D generators.
![Architecture diagram: DeepMind's Genie 2 generates 3D worlds from an image using encoders, decoders and AI actions.](https://the-decoder.com/wp-content/uploads/2024/12/genie_2_architecture-1.png)
DeepMind suggests game developers could use Genie 2 to quickly create test environments from concept sketches or photographs. The system can transform basic drawings into fully realized 3D spaces with working physics and lighting systems.
![Comparison images: concept art and 3D game environments with physics and lighting generated by Genie 2.](https://the-decoder.com/wp-content/uploads/2024/12/genie_2_examples-1.png)
The company also tested Genie 2 with its SIMA AI agent, which responds to natural language commands in digital environments. In one test, SIMA successfully navigated a Genie 2-generated room, following instructions such as "Open the blue door."
![Screenshots: SIMA agent navigating in Genie 2 environment using text commands.](https://the-decoder.com/wp-content/uploads/2024/12/genie_2_examples-2.png)
This combination of SIMA and Genie 2 could advance the development of AI systems that can perform complex tasks in both digital and physical spaces. SIMA learns by exploring digital training environments, and Genie 2 could supply a practically unlimited variety of them.
"When we started Genie 1 over two years ago, we always imagined a foundation world model will one day be able to generate an endless curriculum for training embodied AGI. Today, we made a big step towards that future," writes DeepMind researcher Tim Rocktäschel on Bluesky.