
Microsoft has introduced a research project that generates and runs Quake II entirely within an AI model, producing a playable version of the game in real time.

The model, called WHAMM (World and Human Action MaskGIT Model), is part of Microsoft’s Copilot Labs and is designed to explore the capabilities and boundaries of generative AI in interactive media. It builds on an earlier version, WHAM-1.6B, which was trained on the game Bleeding Edge. That model managed only about one frame per second.

WHAMM increases performance significantly, generating over ten frames per second—enough to support real-time interactivity within the model itself. Both WHAMM and WHAM-1.6B are part of Microsoft’s “Muse” model family, which focuses on generative AI tools for game development.

Training with drastically less data

One of WHAMM’s key innovations is its ability to learn from far less data. While WHAM-1.6B was trained on seven years of gameplay, WHAMM required just one week of Quake II gameplay collected from a single level. The dataset, recorded by professional testers, offered targeted and high-quality examples that allowed the model to efficiently learn in-game behavior.

WHAMM also adopts a different technical strategy. Instead of using the autoregressive method employed by WHAM-1.6B—where image tokens are generated one at a time—WHAMM implements a MaskGIT strategy. This approach allows the model to generate all image tokens in parallel over several iterations. As a result, generation speed has increased significantly, and the output resolution has doubled, improving from 300 × 180 pixels to 640 × 360 pixels.
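The contrast between the two decoding strategies can be illustrated with a small sketch of MaskGIT-style parallel decoding. This is not Microsoft's code: `predict_fn` is a hypothetical stand-in for the transformer (returning a token id and a confidence score for every position at once), and the cosine masking schedule is the one commonly used with MaskGIT-style decoders.

```python
import numpy as np

def maskgit_decode(predict_fn, num_tokens, num_steps=8):
    """MaskGIT-style decoding sketch: start with every token masked,
    then fill positions in over a few parallel iterations, keeping the
    most confident predictions each round, instead of emitting tokens
    one at a time as an autoregressive model would."""
    tokens = np.full(num_tokens, -1)            # -1 marks a masked position
    masked = np.ones(num_tokens, dtype=bool)
    for step in range(1, num_steps + 1):
        ids, conf = predict_fn(tokens, masked)  # predictions for ALL slots at once
        # Cosine schedule: the fraction of tokens left masked shrinks to 0.
        target_masked = int(np.floor(num_tokens * np.cos(np.pi / 2 * step / num_steps)))
        num_to_unmask = masked.sum() - target_masked
        conf = np.where(masked, conf, -np.inf)  # only rank still-masked slots
        pick = np.argsort(-conf)[:num_to_unmask]  # most confident first
        tokens[pick] = ids[pick]
        masked[pick] = False
    return tokens

def toy_predict(tokens, masked):
    """Stand-in model: always 'predicts' position % 16, with fixed confidences."""
    n = len(tokens)
    return np.arange(n) % 16, np.linspace(1.0, 0.0, n)

tokens = maskgit_decode(toy_predict, num_tokens=32, num_steps=4)
assert (tokens >= 0).all()   # every slot is filled after only 4 passes
```

With 4 passes over 32 tokens instead of 32 sequential steps, the speed advantage over token-by-token generation is easy to see, which is what makes double the resolution feasible at interactive frame rates.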

Diagram: WHAMM AI system with three phases - image tokenization, world modeling and image refinement through transformer networks.
The WHAMM system works in three stages: First, it converts images into tokens using ViT-VQGAN. Then, a backbone transformer predicts what should happen based on context. Finally, a refinement transformer improves the predicted image tokens through multiple iterations. | Image: Microsoft

WHAMM's architecture consists of two main components. The first is a “backbone” transformer with roughly 500 million parameters, which generates the initial image predictions. The second is a smaller “refinement” module with 250 million parameters that iteratively improves the output. To produce each new frame, the model uses the previous nine image-action pairs as context.
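The per-frame loop described above can be sketched as follows. The function names and the number of refinement iterations are invented for illustration; the real backbone and refinement modules are transformers with roughly 500M and 250M parameters, as stated in the article.

```python
from collections import deque

CONTEXT_LEN = 9  # the model conditions on the previous nine image-action pairs

def generate_frame(backbone, refiner, context, action, refine_iters=4):
    """One frame: the backbone proposes image tokens from the context and
    the current action, then the refinement module iterates on them."""
    tokens = backbone(list(context), action)             # initial prediction
    for _ in range(refine_iters):
        tokens = refiner(list(context), action, tokens)  # iterative cleanup
    return tokens

def play(backbone, refiner, seed_context, actions):
    """Generate one frame per player action, feeding each new frame back
    into a sliding context window; pairs older than nine frames fall off."""
    context = deque(seed_context, maxlen=CONTEXT_LEN)
    frames = []
    for action in actions:
        frame = generate_frame(backbone, refiner, context, action)
        frames.append(frame)
        context.append((frame, action))   # the new frame becomes context
    return frames
```

The fixed-length `deque` is what gives the model its short memory: anything that leaves the nine-pair window is simply gone, which explains the object-persistence limits described below.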

Playable demo highlights current capabilities

The AI-generated version of Quake II—available for testing here—supports core interactions such as moving, jumping, shooting, and placing objects. The simulation also preserves changes made to the environment and allows players to explore hidden sections of the level.

AI-generated gameplay demo. | Video: Microsoft

Although WHAMM supports basic gameplay, it does not fully reproduce the original Quake II. The model generates an approximation of the environment based on a narrow training dataset, which leads to several technical limitations.

Enemy characters appear visually blurred, combat lacks realism, and health indicators are unreliable. Objects disappear from the scene if they remain off-screen for more than 0.9 seconds—the limit of the model’s context window. The playable area is restricted to a single segment of the level, and the simulation freezes once that section ends. Input latency also remains high, with noticeable delays between player input and system response.
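The 0.9-second figure follows directly from the architecture: nine image-action pairs of context at roughly ten frames per second is about 0.9 seconds of memory. A short sketch (frame rate assumed at exactly 10 fps for the estimate) shows how a fixed-length buffer produces this forgetting:

```python
from collections import deque

CONTEXT_FRAMES = 9   # context window: nine image-action pairs
FPS = 10             # "over ten frames per second"; 10 used for the estimate

memory_seconds = CONTEXT_FRAMES / FPS
assert memory_seconds == 0.9   # the ~0.9 s persistence limit in the article

# A bounded deque models the forgetting: once an observation is more
# than nine frames old, the model has no record it ever existed.
memory = deque(maxlen=CONTEXT_FRAMES)
for frame_idx in range(15):
    memory.append(frame_idx)
assert 0 not in memory                    # frame 0 has been evicted
assert list(memory) == list(range(6, 15)) # only the last nine frames remain
```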

Emerging tools for AI-driven game development

WHAMM is part of a broader set of recent initiatives exploring how generative AI can be applied to game development. Other examples include GameGen-O, which focuses on generating open-world simulations, as well as GameNGen and DIAMOND, systems from Google and DeepMind that simulate gameplay for titles such as DOOM and Counter-Strike. While these models represent significant progress, they continue to face technical constraints, including low-resolution output, limited memory, and reduced contextual awareness.

The gaming industry is particularly quick to adopt generative AI because it combines multiple disciplines (code, design, storytelling, and multimedia) within development cycles that are often constrained by tight budgets and timelines. This combination of creative complexity and resource pressure makes game production especially receptive to tools that can partially automate structured tasks.

Summary
  • Microsoft has introduced WHAMM, an AI model that generates the game Quake II in real time, allowing for interactive control while running significantly faster than previous models.
  • WHAMM employs a parallel image generation technique and targeted data selection, drastically reducing the amount of training data required.
  • Despite its advancements, WHAMM has limitations such as blurry rendering of enemies and battles, disappearing objects, spatially limited game worlds, and input lag. Microsoft considers WHAMM an experimental foundation for future AI-assisted game development.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.