Skywork's new open-source model Matrix-Game 2.0 generates interactive AI videos with improved consistency and real-time controls, bringing some of the same breakthroughs recently demonstrated by Google DeepMind's Genie 3.
While Genie 3 was the first to deliver high-quality, coherent interactive videos that last for several minutes, Matrix-Game 2.0 brings similar features to the open-source community. According to Skywork, Matrix-Game 2.0 can generate videos at 25 frames per second, maintain consistent interactions over longer periods, and respond directly to keyboard and mouse input. Users can move through virtual worlds, navigate scenarios, and react to events in real time.
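To put the 25 frames per second figure in perspective: real-time generation at that rate leaves 1/25 s, or 40 milliseconds, to read the input and produce each frame. The following is a minimal sketch of that timing budget only. The names `run_loop`, `poll_input`, and `generate_frame` are hypothetical placeholders and not part of Skywork's code.

```python
import time

TARGET_FPS = 25                      # frame rate reported by Skywork
FRAME_BUDGET_S = 1.0 / TARGET_FPS    # 0.04 s = 40 ms available per frame


def run_loop(poll_input, generate_frame, num_frames, clock=time.perf_counter):
    """Run a fixed number of frames and count how many exceeded the budget.

    poll_input and generate_frame are hypothetical callables supplied by the
    caller; they stand in for reading keyboard/mouse state and for the model.
    """
    late_frames = 0
    for _ in range(num_frames):
        start = clock()
        action = poll_input()        # read the current keyboard/mouse state
        generate_frame(action)       # produce the next frame for that action
        elapsed = clock() - start
        if elapsed > FRAME_BUDGET_S:
            late_frames += 1         # this frame missed the 40 ms window
    return late_frames
```

Any frame that takes longer than the budget would show up to the player as stutter or input lag, which is why sustained frame rate matters as much as image quality for interactive video.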
The model supports multiple environments, including cityscapes, wilderness scenes, and Temple Run-style obstacle courses. For training, Skywork used roughly 1,200 hours of interactive video data from Unreal Engine and the open-world game GTA 5.
Video: He et al.
Matrix-Game 2.0 is built on an autoregressive diffusion architecture with 1.8 billion parameters. It predicts future frames based entirely on visual data and user input, using a "mouse/keyboard-to-frame" module that feeds player actions directly into each frame. This lets the model respond dynamically to movement and control inputs. Technical details and additional demos are available on the project page.
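The general idea of action-conditioned autoregressive generation can be illustrated with a toy sketch: each new frame starts as noise, is refined over a few denoising steps conditioned on recent frames and the current action, and is then appended to the context for the next frame. Everything below is an illustrative assumption, including the `Action` type, `encode_action`, the placeholder `denoise_step`, the step count, and the context length. It is not Skywork's implementation, and the "denoiser" is a stand-in arithmetic rule rather than a trained network.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical per-frame control input."""
    keys: frozenset      # pressed keys, e.g. frozenset({"W"})
    mouse_dx: float
    mouse_dy: float


def encode_action(action):
    """Hypothetical action embedding: four key flags plus mouse deltas."""
    return [float(k in action.keys) for k in "WASD"] + [action.mouse_dx, action.mouse_dy]


def denoise_step(noisy, context, action_vec, step, total_steps):
    """Placeholder for a learned denoiser.

    Blends the noisy frame toward a target derived from the most recent
    context frame and the action; a real model would use a neural network.
    """
    shift = 0.01 * sum(action_vec)
    target = [p + shift for p in context[-1]]
    alpha = (step + 1) / total_steps          # reaches 1.0 on the final step
    return [(1 - alpha) * n + alpha * t for n, t in zip(noisy, target)]


def generate_frame(context, action, rng, num_steps=4):
    """Start from Gaussian noise and refine it for a fixed number of steps."""
    noisy = [rng.gauss(0.0, 1.0) for _ in range(len(context[-1]))]
    action_vec = encode_action(action)
    for step in range(num_steps):
        noisy = denoise_step(noisy, context, action_vec, step, num_steps)
    return noisy


def rollout(first_frame, actions, context_len=8, seed=0):
    """Generate one frame per action, feeding each result back as context."""
    rng = random.Random(seed)
    context = deque([first_frame], maxlen=context_len)   # sliding window of past frames
    frames = []
    for action in actions:
        frame = generate_frame(context, action, rng)
        context.append(frame)                            # autoregressive feedback
        frames.append(frame)
    return frames
```

Calling `rollout([0.0] * 16, [Action(frozenset({"W"}), 0.0, 0.0)] * 3)` returns three toy "frames", each depending on the one before it and on the key held down. The sliding window in `context` is the part that matters for consistency: the longer the model can effectively condition on past frames, the less likely the world is to drift.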
A demo showcases both the strengths and limitations of the model: the environment remains fairly consistent, with visuals that unmistakably evoke GTA 5. Earlier models often struggled with scenes that shifted constantly, but Matrix-Game 2.0 holds the world together more reliably, though it still doesn’t quite reach the level of stability seen in Genie 3. For instance, around the 10-second mark in the demo, a lake and building suddenly pop up on the left, replacing the mountain landscape that was there before.
Many scenes in the demo are visually and structurally reminiscent of Grand Theft Auto, raising questions about the legal use of copyrighted game worlds. | Video: He et al.
Compared to the existing open-source competitor Oasis, Matrix-Game 2.0 is expected to deliver better image quality, more consistent environments, and a more accurate response to user input.
Interactive video AI with real-time physics
Skywork highlights Matrix-Game 2.0's ability to generalize across a range of environments. The model can adapt to different visual styles and worlds without scene-specific tuning. According to Skywork, characters move in a physics-aware way, responding to objects and surroundings with plausible animations.
Potential use cases include game prototyping, AI agent training, and simulating virtual environments for autonomous driving research. The model could also be useful for projects in spatial intelligence or virtual humans, Skywork says.
Matrix-Game 2.0 is available for free on Hugging Face and GitHub. Skywork describes the release as "production-ready research" that can be integrated into development workflows. For local use, the company provides a complete inference pipeline with FlashAttention support and a streaming version. Installation uses standard packages, and inference is managed through YAML-configurable scripts.