
Stability AI has unveiled "Stable Virtual Camera," a new AI system that transforms regular photos into 3D videos without requiring complex 3D reconstructions or scene optimizations.


The system can create 360-degree videos lasting up to 30 seconds from as little as one photo or as many as 32 input images. It supports 14 different camera movements, including 360-degree rotations, spirals, zoom effects, and more complex patterns like lemniscates (figure-eight paths). When the generated camera positions form a continuous trajectory, Stability AI says the resulting views are three-dimensional, temporally consistent, and - as the name suggests - "stable".

Handling multiple formats

The system works with various image formats including square (1:1), portrait (9:16), and landscape (16:9). This capability came as a surprise to the researchers since the model was only trained on 576x576 pixel square images. The team believes the model somehow learned to handle different image sizes on its own.

Stable Virtual Camera relies on a diffusion model with 1.3 billion parameters, building on the Stable Diffusion 2.1 architecture. To improve spatial understanding, the researchers converted the model's 2D self-attention into 3D self-attention, so that tokens from one view can attend to tokens from all other views.
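The idea behind extending 2D self-attention across views can be illustrated with a minimal NumPy sketch. This is not Stability AI's implementation; it only shows the core reshape trick, under the assumption that "3D" attention means flattening all views into one joint token sequence before applying standard scaled dot-product attention:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention over the token dimension.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

# Shapes: (views, tokens, dim)
views, tokens, dim = 4, 16, 8
x = np.random.randn(views, tokens, dim)

# 2D self-attention: each view attends only to its own tokens.
out_2d = attention(x, x, x)

# "3D" self-attention (illustrative): flatten all views into one
# sequence, so every token can attend across all views at once.
x_flat = x.reshape(1, views * tokens, dim)
out_3d = attention(x_flat, x_flat, x_flat).reshape(views, tokens, dim)
```

Both outputs have the same shape, but in the second case each spatial location is informed by every input view, which is what gives a multi-view diffusion model its cross-view consistency.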


The system processes input images in two passes: First, it generates what the developers call "anchor images" from the input. Second, it creates the desired perspectives between these anchor points. According to the developers, this two-stage procedure helps ensure consistent and stable output.
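The two-pass procedure can be sketched in a few lines of Python. Everything here is hypothetical scaffolding: `generate` stands in for one diffusion sampling call conditioned on reference images and target camera poses, and is not the actual Stable Virtual Camera API.

```python
def two_pass_sampling(input_images, target_cameras, generate, num_anchors=4):
    """Illustrative two-pass sampling loop (not the official API).

    generate(conditioning, cameras) -> list of frames, one per camera.
    """
    # Pass 1: generate sparse anchor views along the camera trajectory,
    # conditioned only on the user's input images.
    step = max(1, len(target_cameras) // num_anchors)
    anchor_cams = target_cameras[::step]
    anchors = generate(conditioning=input_images, cameras=anchor_cams)

    # Pass 2: fill in all intermediate views, conditioning on both the
    # inputs and the anchors so neighboring frames stay consistent.
    frames = generate(conditioning=input_images + anchors,
                      cameras=target_cameras)
    return frames

# Usage with a dummy generator that just labels each requested view:
fake_generate = lambda conditioning, cameras: [f"view@{c}" for c in cameras]
frames = two_pass_sampling(["img0"], list(range(12)), fake_generate)
```

The design rationale stated by the developers maps directly onto the two calls: anchors pin down the global scene layout first, so the denser second pass only has to interpolate between already-consistent keyframes.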

Figure: Architecture of the SEVA diffusion model, showing the training and sampling phases and components such as the VAE, transcoder, and attention modules. The two-stage sampling procedure increases consistency over time and space. | Image: Stability AI

Benchmarks show Stable Virtual Camera performing better than existing solutions like ViewCrafter and CAT3D, particularly in handling large perspective shifts and creating fluid transitions.

The system still struggles to accurately render people, animals, and dynamic elements such as water surfaces. Visual artifacts can appear during complex camera movements or when processing ambiguous scenes, especially when the target perspective is significantly different from the original image.

Availability

The system is now available to researchers under a non-commercial license, with model weights freely available on Hugging Face and source code on GitHub. A public demo is also accessible through Hugging Face.

Since its early success with image generators, Stability AI has faced increasing competition from both open-source projects and commercial rivals, with Flux notably becoming a prominent alternative for open-source image generation.


The company has recently reorganized to focus on two key areas: pushing forward research in 3D processing and novel view synthesis, while also developing optimized models for low-power devices like smartphones.

Summary
  • Stability AI presents an AI system called "Stable Virtual Camera" that creates 3D videos from as little as a single 2D image, without requiring complex 3D reconstruction or per-scene optimization.
  • The technology uses a two-step process that first generates selected key frames and then creates perspectives between them, ensuring visual stability and smooth transitions even with more complex camera movements such as zooms, spirals, or 360-degree movements.
  • Although the system already outperforms existing solutions in benchmarks, it still struggles with challenging subjects such as people, animals, and moving textures. The model is available to researchers for free, non-commercial use.
Jonathan writes for THE DECODER about how AI tools can make our work and creative lives better.