
Stability AI has unveiled "Stable Virtual Camera," a new AI system that transforms regular photos into 3D videos without requiring complex 3D reconstructions or scene optimizations.


The system can create 360-degree videos lasting up to 30 seconds from as little as one photo or as many as 32 input images. It supports 14 different camera movements, including 360-degree rotations, spirals, zoom effects, and more complex patterns like lemniscates (figure-eight paths). When the generated camera positions form a continuous trajectory, Stability AI says the resulting views are three-dimensional, temporally consistent, and - as the name suggests - "stable".

Handling multiple formats

The system works with various image formats including square (1:1), portrait (9:16), and landscape (16:9). This capability came as a surprise to the researchers since the model was only trained on 576x576 pixel square images. The team believes the model somehow learned to handle different image sizes on its own.

Stable Virtual Camera relies on a diffusion model with 1.3 billion parameters, building on the Stable Diffusion 2.1 architecture. To improve spatial understanding, the researchers converted the model's 2D self-attention into 3D self-attention, so that tokens from one view can attend to tokens from all other views.
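The idea behind extending 2D self-attention across views can be illustrated with a minimal NumPy sketch. This is not Stability AI's implementation; it only shows the core reshape trick, under the assumption that "3D" attention means flattening all views into one joint token sequence before applying standard scaled dot-product attention:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention over the token dimension.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

# Shapes: (views, tokens, dim)
views, tokens, dim = 4, 16, 8
x = np.random.randn(views, tokens, dim)

# 2D self-attention: each view attends only to its own tokens.
out_2d = attention(x, x, x)

# "3D" self-attention (illustrative): flatten all views into one
# sequence, so every token can attend across all views at once.
x_flat = x.reshape(1, views * tokens, dim)
out_3d = attention(x_flat, x_flat, x_flat).reshape(views, tokens, dim)
```

Both outputs have the same shape, but in the second case each spatial location is informed by every input view, which is what gives a multi-view diffusion model its cross-view consistency.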


The system processes input images in two passes: First, it generates what the developers call "anchor images" from the input. Second, it creates the desired perspectives between these anchor points. According to the developers, this two-stage procedure helps ensure consistent and stable output.
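The two-pass procedure can be sketched in a few lines of Python. Everything here is hypothetical scaffolding: `generate` stands in for one diffusion sampling call conditioned on reference images and target camera poses, and is not the actual Stable Virtual Camera API.

```python
def two_pass_sampling(input_images, target_cameras, generate, num_anchors=4):
    """Illustrative two-pass sampling loop (not the official API).

    generate(conditioning, cameras) -> list of frames, one per camera.
    """
    # Pass 1: generate sparse anchor views along the camera trajectory,
    # conditioned only on the user's input images.
    step = max(1, len(target_cameras) // num_anchors)
    anchor_cams = target_cameras[::step]
    anchors = generate(conditioning=input_images, cameras=anchor_cams)

    # Pass 2: fill in all intermediate views, conditioning on both the
    # inputs and the anchors so neighboring frames stay consistent.
    frames = generate(conditioning=input_images + anchors,
                      cameras=target_cameras)
    return frames

# Usage with a dummy generator that just labels each requested view:
fake_generate = lambda conditioning, cameras: [f"view@{c}" for c in cameras]
frames = two_pass_sampling(["img0"], list(range(12)), fake_generate)
```

The design rationale stated by the developers maps directly onto the two calls: anchors pin down the global scene layout first, so the denser second pass only has to interpolate between already-consistent keyframes.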

Figure: Architecture of the SEVA diffusion model, showing the training and sampling phases and components such as the VAE, transcoder, and attention modules. The two-stage sampling procedure increases consistency over time and space. | Image: Stability AI

Benchmarks show Stable Virtual Camera performing better than existing solutions like ViewCrafter and CAT3D, particularly in handling large perspective shifts and creating fluid transitions.

The system still struggles to accurately render people, animals, and dynamic elements such as water surfaces. Visual artifacts can appear during complex camera movements or when processing ambiguous scenes, especially when the target perspective is significantly different from the original image.

Availability

The system is now available to researchers under a non-commercial license, with model weights freely available on Hugging Face and source code on GitHub. A public demo is also accessible through Hugging Face.

Since its early success with image generators, Stability AI has faced increasing competition from both open-source projects and commercial rivals, with Flux notably becoming a prominent alternative for open-source image generation.


The company has recently reorganized to focus on two key areas: pushing forward research in 3D processing and novel view synthesis, while also developing optimized models for low-power devices like smartphones.

Summary
  • Stability AI presents an AI system called "Stable Virtual Camera" that creates 3D videos from as little as a single 2D image, without requiring complex 3D reconstruction or per-scene optimization.
  • The technology uses a two-step process that first generates selected key frames and then creates perspectives between them, ensuring visual stability and smooth transitions even with more complex camera movements such as zooms, spirals, or 360-degree movements.
  • Although the system already outperforms existing solutions in benchmarks, it still struggles with challenging subjects such as people, animals, and moving textures. The model is available to researchers for free, non-commercial use.
Jonathan writes for THE DECODER about how AI tools can make our work and creative lives better.