
Stability AI has introduced Stable Video 4D, a new diffusion model that generates dynamic 3D content from a single video input. The technology combines two of the company's previous models: Stable Video Diffusion and Stable Video 3D.


Stable Video 4D builds on Stability AI's Stable Video 3D, released in March, which generated novel views of an object from a single image and reconstructed static 3D objects from them. Stable Video 4D advances this technology by producing moving 3D video content (also known as 4D) from a single 2D video input.

Video: Stability AI

The application aims to be user-friendly: Users can input a video, specify desired 3D camera positions, and Stable Video 4D quickly produces eight new videos that follow the specified camera views, providing a comprehensive view of the subject from multiple angles.
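The article doesn't say exactly how those camera positions are specified. As a purely illustrative sketch (not Stability AI's actual interface), the eight views could be parameterized as evenly spaced azimuth angles on an orbit around the subject:

```python
def orbit_azimuths(num_views: int = 8) -> list[float]:
    """Evenly spaced azimuth angles (in degrees) for a full
    360-degree orbit around the subject -- one angle per output video."""
    return [i * 360.0 / num_views for i in range(num_views)]

# Eight views, 45 degrees apart:
print(orbit_azimuths())  # [0.0, 45.0, 90.0, 135.0, 180.0, 225.0, 270.0, 315.0]
```

An even orbit like this is only an assumption for illustration; the model may accept arbitrary camera trajectories.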


According to Stability AI, Stable Video 4D takes about 40 seconds to generate videos of 5 frames each across 8 views at 576 x 576 pixel resolution, with an additional 20 to 25 minutes for 4D optimization. While still time-consuming, this is significantly faster than previous methods that took hours.
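Put together, those figures describe a grid of 40 frames that the initial 40-second pass must produce before the lengthy optimization step. The arithmetic, using only the numbers from the text:

```python
frames_per_view = 5
num_views = 8
resolution = (576, 576)

# The fast diffusion pass fills a view-by-time grid of images.
total_frames = frames_per_view * num_views
pixels_per_frame = resolution[0] * resolution[1]

print(total_frames)      # 40 frames in the grid
print(pixels_per_frame)  # 331776 pixels per frame
```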

Video: Stability AI

One of the key features of Stable Video 4D is its ability to generate multiple new videos simultaneously, ensuring a consistent object appearance across multiple views and timestamps. The researchers achieve this by combining a video and a multi-view diffusion model, in this case Stable Video Diffusion and Stable Video 3D. They say that this approach should work with any attention-based diffusion model.
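The article gives no implementation details, but the idea of combining a video model (temporal consistency) with a multi-view model (cross-view consistency) can be sketched as two attention passes over a views-by-frames grid of latent tokens. The shapes and function names below are illustrative assumptions, not SV4D's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Plain single-head self-attention over a set of tokens (N, D)."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    return softmax(scores) @ tokens

def factorized_view_time_attention(grid):
    """grid: (V, T, D) latent tokens for V camera views and T frames.
    View attention mixes information across views at each timestep
    (the multi-view model's role); frame attention mixes information
    across time within each view (the video model's role)."""
    V, T, D = grid.shape
    out = np.stack([self_attention(grid[:, t]) for t in range(T)], axis=1)
    out = np.stack([self_attention(out[v]) for v in range(V)], axis=0)
    return out

# 8 views x 5 frames, matching the output grid described in the article.
x = np.random.default_rng(0).normal(size=(8, 5, 16))
y = factorized_view_time_attention(x)
print(y.shape)  # (8, 5, 16)
```

Sharing one attention mechanism across both axes is what makes the authors' claim plausible that the approach transfers to any attention-based diffusion model.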

Image: Stability AI

The researchers believe that the lack of a comprehensive training dataset has hindered the development of a powerful 4D model. To address this, they compiled ObjaverseDy, which they derived from the existing Objaverse dataset and filtered for suitable content. The SV4D model weights were initialized with the pre-trained SVD and SV3D weights, leveraging the prior knowledge learned from large video and 3D datasets.

In benchmarks with multiple datasets, SV4D outperformed existing methods in both novel view video synthesis and 4D optimization. The generated results showed superior visual quality, uniformity, and consistency across different perspectives compared to the previous state of the art. The differences between the methods in the following demo are particularly noticeable in examples such as the hiker's backpack and the cyclist.


Video: Stability AI

The company is working on refining the model so that it can process a wider range of real videos in addition to the synthetic datasets it is currently being trained on. Stability AI sees potential applications for Stable Video 4D in game development, video editing, and virtual reality.

Stable Video 4D is now available on Hugging Face and, given its slight quality advantage over alternative methods, likely represents the new state of the art in this area. However, its handling and resolution are still a long way from everyday use by game companies or film productions.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Stability AI has introduced Stable Video 4D, a new AI model that can generate dynamic 3D content from multiple perspectives from a single video. It combines two of the company's previous technologies: Stable Video 3D and Stable Video Diffusion.
  • Users can input a video, specify the desired camera views, and get eight videos from different angles in about 40 seconds. The entire optimization process takes another 20 to 25 minutes. This is much faster than previous methods.
  • In benchmarks, Stable Video 4D demonstrated higher quality, uniformity, and consistency across different perspectives than the current state of the art. Stability AI sees potential applications in game development, video editing, and virtual reality. The model is now available on Hugging Face.
Jonathan works as a technology journalist who focuses primarily on how easily AI can already be used today and how it can support daily life.