Stable Video 3D transforms a single image into near-perfect 3D views

With Stability AI's latest model, a single frame is all you need to generate near-perfect views of an object from multiple angles.

Stability AI has introduced Stable Video 3D (SV3D), a new generative model based on stable video diffusion. According to the company's blog post, SV3D is designed to significantly improve the quality, consistency, and controllability of generating 3D content from single frames.

SV3D comes in two flavors: SV3D_u generates orbital videos based on single-frame input with no specified camera control. The extended version SV3D_p supports both single frames and 3D objects as input, allowing the generation of videos along specified camera paths. However, the resolution of the resulting videos is still relatively low at 576 x 576 pixels at 21 frames per second.

According to Stability AI, the use of video diffusion models, as opposed to image diffusion models such as those used in Stability AI's Zero123 released in December, offers great advantages in terms of generalization and view consistency of the generated output.

This is demonstrated in the following example, where SV3D generates detailed 3D views from a single photo compared to previous methods.

SV3D's processing pipeline is complex and includes the generation of Neural Radiance Fields (NeRF), which have led to major advances in 3D object generation in recent years. In addition, there is an illumination model that ensures the correct incidence of light depending on the viewing angle.

SV3D is now available for commercial use through the recently launched Stability AI membership. For non-commercial use, the model weights are available for download on Hugging Face.

SV3D appears to be a leap forward in creating consistent 3D views from single frames, which could benefit media creators in the fields of animation, game design, and VR.

Recommendation

AI research

Scaling laws for precision: AI researcher sees "perfect storm" for the end of scale

The London-based AI startup has released several high-level visual models in recent months, including Stable Diffusion 3 for text-to-image, Stable 3D for text-to-3D, and Stable Video Diffusion for text-to-video.

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Stable Video 3D transforms a single image into near-perfect 3D views

Scaling laws for precision: AI researcher sees "perfect storm" for the end of scale

Stability AI releases a compact open text-to-audio model that runs on mobile devices

Stable Virtual Camera generates 3D videos from single images

Stability AI and Arm bring offline, on-device generative audio to mobile devices

OpenAI launches new ChatGPT agent that automates complex tasks for Pro, Plus, and Team

Kimi-K2 is the next open-weight AI milestone from China after Deepseek

New Energy-Based Transformer architecture aims to bring better "System 2 thinking" to AI models

Stable Video 3D transforms a single image into near-perfect 3D views

Share

Bank details