StableVideo brings video editing capabilities to Stable Diffusion, such as style transfer or background replacement.

Generating realistic and temporally coherent videos from text prompts remains a challenge for AI systems, with even state-of-the-art systems such as those from RunwayML still showing significant inconsistencies.

While there is still much work to be done on this frontier, some research, such as StableVideo, is exploring how generative AI can build on existing videos. Instead of generating videos from scratch, StableVideo uses diffusion models such as Stable Diffusion to edit videos frame by frame.

According to the researchers, consistency across frames is ensured by a technique that transfers information between keyframes. This allows objects and backgrounds to be modified semantically while maintaining continuity, similar to VideoControlNet.

StableVideo introduces inter-frame propagation for better consistency

It does this by introducing "inter-frame propagation" to diffusion models. This propagates object appearances between keyframes, allowing for consistent generation across the video sequence.

Specifically, StableVideo first selects keyframes and edits them with a standard diffusion model such as Stable Diffusion, guided by a text prompt. The model takes the visual structure of each frame into account to preserve shapes. It then transfers information from one edited keyframe to the next using the regions they share in the video, which guides the model to generate subsequent frames consistently.
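To make the idea more concrete, here is a minimal, hypothetical Python sketch of such a keyframe-editing-and-propagation loop. The function names, the simple blending that stands in for the diffusion edit, and the overlap masks are assumptions made for illustration, not the authors' actual implementation.

```python
# Illustrative sketch of keyframe editing with inter-frame propagation.
# The diffusion edit and the overlap-based transfer are replaced with simple
# numpy placeholders; names and shapes are assumptions, not StableVideo's API.
import numpy as np

def edit_keyframe(frame, prompt, init=None):
    """Stand-in for a structure-aware diffusion edit (e.g. depth-guided
    Stable Diffusion). If `init` is given, the edit starts from the
    appearance propagated from the previous keyframe instead of from scratch."""
    base = frame if init is None else 0.5 * frame + 0.5 * init
    return np.clip(base + 0.1, 0.0, 1.0)  # placeholder "edit"

def propagate(edited_prev, overlap_mask):
    """Carry the edited appearance forward where two neighboring keyframes
    show the same content (their overlap in the video)."""
    return edited_prev * overlap_mask

def edit_keyframes(keyframes, overlap_masks, prompt):
    edited = [edit_keyframe(keyframes[0], prompt)]
    for frame, mask in zip(keyframes[1:], overlap_masks):
        init = propagate(edited[-1], mask)          # appearance from the previous keyframe
        edited.append(edit_keyframe(frame, prompt, init=init))
    return edited

if __name__ == "__main__":
    frames = [np.random.rand(64, 64, 3) for _ in range(4)]
    masks = [np.full((64, 64, 3), 0.8) for _ in range(3)]   # fake overlap masks
    out = edit_keyframes(frames, masks, "a watercolor painting")
    print(len(out), out[0].shape)  # 4 (64, 64, 3)
```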

Image: Chai et al.

Finally, an aggregation step combines the edited keyframes to create edited foreground and background video layers. Compositing these layers produces the final, coherent result.
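The compositing itself can be thought of as standard alpha blending of the two edited layers. The short sketch below, with made-up array shapes and an assumed per-frame alpha matte, only illustrates that final step; it is not the paper's code.

```python
# Illustrative alpha compositing of edited foreground and background layers.
# The alpha matte and array shapes are assumptions made for this sketch.
import numpy as np

def composite(foreground, background, alpha):
    """Per-pixel blend of the edited foreground over the edited background."""
    return alpha * foreground + (1.0 - alpha) * background

fg = np.random.rand(4, 64, 64, 3)   # edited foreground layer (4 frames)
bg = np.random.rand(4, 64, 64, 3)   # edited background layer (4 frames)
a  = np.random.rand(4, 64, 64, 1)   # per-frame alpha matte (foreground opacity)
video = composite(fg, bg, a)
print(video.shape)  # (4, 64, 64, 3)
```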

Video: Chai et al.

In experiments, the team demonstrates StableVideo's ability to perform various text-based edits, such as changing object attributes or applying artistic styles, while maintaining strong visual continuity throughout.

However, there are limitations: performance still depends on the capabilities of the underlying diffusion model, according to the researchers. Consistency also breaks down for objects that deform in complex ways, which will require further research.

More information and the code are available on the StableVideo GitHub.

Summary
  • StableVideo is a research project exploring generative AI to edit existing videos frame by frame rather than generating videos from scratch, potentially increasing their realism and consistency.
  • The system uses diffusion models such as Stable Diffusion to process keyframes based on text prompts, and then transfers information to other keyframes using an inter-frame propagation technique, maintaining visual continuity.
  • Although the project demonstrates promising results, the researchers acknowledge its limitations, such as its dependence on the capabilities of the underlying diffusion model and inconsistencies when handling complex deforming objects.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.