On Tuesday, Stability AI released Stable Video Diffusion. The first implementation for private users is now available.
The makers of the Stable Diffusion tool "ComfyUI" have added support for Stability AI's Stable Video Diffusion models in a new update. ComfyUI is a graphical user interface for Stable Diffusion whose graph/node interface lets users build complex workflows. It is an alternative to other interfaces such as AUTOMATIC1111.
According to the developers, the update can be used to create videos at 1024 x 576 resolution with a length of 25 frames on the 7-year-old Nvidia GTX 1080 with 8 gigabytes of VRAM. AMD users can also use the generative video AI with ComfyUI on an AMD 6800 XT running ROCm on Linux. It takes about 3 minutes to create a video.
The developers have published two sample workflows for Stable Video Diffusion in ComfyUI - one for the 14-frame model and one for the 25-frame model - on their blog.
Stability AI plans further improvements for Stable Video Diffusion
Earlier this week, Stability AI released the research preview of Stable Video Diffusion, a generative video model that, according to the company, outperforms commercial competitors RunwayML and Pika Labs in user preference studies.
The model is based on the Stable Diffusion image model and has been released in two image-to-video variants, which generate 14 or 25 frames at adjustable frame rates between 3 and 30 frames per second.
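To put those numbers in perspective, clip length is simply the frame count divided by the frame rate. The following is a minimal sketch of that arithmetic in Python; the function and constant names are hypothetical and are not part of any Stability AI or ComfyUI API.

```python
# Illustrative arithmetic only. All names here are hypothetical and
# do not come from Stability AI's or ComfyUI's code.
FRAME_COUNTS = (14, 25)   # frames produced by the two released models
FPS_RANGE = (3, 30)       # adjustable frame rate range from the article


def clip_duration_seconds(frames: int, fps: int) -> float:
    """Return the playback length of a clip in seconds."""
    low, high = FPS_RANGE
    if not low <= fps <= high:
        raise ValueError(f"fps must be between {low} and {high}, got {fps}")
    return frames / fps


if __name__ == "__main__":
    for frames in FRAME_COUNTS:
        shortest = clip_duration_seconds(frames, FPS_RANGE[1])
        longest = clip_duration_seconds(frames, FPS_RANGE[0])
        print(f"{frames} frames: {shortest:.2f}s to {longest:.2f}s")
```

In other words, even the 25-frame model yields a clip of a little over eight seconds at the slowest frame rate and less than one second at the fastest.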
The model is initially available as a research version on GitHub, and Stability AI plans to develop an ecosystem of models based on it. Like Stable Diffusion, the final model will be freely available, and a web version with text-to-video functionality is also planned.