
AI startup Runway integrates Stable Diffusion for text-to-video editor

Maximilian Schreiner


Startup Runway shows off an impressive AI video editor controlled by text input. The editor uses the open-source AI model Stable Diffusion.

New York-based startup Runway offers an online editor that aims to make content creation accessible to all. To achieve this, the company relies on automation through artificial intelligence.

The AI can currently replace backgrounds with a digital green screen, adjust a video's style, or remove objects from a clip, for example. The company is quickly adding new tools for AI-supported content creation to its platform.

Video: RunwayML

Runway has raised more than $45.5 million from investors since its founding in 2018, including $35 million in December 2021 alone. The increasing maturity of AI applications for images and video may have helped in talks with investors.

Runway shows off new 'text-to-video' feature

The AI startup runs a whole range of AI models for different features in its backend, including GAN models that generate images for use as video backgrounds, for example.

Recently, Runway began integrating Stable Diffusion, a powerful open-source text-to-image model. Runway was already actively involved in the creation of Stable Diffusion: staff member Patrick Esser was a researcher at the University of Heidelberg before joining Runway and contributed to the development of VQGAN and Latent Diffusion.

In a new promotional clip, Runway demonstrates impressive text-based control of its video editor.

Using simple text commands, users can import video clips, generate images, change styles, cut out characters, or activate tools.

Text-to-video feature is actually text control

Runway markets the new feature as text-to-video, an apt description that can nonetheless be confusing: In technical jargon, AI systems like DALL-E 2, Midjourney, or Stable Diffusion are called text-to-image systems. They receive text input and generate a matching image.

However, Runway's text-to-video feature controls existing video tools via text input. It is not a generative video model that produces video from text input.
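To make the distinction concrete, here is a minimal, hypothetical sketch of what text control of an editor could look like in principle: the command selects and parameterizes an existing tool, whereas a generative text-to-video model would synthesize the frames themselves. Runway's actual implementation is not public; all names and the simple keyword matching below are illustrative assumptions.

```python
# Hypothetical sketch: text commands dispatched to existing editor tools.
# None of this reflects Runway's real code; a production system would use
# a language model to interpret the user's intent instead of keywords.

from dataclasses import dataclass, field


@dataclass
class VideoEditor:
    """Toy editor whose tools are triggered by text commands."""
    timeline: list = field(default_factory=list)

    def import_clip(self, name: str) -> None:
        self.timeline.append(name)
        print(f"Imported clip: {name}")

    def remove_background(self) -> None:
        print("Applied digital green screen to current clip")

    def change_style(self, style: str) -> None:
        print(f"Restyled current clip: {style}")

    def run_command(self, command: str) -> None:
        # Naive keyword matching stands in for intent parsing.
        text = command.lower()
        if text.startswith("import "):
            self.import_clip(command[len("import "):])
        elif "green screen" in text or "background" in text:
            self.remove_background()
        elif text.startswith("make it "):
            self.change_style(command[len("make it "):])
        else:
            print(f"Unrecognized command: {command}")


editor = VideoEditor()
editor.run_command("import beach.mp4")
editor.run_command("remove the background")
editor.run_command("make it look like a watercolor painting")
```

The point of the sketch is architectural: the text never generates video directly, it only drives tools that already exist in the editor.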

Generative video models do exist: DeepMind recently showed its Transframer model, which generates 30-second video clips from a single image. But such models are still far from the quality of today's fairly mature image models.

Runway's new feature should nevertheless greatly simplify the workflow of many creatives. It also shows that the startup's bet on simple, AI-powered video production is paying off.