Meta has introduced Movie Gen, a new AI model that generates videos, images and audio from text input. It can also edit existing videos.

At the core of Movie Gen is a 30-billion-parameter transformer model for video and image generation. It produces videos up to 16 seconds long at 16 frames per second, with support for different aspect ratios (1:1, 9:16, 16:9) at 768 × 768 pixel resolution. An additional upscaler can increase the resolution to Full HD (1080p).

Movie Gen generates photorealistic videos with audio from text only. | Video: Meta AI

A separate 13-billion-parameter model handles audio generation. It can create sound, background music, and sound effects to match videos up to 45 seconds long at a 48 kHz sampling rate.

Example audio generation | Video: Meta AI
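
To put these limits in perspective, the quoted clip lengths translate into frame and sample counts with simple arithmetic. The sketch below uses only the figures stated in this article; the function and constant names are illustrative, the sample count is per audio channel, and nothing here describes Meta's actual pipeline.

```python
# Back-of-the-envelope figures derived from the specs quoted in the article.
# Names are illustrative assumptions, not part of any Meta API.

VIDEO_SECONDS = 16             # maximum video length
VIDEO_FPS = 16                 # frames per second
AUDIO_SECONDS = 45             # maximum video length the audio model covers
AUDIO_SAMPLE_RATE_HZ = 48_000  # 48 kHz sampling rate


def frame_count(seconds: int, fps: int) -> int:
    """Number of frames in a clip of the given length."""
    return seconds * fps


def sample_count(seconds: int, sample_rate_hz: int) -> int:
    """Number of audio samples per channel for the given duration."""
    return seconds * sample_rate_hz


max_frames = frame_count(VIDEO_SECONDS, VIDEO_FPS)
max_audio_samples = sample_count(AUDIO_SECONDS, AUDIO_SAMPLE_RATE_HZ)

print(f"Frames in the longest video: {max_frames}")                # 256
print(f"Audio samples per channel at 45 s: {max_audio_samples}")   # 2160000
```

A full-length clip therefore contains 256 frames, while a 45-second soundtrack contains about 2.16 million samples per channel.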

Movie Gen also includes video editing capabilities that can modify existing videos using text instructions. Another feature allows users to create personalized videos by combining a photo of a person with a text description.

Example video editing | Video: Meta AI

Meta claims performance edge

Meta says Movie Gen outperforms comparable models, including Runway, OpenAI's Sora, LumaLabs, Kling and Pika, in human ratings. The gap appears smallest with Sora and Kling. However, Sora can reportedly produce consistent videos up to one minute long, and at a higher frame rate than Movie Gen.

Table comparing Movie Gen Video with LumaLabs, OpenAI Sora and Kling1.5 in various video generation categories. Movie Gen leads in several areas.
Meta's Movie Gen outperforms competing AI video generators, especially in terms of realism and aesthetics. It is even slightly ahead of OpenAI's Sora examples shown so far. | Image: Meta AI

The company trained the models using licensed and publicly available datasets. The video generation model was pre-trained on about 100 million videos and one billion images. The audio model used approximately one million hours of audio data. More details can be found in the paper.

Movie Gen is currently intended for research purposes only and is not publicly available. Meta plans to work with filmmakers and creatives to incorporate feedback before a potential release.

The third generation of Meta's AI media models

Meta describes Movie Gen as the third generation of its AI media models, combining all previous modalities and allowing for more precise control. The company believes that the models could enable various new products.

However, Meta admits that the current models still have limitations. In particular, the inference time and the quality of the models could be improved by further scaling. Challenges remain with complex geometry, object manipulation, physics, and audio synchronization for dense or occluded motion.

Meta stresses that the technology is not meant to replace artists and animators, but to create new forms of expression. The company mentions animated "day in the life" videos for Instagram Reels or personalized birthday greetings for WhatsApp as possible applications.

