
OpenAI is showing off Sora, its first generative AI model for video, and from the looks of it, it could be a GPT-4 moment for video generation.

OpenAI announced Sora, the company's first text-to-video model, in a blog post and on X, formerly Twitter. Sora shows an impressive range of capabilities: it can create videos up to a minute long with unprecedented visual fidelity and, most importantly, temporal stability, while, according to OpenAI, also following user instructions. Examples such as a dog climbing between window sills show how stable the model's output is.

Video: OpenAI

The AI model is currently available to a select group of red teamers for harm and risk assessment, as well as to visual artists, designers, and filmmakers who will provide feedback on making it more useful for creative professionals.


OpenAI sees Sora as a foundation model on the path to AGI

According to OpenAI, Sora's current limitations include accurately simulating complex physics and capturing specific cause-and-effect scenarios. For example, a character may bite into a cookie, but the visual aftermath, the bite mark, may be missing. Sora can also confuse spatial details, such as left and right, and struggle with precise descriptions of events unfolding over time, such as following a specific camera trajectory.

In terms of safety, OpenAI is implementing several measures before integrating Sora into its products. These include working with red teamers and developing tools such as a detection classifier that identifies when a video was generated by Sora. OpenAI also plans to include C2PA metadata in the future, assuming the model is deployed in an OpenAI product. Building on the safety methods established for DALL-E 3, OpenAI intends to use text classifiers to check prompts against its content policies and image classifiers to review video frames for compliance with its usage policies.

Video: OpenAI

Sora is a diffusion model: it generates a video by starting from what looks like static noise and progressively removing that noise until a clear clip emerges. By representing videos as collections of data patches, similar to GPT's tokens, the model can work with a wider range of visual data than previously possible, the company says. Leveraging the recaptioning technique from DALL-E 3, Sora can more faithfully follow text instructions in the generated videos. Temporal stability is made possible by "allowing the model to look ahead many frames at a time."
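To make the patch-and-diffusion idea more concrete, here is a minimal, purely illustrative Python sketch. It is not OpenAI's implementation: the patch size, the toy denoiser, and the sampling loop are hypothetical stand-ins that only show how a video can be cut into spacetime patches and how a diffusion-style sampler iteratively turns noise into a clean patch sequence.

```python
import numpy as np

def video_to_patches(video, patch=(4, 16, 16)):
    """Split a (T, H, W, C) video into flattened spacetime patches (hypothetical sizes)."""
    T, H, W, C = video.shape
    pt, ph, pw = patch
    return (
        video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
             .transpose(0, 2, 4, 1, 3, 5, 6)
             .reshape(-1, pt * ph * pw * C)
    )

def toy_denoiser(noisy_patches, t):
    """Stand-in for the learned model: here it simply damps the noise a little each step."""
    return 0.8 * noisy_patches

def generate(num_patches, patch_dim, steps=50, seed=0):
    """Diffusion-style sampling: start from pure noise and denoise step by step."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_patches, patch_dim))
    for t in reversed(range(steps)):
        x = toy_denoiser(x, t)                        # predict a cleaner version
        if t > 0:                                     # re-inject a little noise except on the last step
            x = x + 0.05 * rng.standard_normal(x.shape)
    return x

if __name__ == "__main__":
    # A dummy 16-frame, 64x64 RGB video becomes 64 spacetime patches of dimension 3072.
    video = np.zeros((16, 64, 64, 3), dtype=np.float32)
    print(video_to_patches(video).shape)   # (64, 3072)

    # "Generating" a clip means sampling a patch sequence from noise;
    # a real model would then decode these patches back into video frames.
    print(generate(num_patches=64, patch_dim=3072).shape)   # (64, 3072)
```

In the real system, the denoiser would be a large trained network conditioned on the text prompt, and the sampled patches would be decoded back into frames; the sketch above only mirrors the overall flow OpenAI describes.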

OpenAI sees Sora as a foundational model "that can understand and simulate the real world", a critical step toward achieving Artificial General Intelligence (AGI).


More examples are available on the Sora website.

Summary
  • OpenAI has introduced Sora, its first text-to-video generative AI model, capable of creating videos up to a minute long with impressive visual fidelity and temporal stability.
  • The model is currently being tested by a select group of red teamers for risk assessment and by visual artists, designers, and filmmakers for creative feedback.
  • Sora's limitations include challenges in simulating complex physics and capturing specific cause-and-effect scenarios, and OpenAI is working on safety measures such as detection classifiers and metadata integration for future product implementation.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.