OpenAI's Sora is the "GPT-1 of video" with plans to scale and unlock emergent AI capabilities

OpenAI's Sora AI model is capable of generating minute-long videos of impressive quality. In a presentation, the developers compare it to GPT-1, the precursor to modern language models.

OpenAI sees Sora as the foundation for better understanding and simulating the real world - a crucial step on the path to AGI. In a presentation at the AGI House, Sora developers Tim Brooks and Bill Peebles called the model "the GPT-1 of video" - a reference to GPT, the first modern Transformer language model, released in 2018. A video of the presentation was uploaded by YouTuber Wes Roth.

Like GPT-1, Sora is basic research, but with the potential to enable groundbreaking new applications. In the case of GPT, its successors have shown what's possible, from chatbots to code assistants to text summarization. OpenAI now expects something similar from Sora for video generation and analysis: "We think this technology will get a lot better very soon."

OpenAI expects to see emergent capabilities at scale

OpenAI sees Sora as a demonstration that generative AI models for video are scalable, and that emergent capabilities arise from further scaling. In the sample videos, Sora already demonstrates a basic understanding of physical interaction and the 3D geometry of real-world environments. People and animals move almost naturally through the generated worlds, objects are preserved despite camera pans, and surfaces cast realistic reflections.

The Sora team identifies simulation of complex physical processes, causality, and improved spatio-temporal logic as key areas for further progress. The developers believe that these capabilities can be achieved with larger models, much as generative language models have developed natural-looking coherence only through scaling.

In the long term, OpenAI hopes that multimodal modeling of all environments with Sora and similar models will lead to a better understanding of how people, animals, and objects interact in our world. This would be a critical step toward artificial general intelligence that can fully simulate and understand the real world. According to the team, enough data already exists to achieve AGI, along with methods to make better use of it.

Meta's AI boss does not believe that Sora will succeed

Meta's chief of AI, Yann LeCun, on the other hand, does not believe that generating pixels, as Sora does, is a suitable way to predict the world. He describes this approach as wasteful and doomed to failure. LeCun argues that generative models for sensory input will fail because it is too difficult to deal with the predictive uncertainty of high-dimensional continuous sensory input. He believes that generative AI works well for text because text is discrete and has a finite number of symbols, making it easier to deal with uncertainty.

At almost the same time as Sora, LeCun presented his own AI model called Video Joint Embedding Predictive Architecture (V-JEPA), which predicts and interprets complex interactions without relying on generative methods. V-JEPA focuses on prediction in a broader conceptual space and enables adaptation to different tasks by adding a small, task-specific layer rather than retraining the entire model.
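The idea of adapting a pretrained model by adding a small task-specific layer, rather than retraining everything, can be illustrated with a toy sketch. This is not V-JEPA's code: the `encode` function, the `FROZEN_WEIGHTS` values, the four data points, and the training settings are all hypothetical stand-ins chosen for illustration. The only point is that the "backbone" stays fixed while a small linear head is trained on its output features.

```python
import math

# Hypothetical stand-in for a pretrained encoder: a fixed projection with a
# ReLU. In a real system this would be a large pretrained network.
FROZEN_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

def encode(x):
    """Map an input to features using the frozen weights (never updated)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(head, bias, feats, labels):
    """Average binary cross-entropy of the task head on the given features."""
    total = 0.0
    for f, y in zip(feats, labels):
        p = sigmoid(sum(h * fi for h, fi in zip(head, f)) + bias)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

def train_head(feats, labels, lr=0.5, epochs=2000):
    """Fit only the small linear head with full-batch gradient descent."""
    head, bias = [0.0] * len(feats[0]), 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_h, grad_b = [0.0] * len(head), 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(sum(h * fi for h, fi in zip(head, f)) + bias) - y
            grad_h = [g + err * fi for g, fi in zip(grad_h, f)]
            grad_b += err
        head = [h - lr * g / n for h, g in zip(head, grad_h)]
        bias -= lr * grad_b / n
    return head, bias

# Toy task (made-up data): two "low" inputs labeled 0, two "high" inputs labeled 1.
inputs = [(0.0, 0.1), (0.2, 0.0), (0.9, 1.0), (1.0, 0.8)]
labels = [0, 0, 1, 1]
features = [encode(x) for x in inputs]  # computed once; the encoder is frozen

loss_before = mean_loss([0.0, 0.0, 0.0], 0.0, features, labels)
head, bias = train_head(features, labels)
loss_after = mean_loss(head, bias, features, labels)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Because only the three head weights and one bias are updated, a different task could reuse the same features with a new head, which is the kind of cheap adaptation the paragraph above describes.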

Sora is currently available to a select group of Red Teamers for damage and risk assessment, as well as artists, designers, and filmmakers who want to provide feedback to improve its utility for creative professionals. Sora is scheduled for release later this year, but could be several months away as the timing may be affected by the US elections in November.
