In an interview with Marques Brownlee, the Sora team explains that Sora will not be released "in the foreseeable future". The current version is still a research project.
"We don't even have any current timelines for when we would turn this into a product. We're in the feedback getting stage. We'll definitely be improving it, but how we should improve it is kind of an open question," says OpenAI research director Tim Brooks.
The model still struggles in some areas, especially when generating hands and depicting complex physical processes and movements.
Early feedback shows that users want more control over video generation than a text prompt alone provides. "That's definitely one thing we'll be looking into," says Brooks. Audio generation is not currently on the development roadmap, but it remains an option.
The team believes that it may eventually be possible to generate videos that are indistinguishable from real footage. To curb fake AI videos, OpenAI plans to adapt the classifier it introduced for its image model DALL-E 3 for use with Sora, says Sora team lead Aditya Ramesh. According to OpenAI, that image classifier can reliably identify whether an image was created with DALL-E 3.
How long it takes to create a video with Sora depends on several factors, but generation can take a while: long enough to make a cup of coffee, the researchers say.
They also point to Sora's potentially revolutionary role in the creative industries: by lowering production costs, it could enable innovative content that was previously too expensive to produce. The researchers see Sora as an example of how AI tools could enable entirely new forms of creative expression that go far beyond imitating existing media.
One of OpenAI's main goals with Sora is to help AI models better understand the world by learning from visual data how things behave physically and change over time. OpenAI sees Sora as a first step toward modeling reality.
A next step could be to develop models that build on Sora's visual understanding to form a broader understanding of the world, the researchers say. Meta's chief AI scientist, Yann LeCun, believes this approach is fundamentally flawed.
Sora was trained on a combination of publicly available data and data licensed by OpenAI. The model combines techniques from diffusion-based models and large language models (LLMs).