OpenAI CTO Mira Murati doesn't know what data Sora was trained on

Matthias Bastian

Sora prompted by OpenAI

Mira Murati, CTO of OpenAI, said in an interview with the Wall Street Journal that she doesn't know exactly what data Sora, the company's latest video model, was trained on. This is a problem because it suggests that OpenAI's leadership is not taking the training data issue seriously.

When asked what training data was used for Sora, Murati repeated the wording from OpenAI's announcement: The model was trained on public and licensed data. Asked by WSJ reporter Joanna Stern whether that meant YouTube or Facebook videos, for example, Murati said she wasn't sure.

Of course, as CTO, Murati is not necessarily involved in day-to-day development. But with OpenAI being sued left and right for alleged data theft, saying "I'm not sure" in a prepared interview doesn't seem very convincing.

In her defense, Sora is still in development and won't be released anytime soon. After the interview, Murati confirmed that some of the licensed training data comes from Shutterstock.

OpenAI is facing several lawsuits, including from authors and the New York Times, who claim that their copyrighted works were used to train AI models without permission.

OpenAI argues that the use of copyrighted data for AI training is covered by fair use, and that it is impossible to train state-of-the-art AI models without copyrighted material.

Sora is "much, much more expensive" than current generative AI systems

Murati also commented on the cost of Sora, saying that video generation is still "much, much more expensive" than existing systems. Once Sora is released, Murati expects the cost to be similar to that of DALL-E 3. Sora's release is "definitely planned for this year," but could take a few more months, Murati said.

The US elections in November may affect the release date. Sora's safety guidelines are still under development, but Murati expects them to be similar to those of DALL-E 3, which prohibit generating images of public figures.