
YouTube CEO's warning to OpenAI over Sora training data could backfire spectacularly

Midjourney prompted by THE DECODER

Key Points

  • Using YouTube videos to train OpenAI's Sora text-to-video model would violate the platform's terms of service, YouTube CEO Neal Mohan said in an interview with Bloomberg. He said he did not know whether OpenAI had used YouTube videos.
  • According to Mohan, creators have certain expectations when they upload content to YouTube. The terms of service do not allow transcripts or parts of videos to be used for AI training.
  • Mohan's statement could hurt Google in court. The company is itself involved in legal disputes with artists and authors who claim Google used their data for AI training without permission. Google, for its part, argues that such training is "fair use" because it is transformative.

YouTube CEO Neal Mohan said that using YouTube videos to train OpenAI's text-to-video generator Sora would violate the platform's rules, Bloomberg reports.

In an interview, Mohan said that creators have "certain expectations" when they upload "their hard work" to YouTube. One of those expectations is that "the terms of service is going to be abided by." Those terms do not allow transcripts or parts of videos to be used elsewhere, and Mohan said using them to train Sora would be a "clear violation" of YouTube's rules.

AI tools like Sora are built on content that their developers scrape from the web, both licensed and unlicensed. The companies use that data to train AI models, which learn during training to generate new content similar to what they were trained on.

Mohan said he didn't know whether OpenAI used YouTube videos to train Sora. OpenAI CTO Mira Murati declined to discuss Sora's training data in a recent interview.


Mohan does a massive disservice to Google's AI strategy

Mohan's warning to OpenAI could spell trouble for Google. The company is already fighting numerous lawsuits from artists and authors who claim Google took their data from the internet, including text, images, music, videos, and code, without permission to train its AI models.

But Google argues that scraping data for AI training is "fair use" because it is transformative, meaning the model uses the data only to learn from it, not to reproduce it.

Mohan said Google has used YouTube videos to train its own AI models only in ways that follow YouTube's rules. In other cases, however, Google has used data from other platforms and probably from hundreds of thousands of creators, and it has not been completely candid about it.

If used against Google in court, Mohan's comments could cause real problems for the company.


Source: Bloomberg