YouTube CEO Neal Mohan said that using YouTube videos to train OpenAI's text-to-video generator Sora would violate the platform's rules, Bloomberg reports.
In an interview, Mohan said that creators have "certain expectations" when they upload "their hard work" to YouTube. One of those expectations is that "the terms of service is going to be abided by." Those terms do not allow transcripts or parts of videos to be used elsewhere, and Mohan said that doing so would be a "clear violation" of YouTube's rules.
AI tools like Sora work because the companies that build them scrape various types of content from the web, both licensed and unlicensed. They use that data to train AI models, which learn to create new content similar to the training data.
Mohan said he didn't know if OpenAI used YouTube videos to train Sora. OpenAI CTO Mira Murati wouldn't discuss Sora's training data in a recent interview.
Mohan does a massive disservice to Google's AI strategy
Mohan's warning to OpenAI could spell trouble for Google, which is already fighting numerous lawsuits from artists and authors who claim Google took their data, including text, images, music, videos, and code, from the Internet without permission to train its AI models.
But Google argues that scraping data for AI training is "fair use" because it's transformative, meaning the model only uses the data to learn, not to reproduce the data itself.
Mohan said that when Google uses YouTube videos to train its own AI models, it does so only in ways that follow YouTube's rules. But Google has also used data from other platforms, probably from hundreds of thousands of creators, and it hasn't been completely candid about that.
If used against Google in court, Mohan's comments could cause real problems for the company.