OpenAI works on copyright solution for large AI models

May 10, 2023

OpenAI

According to Sam Altman, content producers who contribute to the capabilities of an AI model will benefit from it in the future. How exactly that would work is not yet clear.

At an AI summit at the White House, OpenAI CEO Sam Altman said his company is working on AI models that respect copyright. The goal, he said, is for content creators to be paid when their content, or in the case of images, their style, is used. Technical details are not yet known.

When OpenAI introduced the ChatGPT plugins, it showed that it understands the potential impact of a large language model equipped with tools on the content ecosystem. The more interaction takes place inside the chatbot ecosystem, the less attention, and therefore money, content creators will receive for their work outside the chatbot.

"We appreciate that this is a new method of interacting with the web, and welcome feedback on additional ways to drive traffic back to sources and add to the overall health of the ecosystem," OpenAI writes.

For text generation, possible options include a Spotify-like model that pays sources per token generated, provided the output can be uniquely attributed to them, or a flat rate based on the amount of data a provider contributes to OpenAI. Websites can already signal whether they want ChatGPT to crawl them, much as they can keep pages out of Google's index.
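Sites typically express this preference in a robots.txt file. The minimal Python sketch below uses the standard library's `urllib.robotparser` to show how a crawler that honors robots.txt would interpret a rule aimed at OpenAI's browsing agent. The robots.txt content and the example.com URL are hypothetical, and the user-agent token `ChatGPT-User` is an assumption based on what OpenAI documented for its browsing plugin; check OpenAI's current crawler documentation before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block OpenAI's browsing agent (token "ChatGPT-User"
# is an assumption, see lead-in) while allowing every other crawler.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""

def may_fetch(agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse in-memory lines, no network
    return parser.can_fetch(agent, url)

print(may_fetch("ChatGPT-User", "https://example.com/article"))  # False
print(may_fetch("Googlebot", "https://example.com/article"))     # True
```

Note that robots.txt is a convention crawlers follow voluntarily: it signals a site's preference rather than technically enforcing it.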

AI models and copyright - it's complicated

Using images and text to train large AI models without explicit consent is already controversial from a copyright perspective. In addition, generative AI models can produce text or images that are very similar to the originals. Lawsuits are pending in several countries. One of the most prominent is Getty Images v. Stability AI, the company behind Stable Diffusion.

OpenAI and other AI companies could address this by training large AI models only on data they are clearly permitted to use. The question is whether it is commercially feasible to collect the necessary amount of data with permission.

Pre-trained language models are relatively static and are currently updated only every few months or even years. ChatGPT, however, can use a browser plugin to pull in information from the web in real time and combine it with knowledge from its training data. This real-time capability of large language models with tools (plugins) takes the copyright debate to a new level.

Microsoft's Bing chatbot works similarly. Microsoft CEO Satya Nadella has promised publishers that the traffic the chatbot sends to their sites will be treated as a measure of the product's success and that publishers will share in that success.

It is still unclear whether and how this will work. Even if chatbots cite their sources, website visits are likely to drop dramatically in the chatbot era as more and more web tasks are handled in natural language through a chatbot interface.
