AI in practice

Meta, Google, OpenAI defend AI's transformative use of copyrighted data

Matthias Bastian

DALL-E 3 prompted by THE DECODER

The U.S. Copyright Office is gathering perspectives on how to handle copyright and intellectual property for generative AI. The major AI companies are also weighing in.

Meta, in particular, explains the background of AI development and the use of training data in a detailed letter dated late October 2023.

Meta argues that using copyrighted material to train generative AI models is not a consumptive use and therefore does not violate copyright law.

Even if the use triggers copyright protection, it is fair use, Meta argues, describing the training of AI models as a transformative process that extracts statistical information and abstract concepts from language in order to generate new content.

The company states that AI models do not store copyrighted data, but rather learn patterns and relationships from training data, and therefore do not violate the rights of copyright holders.
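To make that argument concrete, here is a deliberately tiny sketch (our illustration, not a description of Meta's actual systems): a toy statistical "model" fitted on a two-sentence corpus retains only aggregate word-pair probabilities, not the texts it was trained on.

```python
# Toy illustration of "learning patterns, not storing copies" -- not how
# production LLMs are built. "Training" here reduces a corpus to aggregate
# bigram statistics; the fitted artifact contains only numbers describing
# word relationships, not the input texts themselves.
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]

def fit(texts):
    """Return bigram transition probabilities derived from the texts."""
    pair_counts, word_counts = Counter(), Counter()
    for text in texts:
        words = text.split()
        for a, b in zip(words, words[1:]):
            pair_counts[(a, b)] += 1
            word_counts[a] += 1
    return {pair: n / word_counts[pair[0]] for pair, n in pair_counts.items()}

model = fit(corpus)
print(model[("sat", "on")])  # 1.0 -- a statistic about the corpus, not the corpus
```

Whether such statistics can ever reproduce protected expression in larger models is exactly the question the Copyright Office is probing; the sketch only illustrates the shape of Meta's claim.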

Generative AI is like the printing press, says Meta

Meta points out several problems with proposals for legal licensing mechanisms, and reiterates that copyright law does not and should not protect artistic style.

There are legitimate concerns about AI systems imitating artists' voices, looks, or styles, Meta acknowledges. However, this is not fundamentally new and is already covered by state right-of-publicity laws, federal unfair competition laws, and First Amendment principles. Therefore, no drastic changes to current law are needed to regulate AI, Meta writes.

Moreover, Meta sees generative AI as a tool for enhancing human creativity and productivity, no different from a printing press, a camera, or a computer.

High licensing fees could slow generative AI

Interestingly, Meta also says that licensing AI training data on the scale needed would be so expensive that it could halt the progress of generative AI. "Indeed, it would be impossible for any market to develop that could enable AI developers to license all of the data their models need," the paper says.

Deals could be made with individual rights holders to license data. However, these agreements would only cover a "minuscule fraction" of the data needed.

For much content, such as online reviews, it would be administratively impossible to locate rights holders and negotiate licensing terms with them, Meta notes. OpenAI and Google make similar arguments.

Google urges restraint with new copyright rules

Google argues that existing copyright principles are flexible enough to deal with AI scenarios.

The company suggests that courts should decide how to apply these principles in specific cases. Google emphasizes that a balance must be struck between the interests of rights holders and the public.

Google believes that content generated by AI without human intervention is not copyrightable. However, most uses of generative AI involve human intervention and creativity; in such cases, copyright could be granted.

Regarding infringement, Google argues that a work is only infringing if it is "substantially similar" to the allegedly copied work. While this cannot be ruled out for generative AI, it is unlikely.

Google stresses that premature legislative action could do more harm than good, stifling innovation and limiting the potential of AI technology.

The systems are still at an early stage of development, Google argues, and require a flexible interpretation of fair use so as not to limit new opportunities for creators, consumers, and society. Existing rules are sufficient to meet these challenges.

Google also points to newly introduced web controls that allow content publishers to determine whether training data crawlers can access and use content. However, this is only relevant for future AI models.

OpenAI claims fair use too

OpenAI also argues that generative AI does not reproduce copyrighted material, and that memorization and duplication of copyrighted material is extremely rare.

Like Google and Meta, OpenAI supports fair use because it sees training AI models as a transformative use of data.

OpenAI gives an example: when a model is presented with a large number of images labeled with the word "cup," it learns, much like a human child, which visual elements make up the concept of a "cup."

This is done not by building an internal database of training images, but by abstracting the factual metadata associated with the term "cup."

In this way, the model can even combine concepts and create a new, completely original image of a "cup," or even "a cup of coffee that is also a portal to another dimension," OpenAI writes.
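OpenAI's cup analogy can be illustrated with a deliberately small sketch (ours, not OpenAI's training procedure): a nearest-centroid classifier fitted on labeled feature vectors ends up holding only one averaged vector per concept, while the individual training examples are discarded.

```python
# Toy sketch of the "learns the concept, not the images" argument -- an assumed
# illustration, not how image generators are actually trained. A nearest-centroid
# "model" fitted on labeled feature vectors keeps only one averaged vector per
# label; the individual training examples are not retained.
import numpy as np

def fit(features, labels):
    """Return {label: mean feature vector} -- all that survives training."""
    labels = np.array(labels)
    return {label: features[labels == label].mean(axis=0) for label in set(labels)}

def predict(model, x):
    """Assign x to the label whose learned centroid is closest."""
    return min(model, key=lambda label: np.linalg.norm(model[label] - x))

# Fake 4-dimensional "image features" standing in for cup and hat photos.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(1.0, 1.0, (50, 4)), rng.normal(-1.0, 1.0, (50, 4))])
labels = ["cup"] * 50 + ["hat"] * 50

model = fit(features, labels)
print(predict(model, rng.normal(1.0, 1.0, 4)))  # very likely "cup"
```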

Like Google, OpenAI points to the option of blocking its crawler so that a publisher's data doesn't end up in training datasets. However, as with Google, this only affects future models, if at all, since the option has only been available for a short time; it has no effect on current models and existing datasets.
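Both opt-outs work via a site's robots.txt file: Google's published token for AI training is "Google-Extended," and OpenAI's training crawler identifies itself as "GPTBot." Below is a minimal sketch, using Python's standard library and a placeholder domain, of how a publisher could check what their current robots.txt allows.

```python
# Minimal sketch: check whether a site's robots.txt blocks the AI training
# crawlers mentioned above. "Google-Extended" and "GPTBot" are the published
# tokens; example.com is a placeholder for the publisher's own domain.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()  # fetch and parse the live robots.txt

for agent in ("Google-Extended", "GPTBot"):
    allowed = rp.can_fetch(agent, "https://example.com/some-article")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```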

OpenAI urges the Copyright Office to be cautious in demanding new legal solutions, as the technology is rapidly evolving and the courts have not had a chance to rule on most of the issues raised by the Copyright Office.

Apple believes generative AI code is copyrightable

Apple filed the shortest response of the major AI companies. It focuses on the use of generative AI for program code, specifically on whether a human using a generative AI system should be considered the "author" of the material the system produces.

Automating the development of computer programs is not a new development, and AI coding tools represent a significant evolution of that process, Apple notes.

When a human developer controls the tools, reviews the proposed code, and determines the form in which it is to be used, including conversions, the code that is ultimately produced has sufficient human authorship to be protected by copyright. This is true whether the tool is a generative AI tool or a conventional non-AI tool.

Companies like Apple, whose developers work with generative AI tools, naturally have a strong interest in ensuring that their software's code can be copyrighted and cannot simply be used by other companies.