AI in practice

Meta, Google, OpenAI defend AI's transformative use of copyrighted data

Matthias Bastian

DALL-E 3 prompted by THE DECODER

The U.S. Copyright Office is gathering perspectives on how to handle copyright and intellectual property for generative AI. The major AI companies are also weighing in.

Meta, in particular, explains the background of AI development and the use of training data in a detailed letter dated late October 2023.

Meta argues that using copyrighted material to train generative AI models is not a consumptive use and therefore does not violate copyright law.

Even if the use triggers copyright protection, it is fair use, Meta argues, describing the training of AI models as a transformative process that extracts statistical information and abstract concepts from language in order to generate new content.

The company states that AI models do not store copyrighted data, but rather learn patterns and relationships from training data, and therefore do not violate the rights of copyright holders.
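To make that argument concrete, here is a deliberately tiny sketch (our illustration, not a description of Meta's actual systems): a toy statistical "model" fitted on a two-sentence corpus retains only aggregate word-pair probabilities, not the texts it was trained on.

```python
# Toy illustration of "learning patterns, not storing copies" -- not how
# production LLMs are built. "Training" here reduces a corpus to aggregate
# bigram statistics; the fitted artifact contains only numbers describing
# word relationships, not the input texts themselves.
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]

def fit(texts):
    """Return bigram transition probabilities derived from the texts."""
    pair_counts, word_counts = Counter(), Counter()
    for text in texts:
        words = text.split()
        for a, b in zip(words, words[1:]):
            pair_counts[(a, b)] += 1
            word_counts[a] += 1
    return {pair: n / word_counts[pair[0]] for pair, n in pair_counts.items()}

model = fit(corpus)
print(model[("sat", "on")])  # 1.0 -- a statistic about the corpus, not the corpus
```

Whether such statistics can ever reproduce protected expression in larger models is exactly the question the Copyright Office is probing; the sketch only illustrates the shape of Meta's claim.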

Generative AI is like the printing press, says Meta

Meta points out several problems with proposals for legal licensing mechanisms, and reiterates that copyright law does not and should not protect artistic style.

There are legitimate concerns about AI systems imitating artists' voices, looks, or styles, Meta acknowledges. However, this is not fundamentally new and is already covered by state right-of-publicity laws, federal unfair competition laws, and First Amendment principles. Therefore, no drastic changes to current law are needed to regulate AI, Meta writes.

Moreover, Meta sees generative AI as a tool for enhancing human creativity and productivity, no different from a printing press, a camera, or a computer.

High licensing fees could slow generative AI

Interestingly, Meta also says that licensing AI training data on the scale needed would be so expensive that it could halt the progress of generative AI. "Indeed, it would be impossible for any market to develop that could enable AI developers to license all of the data their models need," the paper says.

Deals could be made with individual rights holders to license data. However, these agreements would only cover a "minuscule fraction" of the data needed.

For much content, such as online reviews, it would be administratively impossible to locate rights holders and negotiate licensing terms with them, Meta notes. OpenAI and Google make similar arguments.

Google urges restraint with new copyright rules

Google argues that existing copyright principles are flexible enough to deal with AI scenarios.

The company suggests that courts should decide how to apply these principles in specific cases. Google emphasizes that a balance must be struck between the interests of rights holders and the public.

Google believes that content generated by AI without human intervention is not copyrightable. However, most uses of generative AI involve human intervention and creativity; in such cases, copyright could be granted.

Regarding infringement, Google argues that a work is only infringing if it is "substantially similar" to the allegedly copied work. While this cannot be ruled out for generative AI, it is unlikely.

Google stresses that premature legislative action could do more harm than good, stifling innovation and limiting the potential of AI technology.

The systems are still at an early stage of development, Google argues, and require a flexible interpretation of fair use so as not to limit new opportunities for creators, consumers, and society. Existing rules are sufficient to meet these challenges.

Google also points to newly introduced web controls that allow content publishers to determine whether training data crawlers can access and use content. However, this is only relevant for future AI models.

OpenAI claims fair use too

OpenAI also argues that generative AI does not reproduce copyrighted material, and that memorization and duplication of copyrighted material is extremely rare.

Like Google and Meta, OpenAI supports fair use because it sees training AI models as a transformative use of data.

OpenAI gives an example: when a model is presented with a large number of images labeled with the word "cup," it learns, much like a human child, which visual elements make up the concept of a "cup."

This is done not by building an internal database of training images, but by abstracting the factual metadata associated with the term "cup."

In this way, the model can even combine concepts and create a new, completely original image of a "cup," or even "a cup of coffee that is also a portal to another dimension," OpenAI writes.
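OpenAI's cup analogy can be illustrated with a deliberately small sketch (ours, not OpenAI's training procedure): a nearest-centroid classifier fitted on labeled feature vectors ends up holding only one averaged vector per concept, while the individual training examples are discarded.

```python
# Toy sketch of the "learns the concept, not the images" argument -- an assumed
# illustration, not how image generators are actually trained. A nearest-centroid
# "model" fitted on labeled feature vectors keeps only one averaged vector per
# label; the individual training examples are not retained.
import numpy as np

def fit(features, labels):
    """Return {label: mean feature vector} -- all that survives training."""
    labels = np.array(labels)
    return {label: features[labels == label].mean(axis=0) for label in set(labels)}

def predict(model, x):
    """Assign x to the label whose learned centroid is closest."""
    return min(model, key=lambda label: np.linalg.norm(model[label] - x))

# Fake 4-dimensional "image features" standing in for cup and hat photos.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(1.0, 1.0, (50, 4)), rng.normal(-1.0, 1.0, (50, 4))])
labels = ["cup"] * 50 + ["hat"] * 50

model = fit(features, labels)
print(predict(model, rng.normal(1.0, 1.0, 4)))  # very likely "cup"
```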

Like Google, OpenAI points to the option of blocking its crawler so that a publisher's data doesn't end up in training datasets. However, as with Google, this only affects future models, if at all, since the option has only been available for a short time; it has no effect on current models and existing datasets.
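Both opt-outs work via a site's robots.txt file: Google's published token for AI training is "Google-Extended," and OpenAI's training crawler identifies itself as "GPTBot." Below is a minimal sketch, using Python's standard library and a placeholder domain, of how a publisher could check what their current robots.txt allows.

```python
# Minimal sketch: check whether a site's robots.txt blocks the AI training
# crawlers mentioned above. "Google-Extended" and "GPTBot" are the published
# tokens; example.com is a placeholder for the publisher's own domain.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()  # fetch and parse the live robots.txt

for agent in ("Google-Extended", "GPTBot"):
    allowed = rp.can_fetch(agent, "https://example.com/some-article")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```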

OpenAI urges the Copyright Office to be cautious in demanding new legal solutions, as the technology is rapidly evolving and the courts have not had a chance to rule on most of the issues raised by the Copyright Office.

Apple believes generative AI code is copyrightable

Apple filed the shortest response of the major AI companies. It focuses on the use of generative AI for program code, specifically on whether a human using a generative AI system should be considered the "author" of the material the system produces.

Automating the development of computer programs is not a new development, and AI coding tools represent a significant evolution of that process, Apple notes.

When a human developer controls the tools, reviews the proposed code, and determines the form in which it is to be used, including conversions, the code that is ultimately produced has sufficient human authorship to be protected by copyright. This is true whether the tool is a generative AI tool or a conventional non-AI tool.

Companies like Apple, whose developers work with generative AI tools, naturally have a strong interest in ensuring that their software's code can be copyrighted and cannot simply be used by other companies.