AI and copyright: book authors suffer defeat that doesn't mean much

Matthias Bastian

A US District Judge in California has largely sided with OpenAI. She rejected most of the authors' copyright claims.

The authors, led by Sarah Silverman and Ta-Nehisi Coates, claim that the large language models that power ChatGPT were illegally trained on pirated versions of their books without their permission.

However, one of their central claims seemed questionable from the start: that because the AI was allegedly trained on their works, every single ChatGPT output infringes the rights in those works, even when the output text has nothing to do with them.

Judge Araceli Martínez-Olguín noted that the authors were unable to point to specific snippets or copies of their books in ChatGPT's outputs.

The authors also failed to convince the judge that OpenAI had violated the Digital Millennium Copyright Act (DMCA) by allegedly removing copyright management information (CMI) from the training data. They offered no evidence to support this claim, and OpenAI keeps the training data it used secret.

The authors have until March 13 to consolidate their cases and submit new arguments to pursue the dismissed claims.

Still, the ruling is at best a small and probably insignificant victory for OpenAI, because the claim under California's Unfair Competition Law was allowed to proceed.

The court concluded that OpenAI's profitable use of copyrighted works, that is, using book data to train AI without the authors' consent, could constitute an unfair business practice.

This assumption is also at the heart of the legal dispute between the New York Times and OpenAI. Unlike Silverman et al., however, the Times has presented near-verbatim reproductions of its articles generated by OpenAI's GPT models, precisely the kind of evidence Martínez-Olguín found lacking in the Silverman case.

The court will have to decide whether the way the NYT obtained these copies complied with OpenAI's terms of use, or whether the reproductions were deliberately provoked by exploiting a technical flaw.