Anthropic won a fair use hearing that could end up being a defeat

Jun 24, 2025

A new court ruling draws a sharp line between fair use and infringement for AI companies training on copyrighted books, allowing transformative use of legally obtained works but rejecting any defense for pirated material.

A recent court decision allows AI companies to use copyrighted books for training if the works are obtained legally, calling the practice "transformative - spectacularly so" because the aim is to learn from, not copy, the original texts.

"Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different. If this training process reasonably required making copies within the LLM or otherwise, those copies were engaged in a transformative use."

Bartz v. Anthropic PBC, p. 13-14

This reasoning lines up with the broader transformative use argument put forward by many AI companies as they defend data scraping practices that happen without creator consent.

The finished models don't directly reproduce the books, either. The court noted that the plaintiffs - authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson - didn't even try to show that Claude could generate outputs resembling or replacing the originals.

This part of the ruling covered books Anthropic legally bought in print, often secondhand. The company removed the bindings, scanned the books, and then destroyed the originals. The resulting PDFs were stored in a searchable internal library. Since Anthropic didn't make or distribute extra copies, the court said this also qualified as fair use.

No pass for piracy

The court took a much stricter line on books Anthropic obtained from pirate sources like Books3, LibGen, and PiLiMi. Between January 2021 and July 2022, Anthropic downloaded more than 7 million books from illegal sources, including works by the plaintiffs. These files were stored permanently, even if they weren't used for training. Meta and other AI companies are believed to have used similar data sources.

The court made clear that building a digital library of pirated books isn't transformative use and doesn't qualify as fair use. The idea that a company might develop a lawful use later doesn't excuse the initial infringement. "There is no carveout, however, from the Copyright Act for AI companies," the court wrote.

In short, using copyrighted works for AI training can be fair use if the data was obtained legally. But companies that knowingly use pirated copies can't rely on fair use as a defense.

One big question remains: Is mass scraping of online content - especially when technical barriers are bypassed - a lawful way to obtain data?

Many AI models are trained on data scraped from public websites without the creators' consent, and clear legal standards are still missing. If this ruling leads to a requirement for mass licensing of copyrighted data, it could pose major challenges for AI companies, even if the actual use of the data is considered transformative.

While the court sided with Anthropic on digitizing purchased books and using them for training, it refused to dismiss the case entirely. The claims related to pirated books and the permanent storage of unused works are still on the table. The proceedings will continue, focusing on whether Anthropic can be held liable for using pirated content. The court will also consider possible damages for "willful infringement."

The case is still in its early stages and will move forward in the US District Court for the Northern District of California. The outcome could influence other lawsuits over AI training and copyrighted data.

In a separate case involving Meta, another US judge has already raised serious doubts about whether copyrighted data can be used for AI training at all. The US Copyright Office also stated that fair use does not extend to AI models trained on large amounts of copyrighted material. Its director was removed by the Trump administration soon after that report was published, so that position may no longer reflect current policy.
