A California federal court has cleared the way for a billion-dollar class action lawsuit against Anthropic, the company behind the Claude language model, over claims of large-scale copyright infringement.

The suit alleges that Anthropic downloaded as many as seven million books from pirate sites like LibGen and PiLiMi between 2021 and 2022. This exposes the company to potentially massive damages, even though it scored a partial win on fair use grounds just weeks earlier.

A "Napster-style" piracy case

According to the court order from July 17, 2025, Anthropic is accused of using the BitTorrent protocol to download pirated books from LibGen and PiLiMi. These files - typically in .epub, .pdf, or .txt format - were stored in a central internal database, regardless of whether they were later used to train AI models.

Judge William Alsup described the company's actions as "Napster-style downloading of millions of works." The order details how, between January 2021 and July 2022, an Anthropic co-founder first downloaded about 200,000 books from the Books3 collection, followed by roughly five million from LibGen and another two million from PiLiMi, targeting titles not already in LibGen.

The court decided the case should move forward as a class action, given the sheer volume and complexity of the evidence. Only works sourced from LibGen and PiLiMi are included; Books3 was left out due to missing metadata.

The financial risk for Anthropic is significant. Under US law, statutory damages for willful copyright infringement can reach $150,000 per work. Even a much smaller amount per title could still total billions.
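To make the scale concrete, here is a rough back-of-the-envelope sketch of that arithmetic. It is illustrative only and not a damages estimate: the $150,000 ceiling and the seven-million figure come from the reporting above, the $750 figure is the ordinary statutory minimum per work under 17 U.S.C. § 504(c), and the smaller work counts and the helper name `total_damages` are hypothetical, since the number of works that will actually end up in the class is not yet known.

```python
# Illustrative arithmetic only; not a prediction of any actual award.
WILLFUL_MAX_PER_WORK = 150_000   # ceiling for willful infringement cited in the article (USD)
STATUTORY_MIN_PER_WORK = 750     # ordinary statutory minimum per work, 17 U.S.C. § 504(c) (USD)


def total_damages(works: int, per_work_usd: int) -> int:
    """Total exposure if every work received the same per-work award."""
    return works * per_work_usd


# Hypothetical work counts. 7,000,000 is the upper bound of downloads
# alleged in the suit; the smaller counts are assumptions for comparison.
for works in (100_000, 1_000_000, 7_000_000):
    low = total_damages(works, STATUTORY_MIN_PER_WORK)
    high = total_damages(works, WILLFUL_MAX_PER_WORK)
    print(f"{works:>9,} works: ${low:,} to ${high:,}")
```

At the full seven million works, even the $750 minimum would come to $5.25 billion, which is why a per-title award far below the ceiling could still reach into the billions.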

Anthropic must turn over a complete metadata list of its LibGen and PiLiMi downloads by August 1, 2025, while plaintiffs are required to submit a detailed list of titles and registrations by September 1, 2025.

Fair use doesn't apply to piracy

In June, the same court ruled that training AI models on legally obtained books may qualify as fair use, especially if the use is "transformative" and no copies are distributed. But the court also made it clear: storing pirated works in an internal library doesn't qualify as fair use.

While the legal status of mass web scraping and the use of public data for AI training is still up in the air, the court’s ruling sets a clear boundary: pirated content can't be justified as fair use, even for AI research or innovation.

The Anthropic case could set a major precedent for the industry, making it clear that AI companies can't sidestep copyright laws when sourcing training data, regardless of how they use it later. The decision could ripple out to ongoing lawsuits against Meta, OpenAI, and others accused of using copyrighted material to train language models.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • A federal court in California has allowed a class action lawsuit against Anthropic to proceed, alleging the company downloaded and permanently stored up to seven million books from pirate sites like LibGen and PiLiMi.
  • The court stated that downloading and storing pirated works is not covered by fair use, even if an AI company later uses the material for training.
  • This case could significantly impact the AI industry, as it confirms that copyright law applies to language model training and that violations may lead to substantial financial penalties.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.