A landmark settlement between Anthropic and US authors and publishers could set new ground rules for how AI companies use copyrighted books to train their models.
Anthropic has agreed to pay at least $1.5 billion to resolve a class action lawsuit accusing the company of "Napster-like" copyright infringement. According to a motion for preliminary approval filed in federal court in California on September 5, 2025, the settlement covers roughly 500,000 copyrighted books that Anthropic allegedly used without permission.
The case focused on Anthropic's mass downloading and storage of books from pirate sites like Library Genesis (LibGen) and Pirate Library Mirror (PiLiMi). Plaintiffs claimed the company trained its AI models on hundreds of thousands of works obtained illegally.
What's the price of AI training data?
Under the settlement, Anthropic will pay at least $1.5 billion into a non-refundable fund, spread over four payments across two years. With about 500,000 works involved, that averages out to $3,000 per book. If more titles are added, Anthropic will pay another $3,000 for each one.
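The per-book figure is simple division, and the fund scales linearly with any titles added later. A quick sketch of that arithmetic (the extra-titles count below is a hypothetical for illustration, not from the settlement):

```python
# Arithmetic behind the settlement figures: a $1.5B minimum fund
# spread over roughly 500,000 works.
FUND_MINIMUM = 1_500_000_000  # at least $1.5 billion
WORKS = 500_000               # roughly 500,000 books

per_book = FUND_MINIMUM / WORKS
print(f"Average per book: ${per_book:,.0f}")  # → Average per book: $3,000

# Each title added to the final Works List costs another $3,000,
# so the total grows linearly. 10,000 is a made-up example count.
extra_titles = 10_000
total = FUND_MINIMUM + extra_titles * 3_000
print(f"Fund with {extra_titles:,} extra titles: ${total:,.0f}")
# → Fund with 10,000 extra titles: $1,530,000,000
```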
The agreement only covers past infringements through August 25, 2025, and specifically leaves out any claims related to AI-generated outputs, both past and future. Books not included in the final "Works List" are also excluded. Anthropic must delete all files sourced from LibGen and PiLiMi, along with any copies, within 30 days after the settlement is finalized or after any court-ordered retention ends.
Compensation is based on the number of works, not the number of claimants. If both an author and a publisher claim the same book, a working group from the Authors Guild and the Association of American Publishers will advise on how to split the payment.
There’s still no standard price for licensing data to train AI models, but this settlement is starting to set a benchmark. Legal pressure is pushing the industry toward a real market for training data. For context, Microsoft reached a licensing deal with HarperCollins at $5,000 per book for AI training, while Anthropic’s settlement lands at around $3,000 per title.
For AI companies that have relied on scraping free or "priceless" content, this is a major shift. As courts and settlements start putting a dollar value on copyrighted works, licensing deals and business models across the industry are likely to change. These new costs could ripple through the entire AI sector, which is already facing rising expenses.
In fair use lawsuits, courts often consider whether the use of copyrighted material threatens a potential market for rights holders. The more established this new licensing market becomes, the harder it will be for AI companies to argue that their use is "fair" and without consequence.
Court: "Napster-like" piracy, no fair use for pirated books
Anthropic's decision to settle follows a clear signal from the federal court in San Francisco. In July 2025, the court allowed the class action to proceed and described Anthropic's actions as "Napster-style" copyright infringement. The court made it clear that fair use doesn't apply to pirated copies, even if the books are used in a transformative way for AI training. Acquiring the material illegally is, by itself, a violation.
A month earlier, in June 2025, the court ruled that training AI models with copyrighted books can sometimes count as fair use—but only if the books were obtained legally. That protection doesn't extend to material from pirate sources like LibGen or PiLiMi.
The ruling matters because it confirms that AI companies have to follow copyright law when collecting training data. But the line between legal and illegal sourcing isn't fully settled. While content from pirate sites is now clearly off-limits, a gray area remains around scraping publicly available web content for AI training without the consent of authors or site owners. It's still unclear how much of that data, if any, will need to be licensed in the future.