A US federal court in California has ruled in Meta's favor in a high-profile lawsuit over its use of copyrighted books to train Llama language models, but the decision falls far short of granting AI companies carte blanche to use protected works.

Thirteen authors, including Pulitzer Prize winners Junot Díaz and Andrew Sean Greer, sued Meta for allegedly using their books without permission to train its Llama models. The court dismissed their claims, citing a lack of evidence, but made clear the decision applies only to these plaintiffs and doesn't set a broad precedent for the industry.

The key issue was whether training large language models on copyrighted works qualifies as "fair use" under US copyright law. The court concluded Meta's use was "highly transformative," since the models generate new text based on user prompts rather than simply republishing the books. The fact that Meta could see up to $1.4 trillion in revenue from Llama over the next ten years weighed against fair use, but ultimately didn't tip the scales.

Weak evidence on market harm

The real sticking point was whether Meta's training practices harmed the market for the original works. The authors argued that unlicensed AI training undermined the value of licensing and could flood the market with AI-generated imitators. The court rejected both arguments, saying that the mere existence of a licensing market isn't evidence of infringement, and that the plaintiffs didn't provide real proof that Llama was hurting sales.

Meta had attempted to secure licenses from publishers but ran into legal and organizational hurdles. The company then turned to shadow libraries like LibGen and Anna's Archive for source material. Meta implemented safeguards to prevent its models from reproducing long sections of text verbatim; even with targeted prompting, tests showed the model would reproduce no more than about 50 words from the books.

While the court acknowledged that scraping books from pirate sites is problematic, it ruled that this alone doesn't rule out fair use. What matters is how the data is used, not just its origin.

In a parallel case against Anthropic, Judge William Alsup took a stricter stance, arguing that training models on books from pirate sites like Books3 or LibGen does not qualify as fair use. He wrote that an intention to build a legal product doesn't justify breaking the law to do it.

No free pass for AI companies

The court made it clear that the ruling does not amount to broad approval for using copyrighted books in AI training. Judge Vince Chhabria noted the plaintiffs "made the wrong arguments and failed to develop a record in support of the right one." He left the door open for future lawsuits from authors who can provide stronger evidence of market harm.

The idea that AI models could churn out endless books in certain genres, potentially undercutting sales of works by human authors, remains a real concern that could take center stage in future cases. The court also clarified that the law doesn't mandate a ban on AI training with copyrighted material; instead, companies may simply need to pay for licenses.

"Where copying for LLM training isn't fair use, LLM developers (including Meta) won't need to stop using copyrighted works to train their models. They will need only to pay rightsholders for licenses for that training."

Judge Vince Chhabria

Chhabria used the opportunity to take aim at both the tech industry and other judges, dismissing claims that stronger copyright rules would slow down AI development as "ridiculous."

"If using copyrighted works to train the models is as necessary as the companies say, they will figure out a way to compensate copyright holders for it."

Judge Vince Chhabria

Chhabria also took particular issue with his colleague Judge William Alsup, who in a parallel case against Anthropic compared AI training to schoolchildren reading books. Chhabria called this an "inapt analogy," pointing out that an AI model can generate countless competing works with minimal time and creativity.

He concluded by leaving future cases wide open: while AI training may be transformative, "it's hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books."

Summary
  • A US federal court in California has partially sided with Meta in a lawsuit over the use of copyrighted books to train its Llama AI models, dismissing several claims from thirteen authors due to insufficient evidence of harm to the books' market value.
  • The court noted that while Meta’s use of the books was "highly transformative," the data's commercial use and origin from shadow libraries did not automatically qualify as fair use—what mattered most was whether the original works' market value was affected.
  • Judge Vince Chhabria rejected arguments that stricter copyright enforcement would necessarily slow AI progress. "If using copyrighted works to train the models is as necessary as the companies say, they will figure out a way to compensate copyright holders for it."
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.