The unresolved question of AI and copyright continues to simmer and has drawn another class action lawsuit. This time, Bloomberg and EleutherAI are among the defendants.

Mike Huckabee, former governor of Arkansas, and bestselling author Lysa TerKeurst are among the authors who have filed suit against Meta, Microsoft, and Bloomberg. They accuse the companies of using their work to train AI without their consent and illegally extracting "an enormous amount of value."

Books3 dataset alleged to contain pirated books

The new lawsuit centers on the "Books3" dataset. The plaintiffs claim that it contains hundreds of thousands of illegally copied books, which the named companies allegedly used to train their large language models.

Microsoft and Meta have not yet commented on the new lawsuit. A Bloomberg spokesperson says that Books3 was not used to train the commercial version of BloombergGPT, only the research model.

Also named in the suit is EleutherAI, an AI research organization that included the Books3 dataset in its large AI training dataset, The Pile. According to the complaint, Books3 contains approximately 183,000 books published over the past 20 years and makes up 12 percent of the entire Pile dataset.

"While using books as part of datasets is not inherently problematic, using pirated (or stolen) books does not fairly compensate authors and publishers for their work," the plaintiffs claim.

In their lawsuit, the authors seek unspecified damages and an injunction to stop the misuse of their works. The authors' lawyer accuses the companies of developing large language models "by all means necessary—including theft of our authors' books."

One of many author lawsuits

Earlier, the Authors Guild announced that 17 prominent authors, including John Grisham, George R.R. Martin, and Jodi Picoult, have sued OpenAI for copyright infringement. A group of authors led by Pulitzer Prize winner Michael Chabon has filed suit against Meta and OpenAI in a federal court in San Francisco with nearly identical allegations.

The authors accuse OpenAI of using copyrighted books without permission to train AI, specifically as part of the Books dataset.

Even if these allegations are true, the question remains whether this use can be considered "fair use." The litigation could drag on for years. In the meantime, leading AI companies will continue to train large AI models and may learn from past mistakes by selecting licensed training data for future models.

With the situation so unclear, Microsoft and Google have begun to offer a form of legal protection to customers of their generative AI products. Provided their systems are used as intended, the companies intend to cover the legal costs of copyright claims against customers.

Star author Stephen King is fine with his books being put into AI systems

Horror author Stephen King said he is not currently concerned about AI-generated writing, addressing worries about AI creativity and copyright infringement.

King believes that AI-generated texts are currently no better than the sum of their training materials and that machines have not been able to produce genuine creative moments because they lack the emotional capacity to do so. But that could change, King says.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Still, he would rather not remove his books from AI training datasets. That would be like a worker trying to stop industrial progress by destroying a mechanical loom, King writes.

Summary
  • Yet another group of authors, including Mike Huckabee and Lysa TerKeurst, has sued Meta, Microsoft, Bloomberg, and EleutherAI. They accuse the companies of using their copyrighted works for artificial intelligence training without their consent.
  • The lawsuit centers on the "Books3" dataset, which allegedly contains hundreds of thousands of illegally copied books. EleutherAI incorporated this dataset into the training dataset "The Pile".
  • The authors are seeking damages and an injunction against the misuse of their works. This is one of several lawsuits filed by authors against AI companies, including one against OpenAI for copyright infringement.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.