New lawsuit accuses Bloomberg, Microsoft, and Meta of training AI with pirated books

Oct 19, 2023


The unresolved AI copyright issue continues to simmer and has drawn another class action lawsuit. This time, Bloomberg and EleutherAI are among the defendants.

Mike Huckabee, former governor of Arkansas, and bestselling author Lysa TerKeurst are among the authors who have filed suit against Meta, Microsoft, and Bloomberg. They accuse the companies of using their work to train AI without their consent and illegally extracting "an enormous amount of value."

Books3 dataset alleged to contain pirated books

The new lawsuit centers on the Books3 dataset, which the plaintiffs claim contains hundreds of thousands of illegally copied books. According to the plaintiffs, the named companies used these books to train their large language models.

Microsoft and Meta have not yet commented on the new lawsuit. A Bloomberg spokesperson said that Books3 was used only to train the research version of BloombergGPT, not the commercial version.

Also named in the suit is EleutherAI, an AI research organization that included Books3 in its large AI training dataset, The Pile. According to the complaint, Books3 contains approximately 183,000 books published over the past 20 years and makes up 12 percent of The Pile.

"While using books as part of datasets is not inherently problematic, using pirated (or stolen) books does not fairly compensate authors and publishers for their work," the plaintiffs claim.

In their lawsuit, the authors seek unspecified damages and an injunction to stop the misuse of their works. The authors' lawyer accuses the companies of developing large language models "by all means necessary—including theft of our authors' books."

One of many author lawsuits

Earlier, the Authors Guild announced that 17 prominent authors, including John Grisham, George R.R. Martin, and Jodi Picoult, have sued OpenAI for copyright infringement. A group of authors led by Pulitzer Prize winner Michael Chabon has filed suit against Meta and OpenAI in a federal court in San Francisco with nearly identical allegations.

The authors accuse OpenAI of using copyrighted books without permission to train its AI models, specifically as part of its book training datasets.

Even if these allegations are true, the question remains whether this use can be considered "fair use." The litigation could drag on for years. In the meantime, leading AI companies continue to train large AI models and may learn from past mistakes by selecting licensed training data for future models.

With the legal situation so unclear, Microsoft and Google have begun to offer a form of legal protection to customers of their generative AI products. As long as customers use the systems as intended, the companies say they will cover the legal costs of copyright claims brought against them.

Star author Stephen King is fine with his books being put into AI systems

Addressing worries about AI creativity and copyright infringement, horror author Stephen King said he is not currently concerned about AI-generated writing.

King believes that AI-generated texts are currently no better than the sum of their training materials and that machines have not been able to produce genuine creative moments because they lack the emotional capacity to do so. But that could change, King says.

Still, he would rather not remove his books from AI training datasets. That would be like a worker trying to stop industrial progress by destroying a mechanical loom, King writes.
