AI and society

New lawsuit accuses Bloomberg, Microsoft, and Meta of training AI with pirated books

Matthias Bastian

The unresolved AI copyright issue continues to simmer, drawing the next class-action lawsuit. This time, Bloomberg and EleutherAI are involved.

Mike Huckabee, former governor of Arkansas, and bestselling author Lysa TerKeurst are among the authors who have filed suit against Meta, Microsoft, and Bloomberg. They accuse the companies of using their work to train AI without their consent and illegally extracting "an enormous amount of value."

Books3 dataset alleged to contain pirated books

The new lawsuit centers on the "Books3" dataset. The plaintiffs claim that it contains hundreds of thousands of illegally copied books, which the named companies allegedly used to train their large language models.

Microsoft and Meta have not yet commented on the new lawsuit. A Bloomberg spokesperson says that Books3 was not used to train the commercial version of BloombergGPT, only the research model.

Also named in the suit is EleutherAI, an AI research organization that included the Books3 dataset in its large AI training dataset, The Pile. According to the complaint, Books3 contains approximately 183,000 books published over the past 20 years and represents 12 percent of The Pile.

"While using books as part of datasets is not inherently problematic, using pirated (or stolen) books does not fairly compensate authors and publishers for their work," the plaintiffs claim.

In their lawsuit, the authors seek unspecified damages and an injunction to stop the misuse of their works. The authors' lawyer accuses the companies of developing large language models "by all means necessary—including theft of our authors' books."

One of many author lawsuits

Earlier, the Authors Guild announced that 17 prominent authors, including John Grisham, George R.R. Martin, and Jodi Picoult, have sued OpenAI for copyright infringement. A group of authors led by Pulitzer Prize winner Michael Chabon has filed suit against Meta and OpenAI in a federal court in San Francisco with nearly identical allegations.

The authors accuse OpenAI of using copyrighted books without permission to train AI, specifically as part of the Books dataset.

Even if these allegations are true, the question remains whether this use can be considered "fair use." The litigation could drag on for years while leading AI companies continue to train large AI models and, potentially, learn from earlier mistakes by selecting licensed training data for future models.

With the situation so unclear, Microsoft and Google have begun to offer a form of legal protection to customers of their generative AI products. Provided their systems are used as intended, the companies say they will cover the legal costs of copyright claims against customers.

Star author Stephen King is fine with his books being put into AI systems

Horror author Stephen King said he is not currently concerned about AI-generated writing, addressing worries about AI creativity and copyright infringement.

King believes that AI-generated texts are currently no better than the sum of their training materials and that machines have not been able to produce genuine creative moments because they lack the emotional capacity to do so. But that could change, King says.

Still, he would rather not remove his books from AI training datasets. That would be like a worker trying to stop industrial progress by destroying a mechanical loom, King writes.
