The unresolved AI copyright issue continues to simmer, drawing the next class action lawsuit. This time, Bloomberg and EleutherAI are involved.
Mike Huckabee, former governor of Arkansas, and bestselling author Lysa TerKeurst are among the authors who have filed suit against Meta, Microsoft, and Bloomberg. They accuse the companies of using their work to train AI without their consent and illegally extracting "an enormous amount of value."
Books3 dataset alleged to contain pirated books
The new lawsuit centers on the "Books3" dataset. The plaintiffs claim that it contains hundreds of thousands of illegally copied books and that the named companies used these books to train their large language models.
Microsoft and Meta have not yet commented on the new lawsuit. A Bloomberg spokesperson says that Books3 was not used to train the commercial version of BloombergGPT, only the research model.
Also named in the suit is EleutherAI, an AI research organization that included the Books3 dataset in its large AI training dataset, The Pile. According to the complaint, Books3 contains approximately 183,000 books published over the past 20 years and makes up 12 percent of The Pile.
"While using books as part of datasets is not inherently problematic, using pirated (or stolen) books does not fairly compensate authors and publishers for their work," the plaintiffs claim.
In their lawsuit, the authors seek unspecified damages and an injunction to stop the misuse of their works. The authors' lawyer accuses the companies of developing large language models "by all means necessary—including theft of our authors' books."
One of many author lawsuits
Earlier, the Authors Guild announced that 17 prominent authors, including John Grisham, George R.R. Martin, and Jodi Picoult, have sued OpenAI for copyright infringement. A group of authors led by Pulitzer Prize winner Michael Chabon has filed suit against Meta and OpenAI in a federal court in San Francisco with nearly identical allegations.
The authors accuse OpenAI of using copyrighted books without permission to train AI, specifically as part of the Books dataset.
Even if these allegations are true, the question remains whether this use can be considered "fair use." The litigation could drag on for years while leading AI companies continue to train large AI models and, potentially, learn from past mistakes by selecting licensed training data for future models.
With the situation so unclear, Microsoft and Google have begun to offer a form of legal protection to customers of their generative AI products. Provided their systems are used as intended, the companies say they will cover the legal costs of copyright claims against customers.
Star author Stephen King is fine with his books being put into AI systems
Horror author Stephen King said he is not currently concerned about AI-generated writing, addressing worries about AI creativity and copyright infringement.
King believes that AI-generated texts are currently no better than the sum of their training materials and that machines have not been able to produce genuine creative moments because they lack the emotional capacity to do so. But that could change, King says.
Still, he would rather not remove his books from AI training datasets. That would be like a worker trying to stop industrial progress by destroying a mechanical loom, King writes.