
Encyclopedia Britannica sues OpenAI for training on nearly 100,000 articles without permission


Encyclopedia Britannica and its subsidiary Merriam-Webster have filed a lawsuit against OpenAI in federal court in Manhattan.

The lawsuit alleges that OpenAI used nearly 100,000 online articles, encyclopedia entries, and dictionary definitions without permission to train its AI models, Reuters first reported. According to the complaint, ChatGPT in some cases produces near-verbatim copies of Britannica content, diverting users from Britannica's own websites.

Britannica also accuses OpenAI of trademark infringement, claiming that ChatGPT falsely implies Britannica has endorsed it and cites Britannica as the source of inaccurate AI-generated answers. The company is seeking damages and an injunction.

The complaint states that GPT-4 has "memorized" much of Britannica's copyrighted content and can reproduce near-verbatim copies of significant portions of it on demand.


GPT-4 itself has "memorized" much of Britannica's copyrighted content and will output near-verbatim copies of significant portions on demand. The memorized examples are unauthorized copies that Defendants used to train their models, including GPT-4.

Excerpt from the complaint

Courts are split on whether AI models actually "store" copyrighted works

Whether AI models store copyrighted works in their parameters, and whether that counts as copying, is a question courts are currently answering in very different ways. In the GEMA v. OpenAI case, a Munich court ruled that song lyrics were embedded in the model weights of GPT-4 and GPT-4o, and that this constituted reproduction relevant under copyright law.

Model weights are the numerical values an AI model learns during training that determine what outputs it generates. For the Munich court, it was enough that a work could be reproduced from these parameters to justify claims for injunctive relief and damages.

The UK High Court reached the opposite conclusion in Getty Images v. Stability AI, holding that an AI model is not an "infringing copy" because its weights neither contain nor reproduce copyrighted works. According to the court, the weights store only learned patterns, not the works themselves.


Meanwhile, a study by researchers at Stanford and Yale shows how real the problem is: the team managed to extract entire books from leading AI models almost word for word.


Source: Complaint | Reuters