Microsoft faces a lawsuit alleging it used 200,000 pirated books to train AI

Jun 26, 2025

Microsoft is being sued by several authors who say their books were used without permission to train a Megatron model. The lawsuit, filed in federal court in New York, claims Microsoft used a dataset of about 200,000 pirated books to build a system that mimics the style, voice, and themes of the original works. The plaintiffs are asking for a ban on further use and up to $150,000 in damages per title.

Courts in similar cases involving Meta and Anthropic have said such use may qualify as "transformative" under fair use rules. But several questions remain open: whether using pirated books overrides a fair use defense, whether and to what extent scraping copyrighted content from the internet is legal, and whether such training harms the market for the original books, which could prevent the use from qualifying as fair use.

