US Copyright Office says fair use does not cover AI trained on "vast troves of copyrighted works"

May 11, 2025

Update – May 11, 2025

Added information about the firing of Shira Perlmutter following the report's release.

The US Copyright Office has pushed back against one of the AI industry's most common legal arguments: that training AI models on copyrighted material generally qualifies as fair use.

In a new report, the agency rejects several of the industry's key justifications, such as comparing AI training to human learning or claiming that training is a "non-expressive" use. The non-expressive argument assumes that models merely identify statistical patterns in the data rather than reproducing creative expression.

The Copyright Office disagrees. If an AI model generates output that resembles human-created work in terms of style, function, or expression, then that output is considered "expressive." And if that output competes with the original works in the market, it weighs against a fair use defense.

A central argument in the report is that AI systems process information fundamentally differently from humans. While people retain partial, filtered impressions of creative works—shaped by memory, personality, and context—AI models ingest perfect copies, analyze them almost instantly, and generate new content at "superhuman speed and scale," according to the Copyright Office.

"Generative model training transcends the human limitations that underlie the structure of the exclusive rights."

Professor Robert Brauneis, Copyright and the Training of Human Authors and Generative Machines
Update: Shortly after the report was released, the Trump administration fired Shira Perlmutter, head of the U.S. Copyright Office. The move drew immediate backlash. "Donald Trump's termination of Register of Copyrights, Shira Perlmutter, is a brazen, unprecedented power grab with no legal basis. It is surely no coincidence he acted less than a day after she refused to rubber-stamp Elon Musk's efforts to mine troves of copyrighted works to train AI models," wrote Rep. Joe Morelle, the top Democrat on the Committee on House Administration.

Licensing, not litigation

The full report leaves room for some narrow exceptions. Certain training uses might be transformative enough to qualify as fair use, depending on several factors: what kind of work is being used, how it was obtained, the purpose of the training, and whether the resulting outputs are restricted or instead compete with the original works. In research or analytical contexts, for example, generated content is less likely to serve as a substitute for the original, so such uses may lean toward fair use.

But when it comes to commercial AI systems that use "vast troves of copyrighted works to produce expressive content that competes with them in existing markets," the Copyright Office draws a clear line, stating that this "goes beyond established fair use boundaries."

How the training data was obtained also matters. Using illegally sourced works, such as copies taken from piracy sites or from behind paywalls, hurts the fair use argument, the agency says, and some current datasets appear to include such material.

Rather than calling for new legal restrictions, the Copyright Office urges further development of voluntary licensing markets. Early forms of individual and collective licensing are emerging in some sectors, and for areas where licensing systems don't yet exist, the agency suggests alternatives like extended collective licensing.

At this stage, the Copyright Office sees government intervention as premature, citing both the early development of licensing markets and a lack of consensus for new laws.

No blanket fair use, but no outright ban

Despite rejecting industry-wide fair use claims, the Copyright Office stops short of calling for a general ban on AI training. It stresses that fair use is a flexible legal doctrine that has adapted to past waves of technological change and should remain that way.

According to the report, the best way to maintain the United States' leadership in AI is to support both innovation and copyright protection. The goal, the office says, is to ensure that these technologies benefit not only the developers building the models but also the creators whose content powers them—and ultimately, the public at large. The Copyright Office says it will continue to advise Congress on the issue.