Microsoft is taking a new approach to using copyrighted books for AI training by offering payment to HarperCollins authors. The deal sheds light on how the industry values creative work in the AI era.
The company has proposed a licensing agreement with publisher HarperCollins that would pay $5,000 per book for AI training rights. Authors would receive half of that amount, or $2,500 per book, according to the publisher.
Alice Robb, who wrote about the deal for Bloomberg, received the HarperCollins offer for her 2018 book "Why We Dream." The deal would give Microsoft a three-year training license, and authors are free to accept or decline.
But putting a price tag on these rights isn't simple. "My first impulse was to outsource the decision to my agent, but she demurred," Robb writes. The contract had no precedent or room for negotiation, and she had just one week to choose.
Robb ultimately took the deal, though she says she doesn't know whether it was the right call. "As far as I can tell, neither does anyone else," she writes. She also notes that her 2018 book has already been used to train AI systems without permission anyway, likely by Microsoft or OpenAI.
The decision becomes even harder given authors' financial struggles, Robb notes. The Authors Guild reports that full-time authors' median annual income is just $20,000. In the UK, professional writers earn a median of £7,000 (about €8,400) yearly.
From piracy to payment
Brown University economist Emily Oster sees Microsoft's approach as calculated: "They’re trying to establish the idea that the rights to train on books are worth $5,000. You can’t do that by going to the latest bestseller. So you do that by going to the backlist — to people who aren’t collecting royalties — and telling them, ‘Look, would you like some free money?’"
While Microsoft is seeking licenses in this case, many AI companies argue that "fair use" allows them to train AI on copyrighted works without payment. They contend that turning existing works into new products is transformative and therefore permitted under copyright law. Authors, publishers, and artists disagree, and the dispute has led to multiple lawsuits.
A lawsuit against Meta recently revealed how ruthlessly AI companies can go about collecting training data. Court documents showed that, despite internal warnings, the company deliberately used piracy networks to download copyrighted books for AI training and systematically removed copyright notices.
Microsoft's and OpenAI's moves toward licensing suggest that the big AI labs may be backing away from their position that training on copyrighted content without permission is legal, and are starting to pay rights holders instead. Some AI labs are even buying second-tier video content from YouTube creators.