Reddit reportedly signs $60 million annual training data deal with Google

Feb 22, 2024

Updated on February 22, 2024:

The AI company licensing Reddit's data is apparently Google, according to a Reuters report citing anonymous sources. Reuters confirms the license fee of $60 million per year, although it remains unclear which data Reddit will provide in return, and how much of it.

Original article from February 17, 2024:

Reddit has signed a $60 million annual contract that allows an unnamed AI company to use the platform's content to train its AI models.

According to Bloomberg, Reddit disclosed this in advance to potential investors, who are expected to support its planned IPO at a valuation of at least $5 billion. The deal shows how Reddit can capitalize on the current interest in AI training data.

Other social media platforms could also sell their user content in this way and generate additional revenue. Meta and X use their social media data to train their own AI models.

Many assume that Reddit data plays a central role in the training of large language models such as OpenAI's GPT-3.5 or GPT-4, Meta's Llama, or Google's models.

This is because many Reddit posts already carry a human rating thanks to the platform's upvote and downvote function, which facilitates pre-sorting. The posts also contain additional contextual links. Both of these factors make the data valuable to AI companies.

"The Reddit corpus of data is really valuable. But we don’t need to give all of that value to some of the largest companies in the world for free," said Reddit co-founder Steve Huffman in the spring of 2023.

At the time, Reddit announced that it would start charging companies that wanted to access user data through its API. Previous models were trained on Reddit data for free. Rising licensing costs for training future AI models are not limited to Reddit; they affect other text sources as well.

AI companies are increasingly partnering with publishers to get data to train their models. OpenAI, for example, has confirmed a deal with Axel Springer that includes making Springer news available on ChatGPT. More deals will follow, the company said. Apple and Google are also said to be offering licensing deals to publishers.

Meta explained in a submission to the US Copyright Office that training AI on purely licensed material would be prohibitively expensive at the scale required. OpenAI also told the UK government that developing leading AI models is not possible without training on copyrighted material.
