Ad
Skip to content

Reddit sues Anthropic for scraping site content to train Claude

Image description
GPT-Image-1 prompted by THE DECODER

At a Glance

  • Reddit has taken legal action against Anthropic, alleging that the company systematically collected and used content from subreddits to train its AI models without authorization.
  • The complaint states that Anthropic violated Reddit's terms of use, technical safeguards, and failed to secure a license, putting both economic interests and the privacy of deleted or sensitive user data at risk.
  • Reddit is seeking damages, the removal of all AI models trained with Reddit data, and a stop to Anthropic's further commercial use of these systems.

Reddit has filed a lawsuit against Anthropic in Superior Court in San Francisco, accusing the AI startup of systematically scraping Reddit posts to train its Claude language models without permission.

According to the platform's user agreement, commercial use of Reddit content requires an explicit license. Reddit says Anthropic ignored that rule, bypassed technical safeguards like robots.txt files and IP-based rate limits, and never connected to Reddit's compliance API—the tool that tells licensees when a user deletes a post so it can be removed from their systems.

According to the lawsuit, Anthropic has publicly acknowledged using Reddit data in past research and even listed more than 40 subreddits—including r/science, r/IAmA, and r/relationship_advice—as "high-quality" sources for training Claude. Reddit says Anthropic gathered that data without consent and despite those protective measures.

The lawsuit states an Anthropic spokesperson claimed in July 2024 that Reddit had been on ClaudeBot's blocklist since May. Reddit's internal logs tell a different story, showing more than 100,000 hits from Anthropic bots on Reddit's servers in the months after that claim.

Ad
DEC_D_Incontent-1

Reddit seeks injunction and damages

Reddit's lawsuit accuses Anthropic of multiple legal violations, from breach of contract to unfair competition. The platform is seeking damages for lost licensing revenue, demanding Anthropic delete all AI models and datasets containing Reddit content, and asking the court to prevent Anthropic from commercially using Claude or any AI models trained on Reddit data.

Reddit argues that Anthropic's actions threaten both the company's business interests and its users' privacy. Without a license or a connection to the compliance API, there is no way to confirm whether deleted or sensitive posts are still embedded in Claude.

"If parties such as Anthropic scrape Reddit content without a licensing agreement, Reddit users enjoy none of the protections present in Reddit’s Public Content Policy and Privacy Policy, in part, because Reddit users have no way of knowing which parties have scraped and obtained their data," the lawsuit states.

The platform notes that other AI companies have chosen a different path. Google reportedly pays Reddit $60 million a year for training data, and the partnership has given Reddit a boost in Google Search visibility in recent months.

Ad
DEC_D_Incontent-2

AI News Without the Hype – Curated by Humans

As a THE DECODER subscriber, you get ad-free reading, our weekly AI newsletter, the exclusive "AI Radar" Frontier Report 6× per year, access to comments, and our complete archive.

Source: Reddit Inc