OpenAI is pushing back against a demand from the New York Times to search through 120 million ChatGPT user conversations as part of an ongoing copyright lawsuit with the newspaper. The company has offered access to 20 million chat logs—a fraction of what the Times wants.


The Times is seeking a broad review of ChatGPT outputs to look for potential copyright violations involving its articles. It also wants to document how such incidents may have changed over the 23-month period in question.

OpenAI warns of privacy and technical risks

OpenAI says that scanning the full set of chat data would involve major technical and privacy risks. The chat logs are unstructured, often containing over 5,000 words, and may include sensitive details like addresses or passwords.

Before sharing, these logs would need to be carefully scrubbed. OpenAI estimates that preparing the 20 million offered logs would take about 12 weeks, while handling all 120 million would require roughly 36 weeks.


The company says the data must be pulled from an offline system and processed manually, requiring significant staff and technical resources. OpenAI also warns that keeping logs for longer periods—especially deleted chats—would create new risks of data breaches.

The Times has rejected the proposed limit of 20 million logs, insisting on full access to demonstrate not just isolated cases, but systematic copyright violations and any trends over time.

OpenAI, in turn, cites computer scientist Taylor Berg-Kirkpatrick, who says a 20 million sample is statistically valid. The company argues that expanding the search would be disproportionate and unnecessarily slow the case.
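The article doesn't spell out Berg-Kirkpatrick's reasoning, but the standard argument for why a fixed-size sample can suffice is that the margin of error of an estimated rate depends on the sample size, not on the size of the full population. A minimal sketch of that arithmetic (an illustrative assumption, not the expert's actual analysis), using the usual simple-random-sampling formula:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% confidence margin of error for an estimated
    proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) for a sample of 20 million logs: the margin of
# error is on the order of a few hundredths of a percentage point.
moe = margin_of_error(0.5, 20_000_000)
print(f"±{moe:.4%}")
```

Under these assumptions, estimates of how often infringing outputs appear would barely tighten by going from 20 million to 120 million logs, which is the gist of the proportionality argument attributed to OpenAI.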

Court orders preservation of deleted data

This latest dispute follows a June 2025 court order requiring OpenAI to retain all ChatGPT conversations—including deleted ones—after the Times and other publishers accused the company of destroying evidence through automated deletion.

OpenAI called the order a serious invasion of privacy for hundreds of millions of users. In court, the company argued that many chats contain "deeply personal" information, including financial data and private matters like wedding planning. Business customers using the API to process sensitive corporate data are also affected. The order, OpenAI says, forces it to violate its own privacy policies and undermines users' trust.


OpenAI also disputes the allegation that evidence is being destroyed. The company says there is no proof that infringing content was deliberately deleted, whether automatically or manually, and calls the idea that users are mass-deleting chats to hide legal risks speculative. Still, the judge found reason to believe that evidence could be lost through deletion and ordered comprehensive data preservation as a precaution.

News of the decision spread quickly on social media, sparking concern among users. On LinkedIn and X (formerly Twitter), experts warned of new security risks and advised against sharing sensitive data with ChatGPT. Some companies even viewed the order as a potential breach of contract by OpenAI, since confidential data would now be stored longer and possibly exposed to third parties.

Summary
  • OpenAI is resisting the New York Times' demand to search 120 million ChatGPT user conversations for possible copyright violations, offering access to 20 million logs instead and arguing that a broader search poses major privacy and technical risks.
  • The company says preparing the requested logs would require extensive manual processing to remove sensitive information, taking months of work and raising concerns about storing deleted or private data, while the Times insists on full access to show potential systematic copyright issues.
  • A recent court order requires OpenAI to preserve all ChatGPT conversations, including deleted ones, following accusations of evidence destruction—a decision that has raised privacy concerns among users and prompted warnings from experts about sharing sensitive information with the AI service.
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.