OpenAI resists NYT demand for 120M ChatGPT logs in copyright suit

Decoder

OpenAI is locked in a contentious discovery battle with The New York Times over access to millions of ChatGPT user conversations. At the heart of the latest disagreement is the newspaper’s demand to search through 120 million chat logs as part of its ongoing copyright lawsuit against the AI company. OpenAI is pushing back, offering a significantly smaller subset of 20 million logs instead.

The Times seeks this extensive data to uncover potential copyright infringements involving its articles and to document how such incidents may have evolved over a 23-month period. OpenAI warns that acceding to the newspaper’s sweeping request poses substantial technical and privacy risks: the chat logs are largely unstructured, often exceed 5,000 words each, and frequently contain highly sensitive personal information, including addresses and passwords.

Before any data could be shared, the logs would require manual review and redaction to remove sensitive details. OpenAI estimates that preparing even the 20 million logs it has offered would take approximately twelve weeks, while processing the full 120 million would consume roughly 36 weeks. This labor-intensive process, the company notes, would demand significant staff and technical resources, as the data must be pulled from an offline system. OpenAI also argues that retaining deleted chats for extended periods, as the Times’ demand implies, could create new vulnerabilities to data breaches.

The Times has firmly rejected OpenAI’s proposed limit, asserting that a smaller sample would be insufficient to demonstrate systematic copyright violations and long-term trends, and it insists on comprehensive access to build its case. In response, OpenAI cites computer scientist Taylor Berg-Kirkpatrick, who supports the statistical validity of a 20 million-log sample. The company argues that expanding the search beyond this would be disproportionate and would unnecessarily prolong the proceedings.
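Berg-Kirkpatrick’s actual analysis is not detailed here, but standard sampling theory illustrates why a sample of that size can be statistically meaningful: the margin of error for estimating a proportion (say, the share of chats containing infringing output) shrinks with the square root of the sample size, and at 20 million it is vanishingly small. The function below is a generic textbook sketch, not the expert’s method:

```python
import math

def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for estimating a proportion from a simple
    random sample, using the normal approximation. proportion=0.5 is
    the worst case (widest interval)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# At 20 million sampled chats, the worst-case 95% margin of error is
# about +/- 0.02 percentage points on any estimated rate.
print(f"{margin_of_error(20_000_000):.6f}")
```

This is why, from a pure estimation standpoint, sampling 120 million instead of 20 million buys almost no extra precision; the Times’ argument rests instead on detecting rare incidents and long-term trends, for which coverage, not just sample size, matters.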

This current disagreement unfolds against the backdrop of a significant court order issued in June 2025. This order mandated that OpenAI preserve all ChatGPT conversations, including those users had deleted. The directive followed accusations from the Times and other publishers that OpenAI was destroying evidence through automated deletion processes.

OpenAI vehemently criticized this order, describing it as a severe invasion of privacy for hundreds of millions of users. The company argued in court that many chats contain “deeply personal” information, ranging from financial data to private matters like wedding planning. Furthermore, business customers utilizing OpenAI’s API to process sensitive corporate data are also impacted. OpenAI contends that the order forces it to violate its own privacy policies and fundamentally erodes user trust.

While the judge found reason to believe that evidence could be lost through deletion and ordered comprehensive data preservation as a precaution, OpenAI disputes the allegation of deliberate evidence destruction. The company maintains there is no proof that infringing content was intentionally deleted, whether automatically or manually, and dismisses the notion of users mass-deleting chats to conceal legal risks as speculative.

News of the court’s decision quickly reverberated across social media platforms, triggering widespread concern among users. Experts on LinkedIn and X (formerly Twitter) issued warnings about new security risks and advised against sharing sensitive data with ChatGPT. Some companies even interpreted the order as a potential breach of contract by OpenAI, fearing that confidential data would now be stored longer and potentially exposed to third parties.