OpenAI vs. NYT: Chat log access battle escalates in lawsuit
In a pivotal moment for the evolving landscape of artificial intelligence and intellectual property law, OpenAI and The New York Times are locked in a contentious legal battle, with the latest dispute centering on the scope of user chat data discovery. OpenAI has offered to provide 20 million user chat logs as part of the ongoing lawsuit, a figure dramatically lower than the 120 million records demanded by The New York Times. This disagreement highlights the complex interplay between legal discovery, user privacy, and the future of AI development.
The New York Times initiated its lawsuit against OpenAI and Microsoft in December 2023, alleging that the tech giants infringed on its copyrights by using millions of its articles to train their large language models (LLMs), including ChatGPT, without permission or compensation. The Times asserts that OpenAI’s models can, at times, reproduce or “regurgitate” its copyrighted content, undermining its business model and journalistic integrity.
As part of the discovery phase, The New York Times has sought extensive access to ChatGPT user data, arguing that these logs are crucial for demonstrating the extent of alleged copyright infringement and for countering OpenAI’s defenses. The Times’s demand for 120 million chat logs suggests a broad effort to uncover how users interact with ChatGPT in relation to copyrighted material, potentially looking for instances where the AI generates content derived from or mimicking Times articles.
OpenAI, however, is vehemently resisting this sweeping demand, characterizing it as an “overreach” that poses significant risks to user privacy. The company maintains that its offer of 20 million chat logs is sufficient for the discovery process and aligns with established industry standards. OpenAI’s Chief Operating Officer, Brad Lightcap, has criticized the NYT’s demand, stating it conflicts with the privacy commitments OpenAI has made to its users and abandons long-standing privacy norms. OpenAI emphasizes that it provides users with tools to control their data, including options for deletion, and argues that requiring indefinite retention of all user content, including deleted chats, is an “inappropriate request that sets a bad precedent.”
Adding another layer to the dispute, a U.S. federal court issued an order requiring OpenAI to preserve nearly all user chats with ChatGPT, including those that users had deleted. This preservation order, which OpenAI is appealing, has triggered widespread privacy concerns. Critics argue that such a ruling could set a dangerous precedent for mass data preservation in lawsuits, potentially exposing millions of personal conversations that users believed were private or erased. Legal experts have voiced concerns that this decision could undermine trust in AI tools and lead to a “chilling effect” on their use, as users may become hesitant to share sensitive information.
The legal battle extends beyond user chats. OpenAI has pressed discovery requests of its own against The New York Times, seeking access to the newspaper’s internal information about AI use and reporters’ notes. OpenAI argues that these materials are relevant to its fair use defense and to ascertaining the copyrightability of the Times’s works. This back-and-forth over discovery highlights the foundational questions at stake: the parameters of fair use in the context of AI training, the definition of copyrighted material in the digital age, and the extent to which user data can be compelled in litigation.
The lawsuit, consolidated under “In re: OpenAI Inc.” and proceeding towards trial, is poised to have significant repercussions for both the AI and media industries. Its outcome could redefine the relationship between generative AI and copyright law, shape how AI models are built and trained, and establish crucial precedents for user privacy in an increasingly AI-driven world.