US Gov't Suppressed Major AI Vulnerability Study
A significant United States government study, which uncovered 139 novel methods for exploiting vulnerabilities in leading artificial intelligence systems, has reportedly been withheld from public release due to political pressure. This suppression comes at a peculiar time, as new federal guidelines are quietly advocating for the very type of rigorous AI safety testing that the unpublished report details.
The study originated from a two-day “red-teaming” exercise conducted in October 2024, involving approximately 40 AI researchers at a security conference in Arlington, Virginia. This event was part of the ARIA program, an initiative by the U.S. National Institute of Standards and Technology (NIST) in collaboration with the AI safety firm Humane Intelligence. Despite its critical findings, the results of this comprehensive assessment have never been made public.
During the exercise, expert teams systematically probed several advanced AI systems for potential weaknesses. Targets included Meta’s open-source Llama large language model, the AI modeling platform Anote, Synthesia’s avatar generator, and a security system developed by Robust Intelligence (now part of Cisco). Representatives from these companies were present, overseeing the evaluation. The primary objective was to apply NIST’s official AI 600-1 framework to gauge how effectively these systems could withstand misuse, such as propagating disinformation, leaking sensitive private data, or fostering unhealthy emotional attachments between users and AI tools.
The researchers successfully identified 139 distinct ways to bypass existing system safeguards. For instance, participants discovered that Meta’s Llama model could be manipulated by prompting it in less common languages like Russian, Marathi, Telugu, or Gujarati to elicit information on joining terrorist organizations. Other systems were found susceptible to tactics that could force them to disclose personal data or furnish instructions for launching cyberattacks. Paradoxically, some categories within the official NIST framework, intended to guide such evaluations, were reportedly too vaguely defined to be practical in real-world application.
Sources familiar with the matter have indicated to WIRED that the completed report was deliberately suppressed to avoid potential conflicts with the incoming Trump administration. A former NIST staff member corroborated the difficulty of releasing similar studies even under President Biden, drawing parallels to historical instances of political interference in research concerning climate change or tobacco. Both the Department of Commerce and NIST have declined to comment on these allegations.
Adding an ironic twist, the AI action plan unveiled by the Trump administration in July explicitly calls for the exact kind of red-teaming exercises described in the unpublished report. Moreover, this new policy mandates revisions to the NIST framework, specifically requiring the removal of terms such as “misinformation,” “diversity, equity, and inclusion” (DEI), and “climate change.” One anonymous participant in the exercise speculates that the report’s suppression might be linked to political resistance surrounding DEI topics. Another theory suggests that the government’s focus may have shifted towards preventing AI-enabled weapons of mass destruction, leading to the sidelining of other vulnerability research. Regardless of the precise reasons, the shelving of a significant study revealing critical AI vulnerabilities raises serious questions about transparency and the prioritization of public safety in the rapidly evolving landscape of artificial intelligence.