US Gov't Suppresses AI Safety Report Amid Political Clashes
Last October, at a computer security conference in Arlington, Virginia, a select group of AI researchers participated in a pioneering “red-teaming” exercise, a rigorous stress-test designed to probe the vulnerabilities of cutting-edge language models and other artificial intelligence systems. Over two intensive days, these teams uncovered 139 novel methods to induce system misbehavior, ranging from generating misinformation to inadvertently leaking personal data. Crucially, their findings also exposed significant shortcomings within a nascent US government standard intended to guide companies in evaluating their AI systems.
Despite the critical insights gleaned, the National Institute of Standards and Technology (NIST) never published the comprehensive report detailing this exercise, which concluded towards the end of the Biden administration. Such a document could have offered invaluable guidance to companies seeking to assess their own AI deployments. However, sources familiar with the situation, speaking anonymously, indicated that this was one of several AI-related documents from NIST withheld from publication, reportedly out of concern for potential conflict with the incoming administration. One former NIST insider remarked on the increasing difficulty of publishing papers, even under President Biden, drawing parallels to past controversies surrounding climate change or cigarette research. Neither NIST nor the Commerce Department provided comment on the matter.
The political backdrop to this decision is significant. Before assuming office, President Donald Trump signaled his intent to reverse Biden’s executive order on AI. His administration has since redirected experts away from examining issues such as algorithmic bias or fairness in AI systems. The “AI Action Plan,” released in July, explicitly mandates a revision of NIST’s AI Risk Management Framework, specifically calling for the elimination of references to misinformation, Diversity, Equity, and Inclusion (DEI), and climate change. Ironically, this same action plan also advocates for precisely the kind of exercise that the unpublished report detailed, urging various agencies, including NIST, to “coordinate an AI hackathon initiative to solicit the best and brightest from US academia to test AI systems for transparency, effectiveness, use control, and security vulnerabilities.”
The red-teaming event itself was orchestrated through NIST’s Assessing Risks and Impacts of AI (ARIA) program, in collaboration with Humane Intelligence, a firm specializing in AI system testing. Held at the Conference on Applied Machine Learning in Information Security (CAMLIS), the exercise saw teams attack a diverse array of advanced AI tools. These included Llama, Meta’s open-source large language model; Anote, a platform for building and fine-tuning AI models; a system designed to block AI attacks from Robust Intelligence (now acquired by Cisco); and a platform for generating AI avatars from Synthesia. Representatives from each of these companies actively participated in the stress-testing.
Participants were tasked with evaluating these AI tools using the NIST AI 600-1 framework, which encompasses risk categories such as the generation of misinformation or cybersecurity attacks, the leakage of private user information or critical AI system details, and the potential for users to develop emotional attachments to AI tools. The researchers successfully devised various methods to bypass the models’ safety protocols, enabling them to generate misinformation, leak personal data, and even facilitate cybersecurity attacks. The report noted that while some elements of the NIST framework proved useful, certain risk categories were insufficiently defined for practical application.
Several individuals involved in the exercise expressed their conviction that publishing the red-teaming study would have significantly benefited the broader AI community. Alice Qian Zhang, a PhD student at Carnegie Mellon University who took part, commented that the report’s publication would have provided valuable insights into how the NIST risk framework can and cannot be applied in a red-teaming context. She particularly valued the opportunity to engage directly with the tool developers during the testing process. Another anonymous participant revealed that the exercise uncovered remarkably effective ways to prompt Llama to provide information on joining terror groups, specifically by using prompts written in Russian, Gujarati, Marathi, and Telugu. This individual speculated that the decision to suppress the report might be part of a broader shift away from topics deemed related to diversity, equity, and inclusion (DEI) ahead of Trump’s second term. Others suggested the report may have been sidelined amidst an escalating focus on the risk of AI models being used to develop chemical, biological, or nuclear weapons, and as the US government pursued closer ties with major tech firms. As one anonymous red teamer concluded, “At the end of the day, politics must have been involved. We felt that the exercise would have plenty of scientific insights—we still feel that.”