
A U.S. government study uncovered 139 new ways to break top AI systems, but the findings were never released. Political pressure reportedly kept them under wraps, even as new federal guidelines quietly demand the very testing that was suppressed.


In October of last year, around 40 AI researchers took part in a two-day red-teaming exercise at a security conference in Arlington, Virginia. The event was part of the ARIA program run by the U.S. National Institute of Standards and Technology (NIST), in collaboration with the AI safety company Humane Intelligence. Despite its significance, the study's results have never been made public - reportedly for political reasons.

139 new ways to bypass AI safeguards

During the exercise, which took place at the CAMLIS conference on applied machine learning for information security, teams probed several advanced AI systems for vulnerabilities. Targets included Meta's open-source Llama LLM, the AI modeling platform Anote, Synthesia's avatar generator, and a security system from Robust Intelligence (now part of Cisco). Company representatives were also present.

The goal was to use NIST's official AI 600-1 framework to evaluate how well these systems could defend against misuse - such as spreading disinformation, leaking private data, or fostering unhealthy emotional bonds between users and AI tools.


Participants uncovered 139 new methods for circumventing system safeguards. For example, Llama could be prompted in Russian, Marathi, Telugu, or Gujarati to share information about joining terrorist groups. Other systems could be manipulated to disclose personal data or provide guidance for cyberattacks. Some categories in the official NIST framework were reportedly too vaguely defined to be useful in practice.

Political pressure blocked the report

The completed report was never published. Several people familiar with the matter told WIRED that the findings were withheld to avoid conflict with the incoming Trump administration. Even under President Biden, releasing similar studies had been "very difficult," according to a former NIST staffer, who compared the situation to past political suppression of research on climate change or tobacco.

The Department of Commerce and NIST declined to comment.

Trump’s AI plan matches what was suppressed

Ironically, the AI action plan released by the Trump administration in July calls for exactly the kind of red-teaming described in the unpublished report. The document also mandates revisions to the NIST framework, including the removal of terms like "misinformation," "diversity, equity, and inclusion" (DEI), and "climate change."

One anonymous participant in the exercise suspects the report was deliberately suppressed because of political resistance to DEI topics. Another theory is that the government has shifted its focus to preventing AI-enabled weapons of mass destruction.

Summary
  • At a security conference in October 2024, U.S. government researchers and industry experts uncovered 139 new ways to bypass safeguards in leading AI systems, including Meta's Llama model and other platforms, using tactics like prompting in less common languages or extracting personal data.
  • Despite the significance of these findings, the report was not released to the public, with sources citing political pressure and concerns about conflicts with incoming leadership; this follows a pattern of research suppression in other sensitive areas.
  • Meanwhile, the Trump administration’s AI policy now calls for the same type of rigorous AI safety testing that was withheld, while also ordering changes to NIST’s framework, such as removing references to "misinformation," "diversity, equity, and inclusion," and "climate change."