
A U.S. government study uncovered 139 new ways to break top AI systems, but the findings were never released. Political pressure reportedly kept them under wraps, even as new federal guidelines quietly demand the very testing that was suppressed.


In October of last year, around 40 AI researchers took part in a two-day red-teaming exercise at a security conference in Arlington, Virginia. The event was part of the ARIA program run by the U.S. National Institute of Standards and Technology (NIST), in collaboration with the AI safety company Humane Intelligence. Despite its significance, the study's results have never been made public - reportedly for political reasons.

139 new ways to bypass AI safeguards

During the exercise, which took place at the CAMLIS conference on applied machine learning for information security, teams probed several advanced AI systems for vulnerabilities. Targets included Meta's open-source Llama LLM, the AI modeling platform Anote, Synthesia's avatar generator, and a security system from Robust Intelligence (now part of Cisco). Company representatives were also present.

The goal was to use NIST's official AI 600-1 framework to evaluate how well these systems could defend against misuse - such as spreading disinformation, leaking private data, or fostering unhealthy emotional bonds between users and AI tools.


Participants uncovered 139 new methods for circumventing system safeguards. For example, Llama could be prompted in Russian, Marathi, Telugu, or Gujarati to share information about joining terrorist groups. Other systems could be manipulated to disclose personal data or provide guidance for cyberattacks. Some categories in the official NIST framework were reportedly too vaguely defined to be useful in practice.

Political pressure blocked the report

The completed report was never published. Several people familiar with the matter told WIRED that the findings were withheld to avoid conflict with the incoming Trump administration. Even under President Biden, releasing similar studies had been "very difficult," according to a former NIST staffer, who compared the situation to past political suppression of research on climate change or tobacco.

The Department of Commerce and NIST declined to comment.

Trump’s AI plan matches what was suppressed

Ironically, the AI action plan released by the Trump administration in July calls for exactly the kind of red-teaming described in the unpublished report. The document also mandates revisions to the NIST framework, including the removal of terms like "misinformation," "diversity, equity, and inclusion" (DEI), and "climate change."

One anonymous participant in the exercise suspects the report was deliberately suppressed because of political resistance to DEI topics. Another theory is that the government has shifted its focus to preventing AI-enabled weapons of mass destruction.

Summary
  • At a security conference in October 2024, U.S. government researchers and industry experts uncovered 139 new ways to bypass safeguards in leading AI systems, including Meta's Llama model and other platforms, using tactics like prompting in less common languages or extracting personal data.
  • Despite the significance of these findings, the report was not released to the public, with sources citing political pressure and concerns about conflicts with incoming leadership; this follows a pattern of research suppression in other sensitive areas.
  • Meanwhile, the Trump administration’s AI policy now calls for the same type of rigorous AI safety testing that was withheld, while also ordering changes to NIST’s framework, such as removing references to "misinformation," "diversity, equity, and inclusion," and "climate change."