Anthropic is expanding its bug bounty program to test its "next-generation system for AI safety mitigations." The program focuses on identifying and defending against "universal jailbreak attacks," with priority given to critical vulnerabilities in high-risk areas such as chemical, biological, radiological, and nuclear (CBRN) defense and cybersecurity. Participants receive early access to Anthropic's latest safety systems before public release, and their task is to find vulnerabilities or ways to bypass the safety measures. Anthropic is offering rewards of up to $15,000 for the discovery of new universal jailbreak attacks.