OpenAI unveils a new framework to help prevent catastrophic AI risks. The company now has a total of three teams dedicated to AI safety.

The "Preparedness Framework" is a living document that describes strategies for monitoring, assessing, predicting, and hedging against catastrophic AI risks.

OpenAI wants to track catastrophic risk through careful evaluation. To this end, it aims to develop and refine evaluation procedures and other monitoring methods that measure risk levels accurately.

At the same time, the organization aims to anticipate future risk developments to prepare safety measures in advance.

OpenAI also commits to identifying and investigating emerging risks ("unknown-unknowns"). The goal is to address potential threats before they escalate.

Cybersecurity, Bioweapons, Persuasion, and Autonomy

The Preparedness Framework identifies four major risk categories:

  • Cybersecurity
  • Chemical, biological, nuclear, and radiological (CBRN) threats
  • Persuasion
  • Model autonomy

The persuasion category covers the risks of models convincing people to change their beliefs or to act on them.

OpenAI CEO Sam Altman recently did a bit of foreshadowing on Twitter: "I expect AI to be capable of superhuman persuasion well before it is superhuman at general intelligence, which may lead to some very strange outcomes."

From low to critical

The framework defines safety thresholds. Each category is rated on a scale from "low" to "critical," reflecting the level of risk.

Models with a post-mitigation risk rating of "medium" or below can be deployed.

Only models with a post-mitigation risk rating of "high" or below can be developed further; models rated "critical" may not be developed at all.
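
These thresholds amount to simple comparisons on an ordered scale. As a minimal sketch, assuming an illustrative RiskLevel enum and gate functions of our own naming (none of this is OpenAI code), the rules could be encoded like this:

```python
from enum import IntEnum

class RiskLevel(IntEnum):
    """Ordered risk scale from the Preparedness Framework, mildest first."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

def can_deploy(post_mitigation: RiskLevel) -> bool:
    # Deployment requires a post-mitigation rating of "medium" or below.
    return post_mitigation <= RiskLevel.MEDIUM

def can_develop_further(post_mitigation: RiskLevel) -> bool:
    # Development may continue up to "high"; "critical" halts it.
    return post_mitigation <= RiskLevel.HIGH

print(can_deploy(RiskLevel.HIGH))           # False
print(can_develop_further(RiskLevel.HIGH))  # True
```

Under this encoding, a model rated "high" after mitigation may still be developed, but not deployed.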

A dedicated preparedness team within OpenAI will drive risk research, assessment, monitoring, and prediction. This team will report regularly to the Safety Advisory Group (SAG), an advisory body that assists OpenAI management and the Board of Directors in making informed safety decisions.

The three OpenAI safety teams. | Image: OpenAI

The Preparedness Team is one of three OpenAI safety teams. In addition to the Preparedness Team, there is the Safety Systems Team, which deals with current models, and the Superalignment Team, which aims to anticipate potential threats from future superintelligent AI. The Preparedness Team itself evaluates frontier AI models.

Scorecard and Governance System

The Preparedness Framework provides a dynamic scorecard that measures the current model risk in each category, both before and after mitigations. In addition, OpenAI defines safety policies and procedural requirements.

Image: OpenAI
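
The scorecard can be pictured as a pair of pre- and post-mitigation ratings per category. The following hypothetical sketch invents the category keys and example ratings, and assumes that a model's overall rating is driven by its worst category, in line with the thresholds above:

```python
# Ordered risk scale shared by all categories, mildest first.
SCALE = ["low", "medium", "high", "critical"]

def severity(level: str) -> int:
    return SCALE.index(level)

# Hypothetical scorecard: (pre-mitigation, post-mitigation) rating per category.
scorecard = {
    "cybersecurity":  ("medium", "low"),
    "cbrn":           ("low", "low"),
    "persuasion":     ("high", "medium"),
    "model_autonomy": ("low", "low"),
}

# Assumption: the overall rating is the worst post-mitigation category score.
overall_post = max((post for _, post in scorecard.values()), key=severity)
print(overall_post)  # "medium" -> deployable under the framework's threshold
```

The persuasion entry mirrors the first example scenario below: "high" before mitigation, "medium" after.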

To illustrate the practical application, OpenAI describes two example scenarios (abridged here; the full descriptions are in the paper).

Persuasion Risk Scenario: If a "high" persuasion risk is identified for a newly trained model before mitigation, the planned safety measures are activated and mitigations are applied. Afterwards, the post-mitigation risk is rated "medium", clearing the model for deployment.

Cybersecurity Risk Scenario: Following the discovery of a new, highly effective prompting technique, a "critical" cybersecurity risk is forecast within six months. This triggers the development of safety plans and the implementation of safety measures intended to keep the post-mitigation risk at "high" at most, so that development can continue.

OpenAI calls on other industry players to adopt similar strategies to protect humanity from potential AI threats.

Summary
  • OpenAI is developing a "Preparedness Framework" to prevent catastrophic risks from AI, based on five strategic pillars such as monitoring, predicting, and identifying risks.
  • The framework defines safety thresholds and risk categories, including cybersecurity, chemical and biological threats, persuasion, and model autonomy, rated on a scale from "low" to "critical."
  • A dedicated Preparedness Team will drive risk research, assessment, monitoring, and forecasting, while a Safety Advisory Group (SAG) will support informed safety decisions.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.