OpenAI unveils a new framework to help prevent catastrophic AI risks. The company now has a total of three teams dedicated to AI safety.

The "Preparedness Framework" is a living document that describes strategies for monitoring, assessing, predicting, and hedging against catastrophic AI risks.

OpenAI wants to track catastrophic risk through careful evaluation. To this end, it aims to develop and refine evaluation procedures and other monitoring methods that measure risk levels accurately.

At the same time, the organization aims to anticipate future risk developments to prepare safety measures in advance.

OpenAI also commits to identifying and investigating emerging risks ("unknown-unknowns"). The goal is to address potential threats before they escalate.

Cybersecurity, Bioweapons, Persuasion, and Autonomy

The Preparedness Framework identifies four major risk categories:

  • Cybersecurity
  • Chemical, biological, nuclear, and radiological (CBRN) threats
  • Persuasion
  • Model autonomy

The persuasion category covers the risks of models convincing people to change their beliefs or to act on them.

OpenAI CEO Sam Altman recently did a bit of foreshadowing on Twitter: "I expect AI to be capable of superhuman persuasion well before it is superhuman at general intelligence, which may lead to some very strange outcomes."

From low to critical

The framework defines safety thresholds. Each category is rated on a scale from "low" to "critical," reflecting the level of risk.

Models with a post-mitigation risk rating of "medium" or below can be deployed.

Only models with a post-mitigation risk rating of "high" or below can be developed further; models rated "critical" may not be developed at all.
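
These thresholds amount to simple comparisons on an ordered scale. As a minimal sketch, assuming an illustrative RiskLevel enum and gate functions of our own naming (none of this is OpenAI code), the rules could be encoded like this:

```python
from enum import IntEnum

class RiskLevel(IntEnum):
    """Ordered risk scale from the Preparedness Framework, mildest first."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

def can_deploy(post_mitigation: RiskLevel) -> bool:
    # Deployment requires a post-mitigation rating of "medium" or below.
    return post_mitigation <= RiskLevel.MEDIUM

def can_develop_further(post_mitigation: RiskLevel) -> bool:
    # Development may continue up to "high"; "critical" halts it.
    return post_mitigation <= RiskLevel.HIGH

print(can_deploy(RiskLevel.HIGH))           # False
print(can_develop_further(RiskLevel.HIGH))  # True
```

Under this encoding, a model rated "high" after mitigation may still be developed, but not deployed.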

A dedicated preparedness team within OpenAI will drive risk research, assessment, monitoring, and prediction. This team will report regularly to the Safety Advisory Group (SAG), an advisory body that assists OpenAI management and the Board of Directors in making informed safety decisions.

The three OpenAI safety teams. | Image: OpenAI

The Preparedness Team is one of three OpenAI safety teams. In addition to the Preparedness Team, there is the Safety Systems Team, which deals with current models, and the Superalignment Team, which aims to anticipate potential threats from future superintelligent AI. The Preparedness Team itself evaluates frontier AI models.

Scorecard and Governance System

The Preparedness Framework provides a dynamic scorecard that measures the current model risk in each category, both before and after mitigations. In addition, OpenAI defines safety policies and procedural requirements.

Image: OpenAI
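
The scorecard can be pictured as a pair of pre- and post-mitigation ratings per category. The following hypothetical sketch invents the category keys and example ratings, and assumes that a model's overall rating is driven by its worst category, in line with the thresholds above:

```python
# Ordered risk scale shared by all categories, mildest first.
SCALE = ["low", "medium", "high", "critical"]

def severity(level: str) -> int:
    return SCALE.index(level)

# Hypothetical scorecard: (pre-mitigation, post-mitigation) rating per category.
scorecard = {
    "cybersecurity":  ("medium", "low"),
    "cbrn":           ("low", "low"),
    "persuasion":     ("high", "medium"),
    "model_autonomy": ("low", "low"),
}

# Assumption: the overall rating is the worst post-mitigation category score.
overall_post = max((post for _, post in scorecard.values()), key=severity)
print(overall_post)  # "medium" -> deployable under the framework's threshold
```

The persuasion entry mirrors the first example scenario below: "high" before mitigation, "medium" after.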

To illustrate the practical application, OpenAI describes two example scenarios (abridged here; the full descriptions are in the paper).

Persuasion Risk Scenario: If a "high" persuasion risk is identified for a newly trained model before mitigation, the planned safety measures are activated and mitigations are applied. Afterwards, the post-mitigation risk is rated "medium", clearing the model for deployment.

Cybersecurity Risk Scenario: Following the discovery of a new, highly effective prompting technique, a "critical" cybersecurity risk is forecast within six months. This triggers the development of safety plans and the implementation of safety measures intended to keep the post-mitigation risk at "high" at most, so that development can continue.

OpenAI calls on other industry players to adopt similar strategies to protect humanity from potential AI threats.

Summary
  • OpenAI is developing a "Preparedness Framework" to prevent catastrophic risks from AI, based on five strategic pillars such as monitoring, predicting, and identifying risks.
  • The framework defines safety thresholds and risk categories, including cybersecurity, chemical and biological threats, persuasion, and model autonomy, rated on a scale from "low" to "critical."
  • A dedicated Preparedness Team will drive risk research, assessment, monitoring, and forecasting, while a Safety Advisory Group (SAG) will support informed safety decisions.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.