OpenAI has revised its Preparedness Framework to better identify and manage risks associated with advanced AI models.
The updated framework shifts focus to capabilities that could cause “severe, novel, and irreversible harm” and introduces a more systematic evaluation process based on five criteria: a risk must be plausible, measurable, severe, novel, and either immediate or irreversible. A capability is considered risk-relevant only if it meets all five.
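For illustration, the all-five rule can be read as a simple conjunction. The sketch below is not OpenAI's implementation; the `CapabilityAssessment` structure and field names are hypothetical, chosen only to make the gating logic concrete.

```python
from dataclasses import dataclass


@dataclass
class CapabilityAssessment:
    # Hypothetical flags mirroring the five criteria described in the framework.
    plausible: bool      # a realistic path to harm exists
    measurable: bool     # the capability can be evaluated in practice
    severe: bool         # the potential harm would be severe
    novel: bool          # the harm is not already achievable without the model
    irreversible: bool   # the harm would be immediate or irreversible

    def is_risk_relevant(self) -> bool:
        # A capability counts as risk-relevant only if all five criteria hold.
        return all((self.plausible, self.measurable, self.severe,
                    self.novel, self.irreversible))


# Example: a capability that meets every criterion except novelty
# does not qualify as risk-relevant under this reading.
print(CapabilityAssessment(True, True, True, False, True).is_risk_relevant())  # False
```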
First introduced in late 2023 by OpenAI’s Preparedness Team, the framework now categorizes capabilities into two main groups: Tracked Categories and Research Categories. Tracked Categories cover known high-risk areas with established safety protocols, such as biological and chemical applications, cybersecurity, and AI self-improvement. In contrast, Research Categories cover emerging risk areas that are not yet well understood or lack mature evaluation methods, including autonomous replication, sandbagging, and bypassing safety mechanisms. OpenAI is currently developing new threat models and assessment tools for these less-understood domains.
The framework also defines two levels of AI capability: High Capability and Critical Capability. Systems classified as High Capability must be secured with sufficient safeguards before deployment, while Critical Capability systems require safeguards already during development. An internal Safety Advisory Group (SAG) evaluates whether existing measures are sufficient and issues recommendations, but final decisions rest with OpenAI’s management. If new evidence emerges, existing safety assessments can be revisited.
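As a rough way to picture the two thresholds, the sketch below maps a capability level to the point at which safeguards must be in place. The enum values and the `required_safeguard_stage` helper are assumptions made for illustration, not part of OpenAI's framework.

```python
from enum import Enum


class CapabilityLevel(Enum):
    BELOW_THRESHOLD = "below threshold"
    HIGH = "high capability"
    CRITICAL = "critical capability"


def required_safeguard_stage(level: CapabilityLevel) -> str:
    """Return the stage at which safeguards must be in place (illustrative only)."""
    if level is CapabilityLevel.CRITICAL:
        # Critical Capability: safeguards are required already during development.
        return "during development"
    if level is CapabilityLevel.HIGH:
        # High Capability: the system must be secured before deployment.
        return "before deployment"
    return "baseline safety practices"


for level in CapabilityLevel:
    print(f"{level.value}: safeguards {required_safeguard_stage(level)}")
```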
Scalable testing for faster model updates
As AI models are increasingly updated without full retraining, OpenAI aims to implement scalable assessments that can keep pace with rapid development cycles. These include automated testing procedures supplemented by in-depth reviews. OpenAI also reserves the option to adjust its requirements in response to external developments—such as the release of a high-risk model by another provider that lacks comparable safety protocols—provided that overall risk levels do not increase.
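One way to picture such scalable assessments is an automated battery of evaluations that runs on every model update, with results escalated for in-depth human review once they cross a threshold. The evaluation names, scores, and threshold below are invented for illustration and say nothing about OpenAI's actual test suite.

```python
from typing import Callable, Dict

# Hypothetical automated evaluations; each returns a risk score in [0, 1].
AUTOMATED_EVALS: Dict[str, Callable[[str], float]] = {
    "bio_misuse_probe": lambda model_id: 0.12,       # placeholder scoring stubs
    "cyber_capability_probe": lambda model_id: 0.47,
    "self_improvement_probe": lambda model_id: 0.05,
}

REVIEW_THRESHOLD = 0.4  # assumed cut-off above which experts take a closer look


def run_scalable_assessment(model_id: str) -> Dict[str, float]:
    """Run the automated battery and flag scores that warrant in-depth review."""
    scores = {name: evaluate(model_id) for name, evaluate in AUTOMATED_EVALS.items()}
    flagged = {name: score for name, score in scores.items() if score >= REVIEW_THRESHOLD}
    if flagged:
        print(f"{model_id}: escalating {sorted(flagged)} for expert deep-dive review")
    return scores


run_scalable_assessment("model-update-2025-04")
```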
In addition to its existing Capabilities Reports, OpenAI will begin publishing Safeguards Reports. These will detail the safety measures in place and assess their effectiveness. Both types of reports are based on a “defense in depth” strategy and will inform decisions on model deployment. As before, OpenAI plans to release these reports publicly alongside each new model launch.
The updated framework comes shortly after reports that OpenAI has significantly shortened its safety review processes since the release of GPT-4. According to Chief of Preparedness Johannes Heidecke, automation has helped the company strike “a good balance.” For now, AI safety evaluations in the U.S. and U.K. remain voluntary, but the upcoming EU AI Act will introduce mandatory risk assessments for powerful models.