OpenAI's upcoming Codex update will hit the company's "High" cybersecurity risk level for the first time
OpenAI is planning several releases around its coding model Codex in the coming weeks. The announcement comes with a warning about the model's capabilities.
CEO Sam Altman announced on X that new features will ship next week. According to Altman, these updates will push the model to the "High" cybersecurity risk level in OpenAI's own risk framework for the first time. Above that sits only the "Critical" level.
What "High" risk means
According to OpenAI's guidelines, the "High" level indicates that an AI model can remove existing bottlenecks to cyberattacks, for example by automating attacks on well-protected targets or by automatically discovering and exploiting security vulnerabilities. These capabilities could upset the balance between cyber offense and defense and significantly increase the volume of cyberattacks.
| Category | Capability threshold | Risk of severe damage | Security guidelines |
|---|---|---|---|
| Cybersecurity: The ability of an AI model to assist in the development of tools and executing operations for cyberdefense and cyberoffense. | [High] The model removes existing bottlenecks to scaling cyber operations including by automating end-to-end cyber operations against reasonably hardened targets OR by automating the discovery and exploitation of operationally relevant vulnerabilities. | Removing bottlenecks limiting malicious cyber activity may upset the current cyberoffense-cyberdefense balance by significantly automating and scaling the volume of existing cyberattacks. In conjunction with a Long-range Autonomy capability (Section 2.3), models that could bypass OpenAI's technical safeguards constraining model activity, such as sandboxing or monitoring services, could compromise OpenAI's ability to track and mitigate all other risks. | - Require security controls meeting High standard (Appendix C.3) - Require safeguards against misuse meeting High standard (Appendix C.1) for external deployment - Require safeguards meeting High standard against misalignment (Appendix C.2) for large-scale internal deployment - Contribute towards improved cyberdefense policies and tools for cyberdefense |
According to Altman, OpenAI plans to start with product restrictions to prevent people from using the coding models for cybercrime. Over time, the company will shift toward accelerating defenses and helping people fix security vulnerabilities.
Altman argues that deploying current AI models quickly is essential to making software more secure, given that even more powerful models are on the horizon. This aligns with OpenAI's ongoing AI safety stance that "not publishing is not a solution either."
"Critical" level would enable autonomous zero-day attacks
At the highest "Critical" level, a model under OpenAI's framework could find and develop functional zero-day exploits (attacks on previously unknown security vulnerabilities) across all severity levels in many hardened, critical systems without human intervention. It could also independently develop and execute novel cyberattack strategies against protected targets given only a rough objective.
| Capability threshold | Risk of severe damage | Security guidelines |
|---|---|---|
| [Critical] A tool-augmented model can identify and develop functional zero-day exploits of all severity levels in many hardened real-world critical systems without human intervention OR model can devise and execute end-to-end novel strategies for cyberattacks against hardened targets given only a high level desired goal. | Finding and executing end-to-end exploits for all software could lead to catastrophe from unilateral actors, hacking military or industrial systems, or OpenAI infrastructure. Novel cyber operations, e.g., those involving novel zero-days or novel methods of command-and-control, generally pose the most serious threat, as they are unpredictable and scarce. | - Until we have specified safeguards and security controls standards that would meet a Critical standard, halt further development |
The ability to find and execute end-to-end exploits for any software could allow individual actors to hack military or industrial systems, or even OpenAI's own infrastructure. The framework singles out novel cyber operations, such as those involving new zero-days or new command-and-control methods, as the most serious threat because they are unpredictable and hard to defend against.