OpenAI establishes a new "Preparedness Team" to guard against potentially catastrophic risks from frontier AI models.
In a new blog post, OpenAI acknowledges the potential benefits of frontier AI models, which could far exceed the capabilities of existing systems. But it also recognizes the serious risks these models pose.
OpenAI's stated goal is to develop AGI: at a minimum, a machine with human-level intelligence that can rapidly acquire new knowledge and generalize across domains.
To address these risks, OpenAI aims to answer questions such as:
- How dangerous is the misuse of frontier AI systems today and in the future?
- How can we develop a robust framework for monitoring, evaluating, predicting, and protecting against the dangerous capabilities of frontier AI systems?
- If frontier AI model weights were stolen, how could malicious actors use them?
The answers to these questions could help ensure the safety of advanced AI systems. The new team comes after OpenAI, along with other leading labs, made voluntary commitments to advance safety and trust in AI through the industry organization Frontier Model Forum.
The announcement of the new safety team precedes the first AI Safety Summit, to be held in the UK in early November.
OpenAI's new Preparedness Team
The Preparedness Team, led by Aleksander Madry, will focus on capability assessment, evaluation, and internal red teaming of frontier models.
Its mission spans multiple risk categories, including individualized persuasion, cybersecurity, chemical, biological, radiological, and nuclear (CBRN) threats, and autonomous replication and adaptation (ARA).
The team will also work to develop and maintain a Risk-Informed Development Policy (RDP) to establish a governance structure for accountability and oversight throughout the development process.
The RDP is intended to complement and extend OpenAI's existing work on risk mitigation and to contribute to the safety and alignment of new highly capable systems before and after deployment.
In addition to building the team, OpenAI is launching an AI Preparedness Challenge focused on preventing catastrophic misuse. The challenge will award $25,000 in API credits to each of up to ten top submissions, and OpenAI will publish innovative ideas and contributions from the entries. The lab will also recruit candidates for the Preparedness Team from among the top challenge applicants.