
OpenAI establishes a new "Preparedness Team" to prevent potentially catastrophic risks from frontier AI models.

In a new blog post, OpenAI acknowledges the potential benefits of frontier AI models, which could far exceed the capabilities of existing systems. But it also recognizes the serious risks these models pose.

OpenAI's stated goal is to develop AGI: at a minimum, a machine with human-like intelligence that can rapidly acquire new knowledge and generalize across domains.

To address these risks, OpenAI aims to answer questions such as:

  • How dangerous is the misuse of frontier AI systems today and in the future?
  • How can we develop a robust framework for monitoring, evaluating, predicting, and protecting against the dangerous capabilities of frontier AI systems?
  • If frontier AI model weights were stolen, how could malicious actors use them?

The answers to these questions could help ensure the safety of advanced AI systems. The new team comes after OpenAI, along with other leading labs, made voluntary commitments to advance safety and trust in AI through the industry organization Frontier Model Forum.

The announcement of the new safety team precedes the first AI Safety Summit, to be held in the UK in early November.

OpenAI's new Preparedness Team

The Preparedness Team, led by Aleksander Madry, will focus on capability assessment, evaluation, and internal red teaming of frontier models.

Its mission spans multiple risk categories, including individualized persuasion, cybersecurity, chemical, biological, radiological, and nuclear (CBRN) threats, and autonomous replication and adaptation (ARA).

The team will also work to develop and maintain a Risk-Informed Development Policy (RDP) to establish a governance structure for accountability and oversight throughout the development process.


The RDP is intended to complement and extend OpenAI's existing work on risk mitigation and contribute to the safety and alignment of new high-performance systems before and after deployment.

In addition to building the team, OpenAI is launching an AI Preparedness Challenge to prevent catastrophic misuse. The challenge will award $25,000 in API credits to up to ten top submissions, and OpenAI will publish innovative ideas and contributions. The lab will also recruit candidates for the Preparedness Team from among the top challenge entrants.

Summary
  • OpenAI is establishing a new Preparedness Team focused on catastrophic risks from particularly powerful AI models, including the dangers and potential misuse of such systems.
  • The team, led by Aleksander Madry, will conduct capability assessments, evaluations, and internal red teaming, cover multiple risk categories, and develop a Risk-Informed Development Policy for governance and accountability.
  • OpenAI is also launching an AI Preparedness Challenge to prevent catastrophic misuse, awarding $25,000 in API credits to the best submissions and recruiting candidates for the Preparedness Team from among the top entrants.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.