A new consensus report from Singapore outlines which technical research areas leading AI scientists believe are needed to keep general-purpose AI systems under control. The focus is less on developing new models and more on ensuring oversight.

More than 100 experts from eleven countries met at the Singapore Conference on AI in April 2025 to agree on shared priorities for the technical safety of AI systems. The result is the newly released Singapore Consensus on Global AI Safety Research Priorities.

The report deals exclusively with general-purpose AI (GPAI) – systems that can handle a wide range of cognitive tasks, including language models, multimodal models, and autonomous AI agents. Political questions are intentionally left out.

The document divides the field of technical AI safety research into three areas: risk assessment, building trustworthy systems, and post-deployment control. The aim is to create a "trusted ecosystem" that encourages innovation without ignoring societal risks.

Risk assessment as a starting point

The first focus is on methods for measuring and predicting risks from AI systems. This includes standardized audit techniques, benchmarks for dangerous capabilities, and ways to assess social impacts. Developing a “metrology” for AI risks – precise, repeatable measurement methods for clearly defining risk thresholds – is also named as an open research challenge.

One central issue is what the report calls the “evidence dilemma”: Waiting too long for hard evidence might allow risks to spiral out of control, while acting too soon could lead to unnecessary or ineffective interventions. The report recommends prospective risk analysis, similar to techniques used in nuclear safety and aviation, such as scenario analysis, probabilistic risk assessments, or bow-tie analyses.
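To make the idea of a prospective, probabilistic risk estimate concrete, here is a minimal sketch. The scenario names, probabilities, and severity scores are invented for illustration and do not come from the report; a real assessment would rest on expert elicitation and far more structured methods such as the bow-tie analyses mentioned above.

```python
# Minimal sketch of a prospective (probabilistic) risk estimate.
# Scenarios, probabilities, and severity scores are purely illustrative,
# not taken from the Singapore Consensus report.

scenarios = [
    # (name, estimated yearly probability, severity on an arbitrary 0-100 scale)
    ("model aids large-scale phishing", 0.20, 30),
    ("model uplifts novel malware development", 0.05, 70),
    ("model assists bioweapon design", 0.01, 100),
]

def expected_risk(scenarios):
    """Sum of probability * severity - the core of a simple risk estimate."""
    return sum(p * s for _, p, s in scenarios)

if __name__ == "__main__":
    for name, p, severity in scenarios:
        print(f"{name:45s} p={p:.2f} severity={severity:3d} contribution={p * severity:5.1f}")
    print(f"total expected risk score: {expected_risk(scenarios):.1f}")
```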

More research is needed to identify dangerous capabilities early, such as those related to cyberattacks or biological weapons. The report also highlights “uplift studies,” which examine whether AI systems significantly increase the effectiveness of malicious users.
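The core comparison in an uplift study can be sketched in a few lines: measure how often a task succeeds with and without model access and compare the rates. The numbers below are hypothetical, and a real study would add controlled trial design and proper statistics (confidence intervals, pre-registration), which this sketch omits.

```python
# Minimal sketch of the comparison at the heart of an "uplift study".
# Success counts are hypothetical; a real study needs controlled trials
# and proper statistical analysis.

control_successes, control_n = 4, 50      # participants without model access
assisted_successes, assisted_n = 13, 50   # participants with model access

control_rate = control_successes / control_n
assisted_rate = assisted_successes / assisted_n

uplift = assisted_rate / control_rate     # relative uplift factor
print(f"success without AI: {control_rate:.0%}")
print(f"success with AI:    {assisted_rate:.0%}")
print(f"relative uplift:    {uplift:.1f}x")
```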

Systems should guarantee safety, not just promise it

The second focus is on developing trustworthy, robust, and specification-compliant AI systems. This means precisely defining the desired behavior (specification), implementing it technically (design), and finally verifying that the system actually does what it is supposed to do (verification).

The report highlights ongoing challenges in specifying human goals: Even simple formulation errors can lead to reward hacking, deception, or unwanted power-seeking behavior. Modeling behavior in multi-stakeholder scenarios or with conflicting user preferences is also still difficult to do reliably.

At the system level, the report calls for training methods that are robust against attacks, better techniques for targeted model editing, and the development of non-agentic, domain-specific, or capability-limited AI models that structurally prevent dangerous behaviors.

The long-term goal is to build AI systems with guaranteed safety, for example through verifiable program synthesis or formal world models that allow a system's effects on its environment to be proven. These approaches are still in their early stages.

Oversight after deployment

The third area is about controlling AI systems after they have been deployed. This includes classic monitoring and intervention tools such as hardware-based surveillance, off-switches, and emergency protocols.

Special attention is given to highly capable systems that might actively try to circumvent control mechanisms. Here, the report recommends research on scalable oversight – such as debates between AI systems or "nested oversight" structures – along with “corrigibility” approaches designed to keep systems correctable and shutdown-capable.

The report also calls for monitoring the broader AI ecosystem: tracking models, using watermarks, building logging infrastructure, and standardizing authentication for AI agents. These measures could help identify and trace deepfakes or dangerous open-source models more effectively.
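As one illustration of what a logging infrastructure with authenticated provenance records might involve, here is a minimal Python sketch that signs a record of a model output with an HMAC so it can later be verified. The record fields, key handling, and function names are assumptions made for this example; the report does not prescribe a concrete format.

```python
# Minimal sketch of signed provenance logging for AI-generated content.
# Record schema, field names, and key handling are illustrative assumptions,
# not a format proposed in the Singapore Consensus report.

import hashlib
import hmac
import json
import time

SIGNING_KEY = b"example-secret-key"  # in practice managed via a key management system

def log_output(model_id: str, output_text: str) -> dict:
    """Create a provenance record and sign its canonical JSON form with an HMAC."""
    record = {
        "model_id": model_id,
        "timestamp": time.time(),
        "output_sha256": hashlib.sha256(output_text.encode()).hexdigest(),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return record

def verify(record: dict) -> bool:
    """Recompute the HMAC over the unsigned fields and compare it to the stored signature."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])

entry = log_output("demo-model-v1", "some generated text")
print(verify(entry))  # True
```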

Cooperation in a competitive landscape

One key argument in the report is that certain safety measures are in the self-interest of all stakeholders, even when they are direct competitors. The report points to the definition of technical risk thresholds as an example: If a system is shown in tests to aid cyberattacks, this could serve as a trigger for countermeasures. Competing companies could jointly establish such thresholds.
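A jointly agreed threshold of this kind could act as a simple trigger in an evaluation pipeline, as in the sketch below. The benchmark names and limit values are invented for illustration; the report argues only that such thresholds could be defined in common, not what they should be.

```python
# Minimal sketch of jointly agreed capability thresholds acting as triggers.
# Benchmark names and limit values are hypothetical.

THRESHOLDS = {
    "cyberattack_uplift_score": 0.30,  # hypothetical agreed limit
    "bio_uplift_score": 0.10,
}

def check_evaluation(results: dict) -> list:
    """Return the names of all thresholds that the evaluation results exceed."""
    return [name for name, limit in THRESHOLDS.items()
            if results.get(name, 0.0) > limit]

eval_results = {"cyberattack_uplift_score": 0.42, "bio_uplift_score": 0.05}
breached = check_evaluation(eval_results)
if breached:
    print("thresholds exceeded:", breached)  # would trigger agreed countermeasures
```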

The report's editors include Yoshua Bengio (MILA), Stuart Russell (UC Berkeley), and Max Tegmark (MIT). Contributors come from institutions such as Tsinghua, Berkeley, and MILA, as well as from OpenAI and other AI labs.

Summary
  • The Singapore Consensus on Global AI Safety Research Priorities, agreed upon by over 100 AI experts from eleven countries, is the first to define common technical research areas for the safety of general-purpose AI.
  • The report identifies three key tasks: precise risk assessment, including prospective analyses and measurement methods; the development of robust, specification-compliant AI systems; and post-deployment control and monitoring, including intervention mechanisms, model tracking, and authentication standards.
  • The experts emphasize that technical safety measures, such as clearly defined risk thresholds and common standards, are in the self-interest of all stakeholders. They also call for increased cooperation between companies and research institutions, despite competition.