
Security testing under pressure as OpenAI accelerates AI development

Image: Midjourney prompted by THE DECODER

OpenAI has dramatically reduced the safety testing period for its newest language models.

While GPT-4 underwent six months of testing, evaluators now have just days to assess the new "o3" model. People involved in the process describe the testing as less thorough and under-resourced, according to a Financial Times report.

The shortened timeline comes as the models grow more powerful and potentially dangerous, particularly regarding misuse for biological or security-related purposes. Sources say OpenAI wants to accelerate releases to keep pace with competitors like Meta, Google, and xAI.

Testing compromises raise safety concerns

OpenAI previously committed to specialized tests that check for potential misuse, such as the development of biological weapons. These procedures require substantial resources: custom datasets, fine-tuning, and external experts. However, the Financial Times reports that such testing has only been performed on older, less capable models. It remains unclear how newer models like o1 or o3-mini would perform under the same conditions.


In o3-mini's safety report, OpenAI notes only that GPT-4o could solve a specific biological task after fine-tuning, but provides no results for newer models.

Testing practices face additional scrutiny

Another concern involves the testing of "checkpoints", intermediate versions of models that continue to be developed after evaluation. A former technical employee calls this bad practice, though OpenAI maintains the checkpoints are nearly identical to the final models.

OpenAI points to efficiency gains from automated testing procedures. Johannes Heidecke, who leads OpenAI's safety systems, says the company has found a good balance between speed and thoroughness. While there is no standardized requirement for processes like fine-tuning, the company says it follows best practices and documents them transparently.

Currently, no mandatory global rules exist for AI safety testing. Companies like OpenAI have only made voluntary commitments to authorities in the US and UK. This will change when European AI regulations take effect later this year, requiring providers to formally evaluate their most powerful models for risks.

