OpenAI has dramatically reduced the safety testing period for its newest language models.
While GPT-4 underwent six months of testing, evaluators now have just days to assess the new "o3" model. According to a Financial Times report, people involved in the process describe the checks as less thorough and say they were given insufficient resources.
The shortened timeline comes as the models grow more powerful and potentially dangerous, particularly regarding misuse for biological or security-related purposes. Sources say OpenAI wants to accelerate releases to keep pace with competitors like Meta, Google, and xAI.
Testing compromises raise safety concerns
OpenAI previously committed to running specialized tests that check for potential misuse, such as assistance in developing biological weapons. These procedures require substantial resources: custom datasets, fine-tuning, and external experts. However, the Financial Times reports that such testing was performed only on older, less capable models. It remains unclear how newer models like o1 or o3-mini would perform under similar conditions.
In the o3-mini safety report, OpenAI mentions only that GPT-4o could solve a specific biological task after fine-tuning; it provides no comparable results for newer models.
Testing practices face additional scrutiny
Another concern involves the testing of "checkpoints", intermediate versions of models that are still being developed. A former technical employee calls this bad practice, though OpenAI maintains that these checkpoints are nearly identical to the final models.
OpenAI points to efficiency gains from automated testing procedures. Johannes Heidecke, who leads OpenAI's safety systems, says the company has found a good balance between speed and thoroughness. While there is no standardized requirement for processes like fine-tuning, the company says it follows best practices and documents them transparently.
Currently, there are no mandatory global rules for AI safety testing; companies like OpenAI have made only voluntary commitments to authorities in the US and UK. That will change when European AI regulations take effect later this year, requiring providers to formally evaluate their most powerful models for risks.