
OpenAI has unveiled GPT-5, a new AI system that builds on the reasoning advances of the o1 and o3 models and unifies every previous model line into a single adaptive architecture.

According to the company, this design lets the system adjust its "thinking effort" to the complexity of each task, aiming for more reliable and accurate responses.

Access to GPT-5 depends on your subscription tier. For the first time, free users can try a model designed for logical reasoning, while paying customers get higher usage limits and exclusive features.

Video: OpenAI

A unified system with adaptive reasoning

OpenAI says GPT-5 isn't just a single model but an integrated system. It uses a fast, efficient model called gpt-5-main for most queries, while a deeper reasoning model, gpt-5-thinking, handles more complex problems. A real-time router chooses which model to use based on how hard the question is, the conversation context, or even explicit user prompts like "think carefully about this." This router is continuously improved through user feedback.
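
To make the routing idea concrete, here is a minimal sketch of how such a router could behave. OpenAI has not published its routing logic, so everything beyond the two model names is an assumption for illustration: the `Query` type, the `difficulty` score, the cue phrases, and the 0.6 threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cue phrases; the real router's signals are not public.
REASONING_CUES = ("think carefully", "think hard", "step by step")


@dataclass
class Query:
    text: str
    difficulty: float  # assumed score in [0, 1] from some upstream classifier


def route(query: Query, threshold: float = 0.6) -> str:
    """Return the model name a toy router would pick for this query."""
    lowered = query.text.lower()
    # An explicit request for careful reasoning wins over the difficulty score.
    if any(cue in lowered for cue in REASONING_CUES):
        return "gpt-5-thinking"
    # Otherwise, hard questions go to the deeper reasoning model.
    if query.difficulty >= threshold:
        return "gpt-5-thinking"
    return "gpt-5-main"
```

In this toy version, an explicit prompt such as "think carefully about this" overrides the difficulty score, which mirrors the behavior OpenAI describes but says nothing about how the production router weighs its signals.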

"Pro" subscribers also get access to GPT-5 Pro, a version that spends even more time reasoning through tough questions. In tests cited by OpenAI, external evaluators preferred GPT-5 Pro over "GPT-5 thinking" in 67.8 percent of challenging cases.

Better performance on benchmarks and real-world tasks

OpenAI claims GPT-5 sets new standards in programming, healthcare, and writing. In coding, the model is supposed to excel at building complex frontends and debugging large codebases. According to the company, GPT-5 achieves 74.9 percent on SWE-bench Verified and 88 percent on Aider Polyglot, reducing error rates by two-thirds compared to o3.

The model also aims to deliver more precise answers to health-related questions, acting as an "active thought partner" that asks follow-up questions. On the tough HealthBench Hard test, GPT-5 scored 46.2 percent, up from 31.6 percent for o3. OpenAI stresses, however, that GPT-5 is not a replacement for a medical professional.

Other benchmarks show similar gains, including 94.6 percent on AIME 2025 (math, no tools) and 84.2 percent on MMMU (multimodal understanding). GPT-5 Pro reportedly hits 88.4 percent on the GPQA benchmark for very difficult science questions.

Fewer hallucinations, more transparency

Reducing hallucinations is one of GPT-5's key promises. With web search enabled, OpenAI says the model is about 45 percent less likely to make factual errors than GPT-4o. In pure "thinking" mode, the error rate drops by 80 percent compared to o3. On open, fact-based benchmarks like LongFact and FActScore, GPT-5 produces about six times fewer hallucinations than o3.

Even without web search, the improvements are clear. On LongFact-Concepts, LongFact-Objects, and FActScore, GPT-5 (thinking) shows hallucination rates between 0.8 and 1.4 percent, down from 24 to 38 percent for o3. That means GPT-5 makes more than five times fewer factual mistakes than o3, even without up-to-date web data.

The model is also designed to be more honest about its own limits. In one test, models were asked questions about nonexistent images on the CharXiv benchmark. OpenAI says o3 responded with confident, made-up answers 86.7 percent of the time, while GPT-5 did so just 9 percent of the time. Overall, the deception rate in representative conversations dropped from 4.8 percent with o3 to 2.1 percent with GPT-5.

"Safe Completions": A new approach to AI safety

GPT-5 introduces a new safety system called "Safe Completions," detailed in an accompanying research paper. This replaces the old "hard refusal" method, which OpenAI says was too rigid—especially for ambiguous or dual-use topics, where information could be used for good or harm.

Instead of blocking requests outright, GPT-5 focuses on making the output safe, not just judging the user's intent. The model tries to give the most helpful answer possible within safety guidelines, which could mean a high-level, partial, or alternative response. According to OpenAI, human evaluators found this approach safer, more helpful, and more balanced overall. Separately, OpenAI rates gpt-5-thinking as "high capability" for biology and chemistry under its Preparedness Framework, a classification that follows more than 5,000 hours of red teaming by partners such as CAISI (US) and UK AISI.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

New tools and more developer control

GPT-5 brings several new features to the API. Developers can now adjust the model's reasoning effort and verbosity per request. "Custom Tools" can be called with plain text instead of strict JSON, which should reduce errors for complex inputs. The context window has been expanded to 272,000 input tokens and 128,000 output tokens.
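
As a rough illustration of these controls, the sketch below builds a request body as a plain Python dict and checks it against the token limits quoted above. Nothing is sent over the network. The field names (`reasoning.effort`, `text.verbosity`, `max_output_tokens`) and the allowed values are assumptions that do not come from this article, and the helper `build_request` is hypothetical; check OpenAI's API reference before relying on any of them.

```python
# Limits quoted in the article.
MAX_INPUT_TOKENS = 272_000
MAX_OUTPUT_TOKENS = 128_000

# Assumed value sets; verify against the official API reference.
EFFORT_LEVELS = {"minimal", "low", "medium", "high"}
VERBOSITY_LEVELS = {"low", "medium", "high"}


def build_request(prompt, effort="medium", verbosity="medium",
                  input_tokens=0, max_output_tokens=1024):
    """Build a request body as a plain dict; nothing is sent anywhere."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    if verbosity not in VERBOSITY_LEVELS:
        raise ValueError(f"unknown verbosity: {verbosity!r}")
    if input_tokens > MAX_INPUT_TOKENS:
        raise ValueError("input exceeds the 272,000-token limit")
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output budget exceeds the 128,000-token limit")
    return {
        "model": "gpt-5",
        "input": prompt,
        "reasoning": {"effort": effort},   # assumed field name
        "text": {"verbosity": verbosity},  # assumed field name
        "max_output_tokens": max_output_tokens,
    }
```

Validating the limits on the client side is a design choice of this sketch, not something the API is known to require; it simply fails early instead of waiting for a server-side error.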

The API offers three model sizes: gpt-5, gpt-5-mini, and gpt-5-nano. OpenAI says gpt-5 is the most powerful "thinking" variant, with prices starting at $1.25 per million input tokens and $10 per million output tokens.
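
For a sense of what these prices mean in practice, here is a small cost estimate using only the two gpt-5 rates quoted above. The function name `estimate_cost` is made up for this example, and the calculation ignores anything the article does not mention, such as discounts or the pricing of the smaller models.

```python
from decimal import Decimal

# gpt-5 list prices from the article, in US dollars per million tokens.
INPUT_PRICE_PER_M = Decimal("1.25")
OUTPUT_PRICE_PER_M = Decimal("10")


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in US dollars for one request."""
    million = Decimal(1_000_000)
    total = (Decimal(input_tokens) * INPUT_PRICE_PER_M
             + Decimal(output_tokens) * OUTPUT_PRICE_PER_M)
    return total / million


# A request that fills the whole context window:
# 272,000 input tokens and 128,000 output tokens.
print(estimate_cost(272_000, 128_000))
```

At these rates, a single request that uses the full 272,000-token input and 128,000-token output budget would cost about $1.62, with the output side accounting for $1.28 of that.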

Personalization and tiered access

ChatGPT itself is getting some updates. The new model is designed to be much less "sycophantic"—in tests, this behavior dropped from 14.5 percent to under 6 percent, according to OpenAI. Users will also be able to customize the look of their chats and, as a research preview, choose from four preset personalities like "Cynic" or "Nerd."

Access is tiered: free users get GPT-5 and GPT-5-mini (with limits), "Plus" users receive higher limits, and "Pro" users get unlimited access to GPT-5 and exclusive access to GPT-5 Pro. For Team, Enterprise, and Edu customers, GPT-5 becomes the new default model. OpenAI says the rollout starts immediately.

Summary
  • OpenAI has released GPT-5, a new AI system that unifies previous model lines into an adaptive architecture, allowing it to adjust its reasoning based on the complexity of each task and deliver more reliable answers.
  • The model shows improved performance in programming, healthcare, and writing, with significant gains on benchmarks like SWE-bench Verified, HealthBench Hard, and AIME 2025, while also producing far fewer factual errors and hallucinations than earlier versions.
  • GPT-5 introduces "Safe Completions," a new safety approach that aims to provide helpful answers within safety guidelines rather than refusing requests outright. It also offers more personalization, more developer control, and tiered access for users, with the rollout starting immediately.
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.