OpenAI unveils GPT-5.5, claims a "new class of intelligence" at double the API price
Key Points
- OpenAI has released GPT-5.5, a new agent-based model that can autonomously handle complex tasks like writing code, running online searches, and analyzing data across multiple tools.
- The model beats out competitors including Anthropic's Claude Opus 4.7 and Google's Gemini 3.1 Pro on key benchmarks, particularly in programming and advanced math, without sacrificing speed, though it doesn't come out on top across the board.
- A more capable GPT-5.5 Pro variant has also launched as an iterative research partner, with both models now available to paying ChatGPT and Codex users on the Plus, Pro, Business, and Enterprise plans, while API access is coming soon at twice the cost.
Update –
- Added API availability
Update April 25, 2026:
GPT-5.5 and GPT-5.5 Pro are now both available through OpenAI's Responses and Chat Completions APIs, each with a one million token context window. "Agents built with GPT-5.5 can plan, gather context, call tools, recover from ambiguity, and complete longer workflows with less guidance," OpenAI writes. GPT-5.5 Pro is designed for "higher-accuracy work," the company says.
Independent testing lab Artificial Analysis has already benchmarked GPT-5.5: OpenAI's new model takes the overall top spot by a slim margin over Anthropic's Claude and Google's Gemini, but shows a notable weakness with hallucinations. Effective API costs run about 20 percent higher than GPT-5.4, according to the lab—the doubled token prices on paper are partially offset by lower token usage per task.
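The arithmetic behind that figure can be sketched as follows. This is a back-of-the-envelope illustration, not Artificial Analysis's methodology: it assumes the mix of input and output tokens stays the same, so the effective cost ratio is simply the price ratio multiplied by the token-usage ratio. The function name is hypothetical.

```python
def implied_token_ratio(price_ratio: float, effective_cost_ratio: float) -> float:
    """Token usage relative to the older model implied by the two ratios.

    Assumption: cost = price * tokens, with an unchanged input/output mix.
    """
    return effective_cost_ratio / price_ratio


# Doubled list prices (2.0x) and roughly 20 percent higher effective cost (1.2x)
ratio = implied_token_ratio(price_ratio=2.0, effective_cost_ratio=1.2)
print(f"Implied token usage vs. GPT-5.4: {ratio:.0%}")
```

Under these assumptions, GPT-5.5 would be using roughly 60 percent of the tokens GPT-5.4 needs for the same work.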
Original article from April 23, 2026:
OpenAI has announced GPT-5.5, an agentic model designed to handle complex tasks autonomously across multiple tools. On paper, its API price is double that of GPT-5.4.
OpenAI has unveiled GPT-5.5, calling it a "new class of intelligence for real work and powering agents." The model is built to understand complex goals, use tools, check its own output, and work through tasks independently until they're done, OpenAI says. It's available now for paying ChatGPT and Codex users.
Agentic workflows are the main selling point
According to OpenAI, GPT-5.5 is especially strong at writing and debugging code, web research, data analysis, creating documents and spreadsheets, and operating software. The model is designed to switch between different tools on its own until a task is finished.
OpenAI sees the biggest improvements in four areas: agentic coding, computer use, knowledge work, and early scientific research. These areas require reasoning across contexts and the ability to carry out actions over extended periods, the company says.
On Terminal-Bench 2.0, a coding benchmark for agentic workflows, GPT-5.5 scores 82.7 percent according to OpenAI—7.6 percentage points above its predecessor GPT-5.4 (75.1 percent). Anthropic's Claude Opus 4.7 hits 69.4 percent, and Google's Gemini 3.1 Pro lands at 68.5 percent.
The gap gets even wider on harder math problems. On FrontierMath Tier 4, GPT-5.5 scores 35.4 percent, compared to 22.9 percent for Claude Opus 4.7 and 16.7 percent for Gemini 3.1 Pro. The Pro variant, GPT-5.5 Pro, pushes that number to 39.6 percent.
OpenAI says GPT-5.5 delivers these performance gains without sacrificing speed. The model reportedly matches GPT-5.4's per-token latency while also using significantly fewer tokens to complete the same Codex tasks.
| Benchmark | GPT-5.5 | GPT-5.4 | GPT-5.5 Pro | GPT-5.4 Pro | Claude Opus 4.7 | Gemini 3.1 Pro |
|---|---|---|---|---|---|---|
| Terminal-Bench 2.0 | 82.7% | 75.1% | - | - | 69.4% | 68.5% |
| Expert-SWE (Internal) | 73.1% | 68.5% | - | - | - | - |
| GDPval (wins or ties) | 84.9% | 83.0% | 82.3% | 82.0% | 80.3% | 67.3% |
| OSWorld-Verified | 78.7% | 75.0% | - | - | 78.0% | - |
| Toolathlon | 55.6% | 54.6% | - | - | - | 48.8% |
| BrowseComp | 84.4% | 82.7% | 90.1% | 89.3% | 79.3% | 85.9% |
| FrontierMath Tier 1-3 | 51.7% | 47.6% | 52.4% | 50.0% | 43.8% | 36.9% |
| FrontierMath Tier 4 | 35.4% | 27.1% | 39.6% | 38.0% | 22.9% | 16.7% |
| CyberGym | 81.8% | 79.0% | - | - | 73.1% | - |
OpenAI's benchmark comparison for GPT-5.5. GPT-5.5 Pro was only tested on selected benchmarks. | Table: OpenAI
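To see at a glance where the base model leads and where it trails, the gaps can be computed directly from the table. The snippet below is a minimal sketch: it copies only the GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro scores, skips Expert-SWE because no competitor score is listed, and uses a hypothetical helper name.

```python
# Scores in percent, copied from OpenAI's table; missing results are left out
SCORES = {
    "Terminal-Bench 2.0": {"GPT-5.5": 82.7, "Claude Opus 4.7": 69.4, "Gemini 3.1 Pro": 68.5},
    "GDPval": {"GPT-5.5": 84.9, "Claude Opus 4.7": 80.3, "Gemini 3.1 Pro": 67.3},
    "OSWorld-Verified": {"GPT-5.5": 78.7, "Claude Opus 4.7": 78.0},
    "Toolathlon": {"GPT-5.5": 55.6, "Gemini 3.1 Pro": 48.8},
    "BrowseComp": {"GPT-5.5": 84.4, "Claude Opus 4.7": 79.3, "Gemini 3.1 Pro": 85.9},
    "FrontierMath Tier 1-3": {"GPT-5.5": 51.7, "Claude Opus 4.7": 43.8, "Gemini 3.1 Pro": 36.9},
    "FrontierMath Tier 4": {"GPT-5.5": 35.4, "Claude Opus 4.7": 22.9, "Gemini 3.1 Pro": 16.7},
    "CyberGym": {"GPT-5.5": 81.8, "Claude Opus 4.7": 73.1},
}


def lead_over_best_rival(scores: dict, model: str = "GPT-5.5") -> tuple:
    """Return (best rival, lead in percentage points); a negative lead means the model trails."""
    rivals = {name: score for name, score in scores.items() if name != model}
    best = max(rivals, key=rivals.get)
    return best, round(scores[model] - rivals[best], 1)


for benchmark, scores in SCORES.items():
    rival, lead = lead_over_best_rival(scores)
    print(f"{benchmark}: {lead:+.1f} points vs. {rival}")
```

A negative value marks a benchmark where a competitor is ahead, which in this table is the case only for BrowseComp.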
Long-context performance also improved significantly. On the MRCR v2 benchmark, which tests how reliably a model can locate multiple pieces of hidden information across very long texts, GPT-5.5 jumps to 74.0 percent at context lengths of 512K to 1M tokens, up from 36.6 percent for GPT-5.4. On the Graphwalks BFS test with one million tokens, GPT-5.5 leaps from 9.4 percent (GPT-5.4) to 45.4 percent.
The dominance isn't total, though. On SWE-Bench Pro, which tests real GitHub issue resolution, Claude Opus 4.7 beats GPT-5.5 with 64.3 percent versus 58.6 percent. OpenAI notes, however, that Anthropic itself acknowledged signs of memorization in some of those tasks.
On MCP Atlas, a tool-use benchmark run by Scale AI, GPT-5.5 scores 75.3 percent, trailing both Claude Opus 4.7 (79.1 percent) and Gemini 3.1 Pro (78.2 percent). The base model also falls slightly behind Gemini on BrowseComp, a web research benchmark, with 84.4 percent versus 85.9 percent.
And GPT-5.5 barely moved the needle on GDPval, a benchmark designed to measure real-world task performance across 44 occupations. GPT-5.5 scores 84.9 percent, only a marginal improvement over GPT-5.4's 83.0 percent. A full overview of all benchmarks is available here.

The model was developed and optimized alongside NVIDIA GB200 and GB300-NVL72 systems. OpenAI says GPT-5.5 and Codex actually helped optimize the company's own serving infrastructure: Codex analyzed production traffic patterns and wrote its own heuristic algorithms for load balancing, boosting token generation speed by more than 20 percent. "The model helped improve the infrastructure that serves it," OpenAI writes.
GPT-5.5 Pro aims to be a "research partner"
Alongside the standard model, OpenAI is launching GPT-5.5 Pro. The company says full-stack inference improvements make the more powerful model much more practical for heavy workloads. Early testers called it an iterative "research partner" that performs best when given rich context from documents and plugins.
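So far, OpenAI has only shared GPT-5.5 Pro benchmark results for four of the nine tests in its comparison table: GDPval, BrowseComp, FrontierMath Tier 1-3, and FrontierMath Tier 4. It beats the base model on BrowseComp and both FrontierMath tiers but trails it on GDPval (82.3 percent versus 84.9 percent).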
Cybersecurity capabilities rated "High"
OpenAI classifies the biological, chemical, and cybersecurity capabilities of GPT-5.5 as "High" in its Preparedness Framework, the same rating as its recent predecessors, but not "Critical." The model shows improved cybersecurity performance compared to GPT-5.4, scoring 81.8 percent on the CyberGym benchmark (versus 79.0 percent) and 88.1 percent on internal capture-the-flag tasks (versus 83.7 percent).
At the same time, OpenAI is rolling out stricter classifiers for potential cyber risk, which could initially lead to more rejections, the company says. The Trusted Access for Cyber program will give verified security researchers expanded access to cybersecurity capabilities. OpenAI is also working with government partners to protect critical infrastructure. A system card with additional security details is available here.
Paying users get access first; API pricing doubles over GPT-5.4
GPT-5.5 Thinking is now available for Plus, Pro, Business, and Enterprise users in ChatGPT. GPT-5.5 Pro is limited to Pro, Business, and Enterprise users. In Codex, GPT-5.5 is available for Plus, Pro, Business, Enterprise, Edu, and Go users with a 400K context window. A fast mode generates tokens 1.5 times faster at 2.5 times the cost.
For the API, OpenAI is charging 5 dollars per million input tokens and 30 dollars per million output tokens, exactly twice what GPT-5.4 costs at 2.50 and 15 dollars, respectively. The context window is one million tokens. GPT-5.5 Pro lands at 30 dollars per million input tokens and 180 dollars per million output tokens.
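What these list prices mean for a single request can be worked out with a few lines of Python. This is a rough sketch based only on the prices quoted above: the dictionary keys are plain labels rather than confirmed API model identifiers, the function name and the example workload are hypothetical, and discounts such as caching or batch pricing are ignored.

```python
# USD per one million tokens as (input, output), taken from the prices quoted above
PRICES = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.5-pro": (30.00, 180.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price cost in USD for one request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload: 200,000 input tokens and 20,000 output tokens
for model in PRICES:
    print(f"{model}: ${request_cost(model, 200_000, 20_000):.2f}")
```

For this example workload, the script prints $1.60 for GPT-5.5, $0.80 for GPT-5.4, and $9.60 for GPT-5.5 Pro. The comparison holds the token counts constant, so it does not reflect OpenAI's claim that GPT-5.5 needs fewer tokens per task.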
OpenAI argues that despite the higher price tag, GPT-5.5 is more efficient and needs fewer tokens for comparable tasks. There's no word yet on when free users will get access. As for the API, OpenAI says that it's coming "very soon."