OpenAI has launched o3-pro for Pro users in ChatGPT and via the API. The new model is designed to deliver more reliable and thorough answers by leveraging greater computing power, though this comes at the cost of noticeably slower response times, even for simple prompts.
While OpenAI hasn't shared specific technical details, some observers believe o3-pro, like the earlier o1-pro, may run multiple passes over each prompt and use something like a consensus approach to refine its answers.
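OpenAI has not confirmed any such mechanism, so the following is only a toy sketch of what a multi-pass, majority-vote consensus scheme could look like. The `generate` parameter is a hypothetical stand-in for a model call, not a real API client.

```python
from collections import Counter

def consensus_answer(generate, prompt, n_passes=5):
    """Sample the model n_passes times and return the most common
    answer -- one speculative reading of a 'consensus approach'."""
    answers = [generate(prompt) for _ in range(n_passes)]
    best, _count = Counter(answers).most_common(1)[0]
    return best

# Stubbed "model" that answers correctly most of the time
import itertools
fake = itertools.cycle(["42", "42", "41", "42", "40"]).__next__
print(consensus_answer(lambda p: fake(), "What is 6*7?"))  # → 42
```

A single wrong pass is outvoted by the majority, which is the intuition behind trading extra compute for reliability.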
o3-pro is intended for demanding tasks in areas such as math, science, and programming, where reliability is prioritized over speed. OpenAI recommends the model for situations where waiting longer for a response is an acceptable trade-off for greater accuracy.
Unlike o1-pro, o3-pro can access a broader range of tools. The model is able to search the web, analyze files, handle visual inputs, use Python, and personalize responses with memory features. This expanded toolset typically leads to longer wait times compared to earlier models.
o3-pro: Strong on complex problems, weak on small talk
According to OpenAI, expert reviews indicate that o3-pro outperforms o3 across all tested categories, particularly in science, education, programming, business, and writing support. The model consistently ranks higher for clarity, completeness, instruction following, and accuracy. OpenAI also uses a reliability test in which a question counts as solved only if the model answers it correctly in four out of four attempts. Academic evaluations suggest o3-pro surpasses both o1-pro and o3 on these measures.
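The four-out-of-four criterion can be sketched as a simple evaluation loop. `model` here is a hypothetical callable standing in for a real model client; the function name is illustrative.

```python
def four_of_four(model, question, expected, attempts=4):
    """Count a question as solved only if the model answers it
    correctly on every one of `attempts` independent tries --
    the 4/4 reliability bar described above."""
    return all(model(question) == expected for _ in range(attempts))

# A stub that is right three times out of four still fails the bar
flaky = iter(["4", "4", "4", "5"]).__next__
print(four_of_four(lambda q: flaky(), "2+2?", "4"))  # False
```

Requiring all attempts to succeed penalizes inconsistent models far more than a plain accuracy average would.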
Few independent benchmarks for o3-pro exist so far. Ben Hylak of Raindrop.ai, who had early access, reported that the model's strengths emerge mainly on complex tasks with extensive context. On simple questions and standard tests it did not stand out, but when given detailed background information, such as company plans and meeting notes, o3-pro generated a comprehensive plan with specific metrics and timelines. Hylak notes that qualitative improvements of this kind are difficult to capture with standard evaluation methods.
For everyday conversation, the model seems less suitable. Yuchen Jin, CTO at Hyperbolic Labs, demonstrated on X how o3-pro handled the simple greeting "Hi, I'm Sam Altman": the model took between four and fourteen minutes, cost about $80, and finally replied, "Hello Sam Altman. How can I assist you today." A clear case of overthinking.
High costs and current limitations
o3-pro is now available to Pro and Team users, replacing o1-pro. Enterprise and Edu users are scheduled to gain access the following week. For developers, the model is currently offered only through the Responses API, which supports features such as multi-turn model interactions before a final response is returned. o3-pro supports a 200,000-token context window and can generate up to 100,000 output tokens. The knowledge cutoff is June 1, 2024.
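As a minimal sketch, a request body for the Responses API endpoint (`POST https://api.openai.com/v1/responses`) might look like the following. The `model`, `input`, and `max_output_tokens` fields follow OpenAI's published API; the prompt and helper name are illustrative.

```python
import json

def build_o3_pro_request(prompt, max_output_tokens=100_000):
    """Return a JSON body for the Responses API endpoint.
    max_output_tokens defaults to o3-pro's 100,000-token ceiling."""
    return json.dumps({
        "model": "o3-pro",
        "input": prompt,
        "max_output_tokens": max_output_tokens,
    })

body = build_o3_pro_request("Prove that sqrt(2) is irrational.")
print(json.loads(body)["model"])  # o3-pro
```

The same request would be sent with an `Authorization: Bearer <API key>` header; given the model's latency, generous client-side timeouts are advisable.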
Pricing for o3-pro is significantly higher than for other available models: $20 per million input tokens and $80 per million output tokens. Even so, that is a reduction of more than 80 percent from the rates previously charged for o1-pro. Prices for o3 have also been cut sharply, making it 80 percent cheaper than just a few days earlier.
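At those rates, a per-request cost estimate is straightforward arithmetic. The token counts in the example are invented for illustration.

```python
def o3_pro_cost(input_tokens, output_tokens):
    """Estimated cost in USD at o3-pro's listed rates:
    $20 per million input tokens, $80 per million output tokens."""
    return input_tokens / 1e6 * 20 + output_tokens / 1e6 * 80

# A 10,000-token prompt with a 5,000-token answer:
print(f"${o3_pro_cost(10_000, 5_000):.2f}")  # $0.60
```

The steep output rate matters most for a reasoning model, since long deliberation is billed as output tokens.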
Some limitations remain in place. Temporary chats are currently disabled due to a technical issue. Image generation is not supported; users are advised to use GPT-4o, o3, or o4-mini for this feature. Canvas functionality is also unavailable at this time.