OpenAI has launched "Deep Research," a new agent feature for ChatGPT that aims to streamline complex research tasks. Initially available to Pro users, the tool promises to compress hours of online research into just a few minutes.
The new agent builds on OpenAI's latest o3 model, using reinforcement learning to tackle challenging research and analysis tasks. The company says the system was trained on a wide range of complex browsing and reasoning challenges, teaching it to sift through and synthesize large amounts of online information efficiently.
According to OpenAI, the agent can independently search the internet and handle complex tasks across domains like finance, science, and technology. It produces detailed reports with citations, and OpenAI claims the results match the quality of work by professional research analysts.
Searches typically take between 5 and 30 minutes, with the system performing particularly well when hunting down niche information spread across multiple websites. The results appear as a chat-based report, with plans to add embedded images, data visualizations, and other analytical elements in the coming weeks.
OpenAI acknowledges that the system still hallucinates and can confidently present incorrect conclusions. The company claims this happens less often than with previous models, but does not provide specific numbers.
OpenAI co-founder Greg Brockman describes Deep Research as an "extremely simple agent" - an o3 model capable of web browsing and Python code execution. The company's staff frequently uses the tool internally, especially for e-commerce searches, where it performs "much better" than traditional methods, Brockman writes. This positions the tool as a potential competitor to Google's own Deep Research feature and, of course, Google Search.
Significant benchmark improvements
On Humanity's Last Exam, which tests AI expertise in various subjects at the expert level, Deep Research scored 26.6% accuracy - significantly higher than previous models such as GPT-4o at 3.3% and o3-mini-high at around 13%. Compared to the o1 model, the system showed the most significant gains in chemistry, humanities, social sciences and mathematics, OpenAI says.
On the GAIA benchmark, which evaluates AI systems on 466 real-world tasks including reasoning and multimodal processing, Deep Research scored 72.57%, beating the previous record of 63.64%.
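To put the reported scores in perspective, the short Python sketch below computes the gap in percentage points and the ratio between scores. It uses only the figures quoted above; the o3-mini-high value is rounded to 13.0 because the reported number is approximate, and the function and variable names are illustrative rather than anything published by OpenAI.

```python
# Benchmark scores as quoted in this article (percent accuracy).
HLE_SCORES = {
    "GPT-4o": 3.3,
    "o3-mini-high": 13.0,  # reported only as "around 13%", so this is approximate
}
HLE_DEEP_RESEARCH = 26.6

GAIA_PREVIOUS_RECORD = 63.64
GAIA_DEEP_RESEARCH = 72.57


def gain(new, old):
    """Return (percentage-point difference, ratio of new score to old score)."""
    return new - old, new / old


if __name__ == "__main__":
    for model, score in HLE_SCORES.items():
        points, ratio = gain(HLE_DEEP_RESEARCH, score)
        print(f"Humanity's Last Exam vs {model}: +{points:.1f} points, {ratio:.1f}x")

    points, ratio = gain(GAIA_DEEP_RESEARCH, GAIA_PREVIOUS_RECORD)
    print(f"GAIA vs previous record: +{points:.2f} points, {ratio:.2f}x")
```

The two views tell different stories: the jump on Humanity's Last Exam is large in relative terms because earlier scores were so low, while the GAIA improvement is a smaller relative step from an already high baseline.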
The system's success rate shows a stronger correlation with the economic value of a task than with the time it would take a human to complete it. For tasks with low economic value, Deep Research achieves a success rate of almost 20%, while for high-value tasks the success rate drops to around 9%.
In terms of time, the system completes shorter tasks (1-3 hours) with a success rate of over 20%, while its success rate on longer tasks (4+ hours) stays around 13-14%.
According to OpenAI, this pattern suggests that AI systems face different challenges than humans. "The things that models find difficult are different to what humans find time-consuming," OpenAI writes.
OpenAI CEO Sam Altman estimates that Deep Research can handle "a single-digit percentage of all economically valuable tasks in the world" - a milestone he describes as "wild."
Deep Research is much cheaper than a human, but still expensive for OpenAI
The feature currently requires a ChatGPT Pro subscription at $200 per month, which includes up to 100 deep research requests through the web version. OpenAI points to high computing costs to explain these limitations, but says it's working on a faster, more cost-effective version that will use a smaller model.
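As a rough back-of-the-envelope illustration, the Python sketch below works out the effective price per Deep Research request under the Pro plan. It makes a simplifying assumption that is not OpenAI's own accounting: the entire $200 subscription is attributed to Deep Research, even though Pro includes other features. The function name and constants are illustrative.

```python
# Figures quoted in this article.
PRO_PRICE_USD = 200.0        # monthly ChatGPT Pro subscription
PRO_MONTHLY_REQUESTS = 100   # maximum Deep Research requests per month


def cost_per_request(requests_used, monthly_price=PRO_PRICE_USD,
                     cap=PRO_MONTHLY_REQUESTS):
    """Effective price per request, attributing the whole subscription
    to Deep Research (a simplifying assumption)."""
    if not 1 <= requests_used <= cap:
        raise ValueError(f"requests_used must be between 1 and {cap}")
    return monthly_price / requests_used


if __name__ == "__main__":
    for used in (10, 50, 100):
        print(f"{used:>3} requests/month: ${cost_per_request(used):.2f} per request")
```

Under this assumption, a user who exhausts the full quota pays $2 per request, and the effective price rises as fewer requests are used.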
The company plans to expand access to Plus and Team users in about a month, along with support for mobile and desktop applications. According to Altman, the current version will offer about 10 research tasks per month in the Plus tier and "a very small number" in the free tier. Users in the EU, UK, and Switzerland won't have access to the service for now.