OpenAI's Deep Research aims to compress hours of online research into minutes

Feb 3, 2025

OpenAI

Key Points

OpenAI has introduced "Deep Research", a new agent feature for ChatGPT that enables the system to conduct complex research independently. The system was trained using reinforcement learning on challenging browsing and logic tasks, allowing it to effectively analyze and summarize vast amounts of online information.
In evaluations, Deep Research achieved record-breaking results, attaining 26.6 percent accuracy in the Humanity's Last Exam evaluation, significantly outperforming previous models. It also set a record in the GAIA benchmark with a score of 72.57 percent.
The Deep Research feature is currently available to ChatGPT Pro users, with a limit of 100 requests per month. Plus users will follow next month with 10 research tasks per month. OpenAI attributes this limitation to the high computational requirements and is working on developing a faster and more cost-effective version. The service is temporarily unavailable to users in the EU, the UK, and Switzerland.

OpenAI has launched "Deep Research," a new agent feature for ChatGPT that aims to streamline complex research tasks. Initially available to Pro users, the tool promises to compress hours of online research into just a few minutes.

The new agent function builds on OpenAI's latest o3 models and uses reinforcement learning to tackle challenging research and analysis tasks. The company says the system was trained on a wide range of complex browsing and reasoning challenges, teaching it to sift through and synthesize large amounts of online information efficiently.

According to OpenAI, the agent can independently search the internet and handle complex tasks across domains like finance, science, and technology. OpenAI claims it produces detailed, cited reports whose quality matches the work of professional research analysts.

Searches typically take between 5 and 30 minutes, with the system performing particularly well when hunting down niche information spread across multiple websites. The results appear as a chat-based report, with plans to add embedded images, data visualizations, and other analytical elements in the coming weeks.

OpenAI acknowledges that the system still hallucinates and can confidently present incorrect conclusions. The company claims this happens less often than with previous models, but provides no specific numbers.

OpenAI co-founder Greg Brockman describes Deep Research as an "extremely simple agent": an o3 model capable of web browsing and Python code execution. The company's staff frequently uses the tool internally, especially for e-commerce searches, where it performs "much better" than traditional methods, Brockman writes. This positions the tool as a potential competitor to Google's own Deep Research feature and, of course, Google Search.

Significant benchmark improvements

On Humanity's Last Exam, which tests AI systems on expert-level questions across a range of subjects, Deep Research scored 26.6% accuracy, significantly higher than previous models such as GPT-4o at 3.3% and o3-mini-high at around 13%. Compared to the o1 model, the system showed the most significant gains in chemistry, humanities, social sciences and mathematics, OpenAI says.

On the GAIA benchmark, which evaluates AI systems on 466 real-world tasks including reasoning and multimodal processing, Deep Research scored 72.57%, beating the previous record of 63.64%.
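
To put the reported scores side by side, the following minimal Python sketch computes the gap in percentage points between Deep Research and each comparison score. It uses only the figures quoted above; the value for o3-mini-high is entered as 13.0 because the article gives it as "around 13%", and the variable and function names are illustrative choices, not anything from OpenAI.

```python
# Scores as reported in the article, in percent accuracy.
hle_scores = {
    "GPT-4o": 3.3,
    "o3-mini-high": 13.0,  # assumption: the article only says "around 13%"
    "Deep Research": 26.6,
}
gaia_previous_record = 63.64
gaia_deep_research = 72.57


def point_gap(new: float, old: float) -> float:
    """Absolute difference between two scores, in percentage points."""
    return round(new - old, 2)


# Gap between Deep Research and each earlier model on Humanity's Last Exam.
hle_gaps = {
    name: point_gap(hle_scores["Deep Research"], score)
    for name, score in hle_scores.items()
    if name != "Deep Research"
}

# Gap between Deep Research and the previous GAIA record.
gaia_gap = point_gap(gaia_deep_research, gaia_previous_record)

for name, gap in hle_gaps.items():
    print(f"Humanity's Last Exam: +{gap} points over {name}")
print(f"GAIA: +{gaia_gap} points over the previous record")
```

The gaps are expressed in percentage points rather than as relative improvements, since a ratio against a very low baseline such as 3.3% can overstate how much of the benchmark is actually solved.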

Figure: Two bar charts compare success rates on expert tasks by economic value and by time spent. Deep Research is most successful with simple, quick tasks of low economic value. | Image: OpenAI

The system's success rate shows a stronger correlation with the economic value of a task than with the time it would take a human to complete it. For tasks with low economic value, Deep Research achieves a success rate of almost 20%, while for high-value tasks the success rate drops to around 9%.

In terms of time, the system succeeds on more than 20% of shorter tasks (1-3 hours), while its success rate on longer tasks (4+ hours) stays consistently around 13-14%.

According to OpenAI, this pattern suggests that AI systems face different challenges than humans. "The things that models find difficult are different to what humans find time-consuming," OpenAI writes.

OpenAI CEO Sam Altman estimates that Deep Research can handle "a single-digit percentage of all economically valuable tasks in the world," a milestone he describes as "wild."

Deep Research is much cheaper than a human, but still expensive for OpenAI

The feature currently requires a ChatGPT Pro subscription at $200 per month, which includes up to 100 deep research requests through the web version. OpenAI points to high computing costs to explain these limitations, but says it's working on a faster, more cost-effective version that will use a smaller model.
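
As a rough illustration of what this pricing means per request, the sketch below divides the subscription price by the number of requests used. It assumes, purely for the sake of the calculation, that the entire $200 is attributed to Deep Research, which overstates the cost because a Pro subscription includes other features. The function name and the 50-request scenario are hypothetical.

```python
# Figures from the article.
PRO_PRICE_USD_PER_MONTH = 200
PRO_REQUESTS_PER_MONTH = 100


def cost_per_request(monthly_price: float, requests_used: int) -> float:
    """Effective price per request if the whole fee is attributed to Deep Research."""
    if requests_used <= 0:
        raise ValueError("requests_used must be positive")
    return monthly_price / requests_used


# A user who exhausts the full monthly quota.
full_quota_cost = cost_per_request(PRO_PRICE_USD_PER_MONTH, PRO_REQUESTS_PER_MONTH)

# Hypothetical user who only runs half of the available requests.
half_quota_cost = cost_per_request(PRO_PRICE_USD_PER_MONTH, 50)

print(f"Full quota: ${full_quota_cost:.2f} per request")
print(f"Half quota: ${half_quota_cost:.2f} per request")
```

This only reflects the price to the subscriber. It says nothing about OpenAI's own compute cost per request, which the company has not disclosed.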

The company plans to expand access to Plus and Team users in about a month, along with support for mobile and desktop applications. According to Altman, the current version will offer about 10 research tasks per month in the Plus tier and "a very small number" in the free tier. Users in the EU, UK, and Switzerland won't have access to the service for now.


Source: OpenAI