OpenAI has extended its Deep Research feature to all ChatGPT Plus, Team, Education, and Enterprise users, along with some improvements since its initial release.

According to the company, the feature now embeds images with source information in its output and is better at "understanding and referencing" uploaded files.

Plus, Team, Enterprise, and Education users will receive 10 Deep Research queries per month, while Pro users will have access to 120 queries.

The feature, which was first released to Pro users in early February, searches numerous online sources and generates detailed reports based on them, though it still makes the errors typical of language models.

Deep Research hallucinates less than GPT-4o and o1

OpenAI has released a system card outlining the development, capabilities, and risk assessment of Deep Research, including a benchmark for hallucination risk - instances where the model generates false information, also known as bullshit.

Testing with the PersonQA dataset shows clear improvements in accuracy: Deep Research achieved 0.86, significantly better than GPT-4o (0.50), o1 (0.55), and o3-mini (0.22).

The hallucination rate has dropped to 0.13, better than GPT-4o (0.30), o1 (0.20), and o3-mini (0.15). OpenAI notes that this rate may overestimate actual hallucinations, as some answers scored as incorrect are based on outdated test data. The company says extensive online research helps reduce errors, while "post-training procedures" reward factual accuracy and discourage false claims.

Table: Comparison of accuracy and hallucination rate for four AI models; Deep Research leads in both metrics. Deep Research achieves the highest accuracy at 0.86, while GPT-4o and o1 are in the mid-range and o3-mini is significantly weaker. | Image: OpenAI
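
To make the two metrics concrete, here is a minimal sketch of how accuracy and hallucination rate are typically computed in a QA evaluation of this kind. OpenAI's system card does not spell out the exact PersonQA grading procedure, so the three-way labeling below (correct, incorrect, not attempted) is an illustrative assumption, not the company's documented method.

```python
# Illustrative sketch only - OpenAI's exact PersonQA grading is not public.
# Assumption: each answer is graded "correct", "incorrect" (a confident but
# false claim, i.e. a hallucination), or "not_attempted" (the model declines).

def qa_metrics(grades):
    total = len(grades)
    accuracy = sum(g == "correct" for g in grades) / total
    hallucination_rate = sum(g == "incorrect" for g in grades) / total
    return accuracy, hallucination_rate

# Toy example with four graded answers:
print(qa_metrics(["correct", "correct", "incorrect", "not_attempted"]))
# -> (0.5, 0.25)
```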

However, the 13 percent error rate still means that users may encounter multiple inaccuracies in longer research reports. This remains an important consideration when using the tool: Deep Research is most effective for general, well-documented topics with verified sources, or when used by subject matter experts who can quickly validate the generated content.
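
As a rough back-of-the-envelope illustration (assuming the per-answer hallucination rate roughly carries over to individual factual claims in a report, which is a simplification), the expected number of incorrect claims grows with report length:

```python
# Back-of-the-envelope estimate assuming a 13% per-claim error rate;
# a simplification for illustration, not OpenAI's methodology.
hallucination_rate = 0.13

for claims in (20, 50, 100):
    expected = claims * hallucination_rate
    print(f"{claims} factual claims -> about {expected:.1f} expected incorrect claims")
```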

It's worth noting that AI errors embedded in otherwise correct and legitimate-looking surroundings - a few mistakes scattered through a well-structured, extensive, and highly detailed report - can be difficult to spot, as OpenAI knows all too well.

Summary
  • OpenAI is rolling out its Deep Research feature to all ChatGPT Plus, Team, Education, and Enterprise users. Usage is initially capped: Plus, Team, Enterprise, and Education users receive 10 Deep Research requests per month, while Pro users get 120.
  • Evaluations using the PersonQA dataset show that Deep Research achieves an accuracy of 0.86 and a hallucination rate of 0.13, outperforming comparable models such as GPT-4o, largely due to its extensive online search.
  • Despite these advances, it's still a matter of perspective whether a 13 percent error rate is acceptable for research texts that may span several pages.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.