Gemini's AI-powered "Deep Research" feature struggles with accuracy in early testing

Dec 13, 2024

Google

Key Points

Google has added a new "Deep Research" feature to Gemini Advanced, which aims to enhance internet research capabilities through its AI assistant. The feature creates a research strategy, searches for relevant sources, analyzes them, and generates a report summarizing key topics with source citations.
In my test, the system was asked to collect available information about OpenAI's o1 model architecture. While the source selection and general organization of known facts worked well, significant problems emerged in the details. For example, the system incorrectly claimed that OpenAI o1 uses the Quiet-STaR method, which was only discussed as a possible approach in the cited source.
The test shows that AI research assistants are probably best used to help gather relevant sources and provide an initial overview. However, users must be aware that generated reports are likely to contain misinformation, and the effort required for fact-checking may outweigh the benefits.

Google has added a new "Deep Research" feature to Gemini Advanced that aims to enhance internet research capabilities through its AI assistant. Early testing shows both potential and significant limitations of this AI-powered research tool.

The new "Deep Research" feature, currently available for Gemini 1.5 Pro, first creates a research strategy that users can manually adjust. The system then searches the internet for relevant sources, analyzes them, and generates a report summarizing key topics with source citations. This positions Google's assistant as an AI-powered research tool similar to Perplexity.

Testing reveals accuracy issues

In my test, I asked the system to collect information about OpenAI's o1 model architecture. Gemini developed a six-step research plan that included searching for research papers, articles, patents, and OpenAI presentations.

Gemini designs a search plan that I can customize manually. Image: Google / Gemini

The system searched between 22 and 70 websites depending on the query and created a comprehensive report. While the source selection and general organization of known facts about the o1 model worked well, significant problems emerged in the details.

Gemini searches a different number of sources depending on the task. Image: Google / Gemini

After a few minutes, the comprehensive research report is ready. Image: Google / Gemini

For example, the system incorrectly claimed that OpenAI o1 uses the Quiet-STaR method.

Gemini claims that OpenAI used Quiet-STaR. However, the source does not support this. Image: Google / Gemini

A check of the cited source revealed that Quiet-STaR was only discussed as a possible approach for better chain-of-thought training. The author explicitly emphasized these were merely speculations about how OpenAI trained o1.

The source clearly explains in several places that it is only conjecture. Image: Metadocs.co
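The manual check described above can be partly mechanized. The following is a minimal sketch, not a description of how Gemini or any other tool works: it pulls every sentence that mentions a term from a source text and flags whether that sentence contains hedging language. The function name `hedged_mentions`, the `HEDGE_CUES` list, and the sample text are all assumptions made up for illustration; the sample is not a quote from the cited source.

```python
import re

# Hypothetical list of hedging cues (matched as word prefixes).
# A real check would need a much richer list and human review.
HEDGE_CUES = ("may", "might", "could", "possibl", "perhaps",
              "speculat", "conjecture", "likely", "rumor")


def hedged_mentions(source_text, term):
    """Return (sentence, is_hedged) pairs for sentences that mention `term`."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", source_text.strip())
    results = []
    for sentence in sentences:
        lowered = sentence.lower()
        if term.lower() not in lowered:
            continue
        # A sentence counts as hedged if any cue starts a word in it.
        is_hedged = any(
            re.search(r"\b" + re.escape(cue), lowered) for cue in HEDGE_CUES
        )
        results.append((sentence, is_hedged))
    return results


# Invented sample text, for illustration only.
sample = (
    "OpenAI might have used Quiet-STaR to train o1. "
    "This is pure speculation. "
    "The model was announced in 2024."
)
for sentence, is_hedged in hedged_mentions(sample, "Quiet-STaR"):
    print("HEDGED" if is_hedged else "ASSERTED", "|", sentence)
```

If every mention in the source is hedged but the generated report states the claim as fact, that is a signal to read the source yourself. A keyword check like this only narrows down where to look; it cannot confirm that a claim is true.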

Useful for gathering sources, problematic with details

Best practices for using these systems effectively have yet to emerge. Language models can provide good support in certain areas despite occasional inaccuracies, but this test suggests that AI research assistants are probably best used to gather relevant sources and provide a flawed initial overview. Users must always be aware that generated reports are very likely to contain misinformation, and the effort required for fact-checking may outweigh the benefits.

Google acknowledges this with a notice under the chat window: "Gemini can make mistakes, including about people, so double-check it."

Deep Research is available to Gemini Advanced subscribers using Gemini 1.5 Pro.
