OpenAI's SearchGPT demo fail shows how hard it is to catch AI bullshit

Matthias Bastian, Jul 26, 2024


Key Points

OpenAI's prototype AI search engine, SearchGPT, made a significant error in its pre-recorded demo video, misstating the dates of An Appalachian Summer Festival in Boone, North Carolina.
The incident highlights a major criticism of large language models, especially for search: they can generate convincingly false information by reformatting existing data and placing it in new contexts without understanding the meaning.
OpenAI acknowledges the error and emphasizes that SearchGPT is an initial prototype that will be improved over time, with successful features eventually integrated into ChatGPT.

Yesterday, OpenAI unveiled its prototype AI search engine, SearchGPT. However, the pre-recorded demo video meant to showcase the tool's capabilities contained a significant error, putting SearchGPT in familiar company.

In the demo, a user searches for "music festivals in Boone, North Carolina in August." SearchGPT lists several festivals, with An Appalachian Summer Festival at the top, supposedly running from July 29 to August 16.

However, the festival has confirmed that it actually runs from June 29 to July 27. The dates provided by SearchGPT are when the festival box office is closed.

These errors occur because the AI model doesn't understand the meaning of the sentences it generates. In the context of search, it is simply reformatting existing information and putting it in a new context.

The fact that such an error has now made it into an official product presentation twice (see below) shows how subtle and difficult to detect these mistakes are. This is a major criticism of large language models: they can be wrong in a very convincing way.

Video: OpenAI

OpenAI spokeswoman Kayla Wood acknowledged the error to The Atlantic, stating, "This is an initial prototype, and we’ll keep improving it."

The company's cautious approach to SearchGPT's rollout suggests it anticipates such issues. The prototype is available to a limited number of users, and it's not meant to be permanent. Features that prove successful over time will be integrated into ChatGPT. OpenAI doesn't expect SearchGPT to dominate the search market anytime soon; it's an early test.

"We will learn from the prototype, make it better, and then integrate the tech into ChatGPT to make it real-time and maximally helpful," writes OpenAI CEO Sam Altman.

This incident echoes Google's chatbot Bard, which incorrectly claimed in its first demo that the James Webb Space Telescope took the first image of an exoplanet.

At the time, the stock market reacted swiftly, with Alphabet's market value dropping by about $100 billion, or 9%. This time, OpenAI's SearchGPT announcement cost Google a few percentage points in market value, despite OpenAI's mistake. This shows how different expectations are for the two companies.

OpenAI launches SearchGPT, Google scales back AI overviews

SearchGPT may look like an answer to Google's Search Generative Experience, which is available to many users as so-called AI Overviews. But it's actually the other way around: Google's AI Overviews were a response to a perceived threat from OpenAI. Google wanted to get ahead of the AI startup.

And it did, but at a cost. Within days, examples surfaced of AI Overviews giving sometimes life-threatening health advice and making other nonsensical or false statements. Because they appeared inside Google's search engine, these answers gave the impression of being based on reputable sources.

Google publicly acknowledged the errors and promised improvements. However, the core issue persists, and according to an analysis, Google has significantly reduced the display of AI Overviews. Initially, 84% of queries got AI summaries; now it's less than 15%. Meanwhile, for some reason, Microsoft has just introduced a copycat version of Google's SGE with Bing.

Now, even if OpenAI manages to drastically reduce hallucinations in SearchGPT, a fundamental problem remains: the search business model requires scale. More users mean more errors.

Even a 1% hallucination rate would result in tens of millions of incorrect answers daily at Google's scale. So far, there's no solution to reliably eliminate AI-generated bullshit—a problem, or rather a feature, as old as the technology itself.
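As a rough back-of-the-envelope check of that claim, the sketch below multiplies an assumed daily query volume by a hallucination rate. The figure of 8.5 billion searches per day is a commonly cited outside estimate, not a number from this article or from Google. The sketch also assumes that every query receives an AI-generated answer, and the function name is purely illustrative.

```python
# Assumption: ~8.5 billion Google searches per day (a commonly cited estimate,
# not an official figure and not taken from this article).
ASSUMED_DAILY_QUERIES = 8_500_000_000


def wrong_answers_per_day(daily_queries: int, hallucination_rate: float) -> int:
    """Return the expected number of incorrect answers per day.

    Assumes every query gets an AI-generated answer and that errors occur
    independently at the given rate.
    """
    if not 0.0 <= hallucination_rate <= 1.0:
        raise ValueError("hallucination_rate must be between 0 and 1")
    return round(daily_queries * hallucination_rate)


if __name__ == "__main__":
    # Compare the 1% rate from the text with a tenfold lower rate.
    for rate in (0.01, 0.001):
        errors = wrong_answers_per_day(ASSUMED_DAILY_QUERIES, rate)
        print(f"{rate:.1%} error rate -> about {errors:,} wrong answers per day")
```

Under these assumptions, a 1% rate works out to roughly 85 million wrong answers a day, and even a tenfold improvement would still leave millions.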

What's more, LLM searches are much more computationally intensive and expensive than traditional searches, and many questions about the economics of the web in the chatbot era remain unanswered.


Source: The Atlantic | Reuters