
AI systems like ChatGPT will always make things up, but they may soon get better at recognizing their own uncertainty.


OpenAI says language models will always hallucinate, meaning they'll sometimes generate false or misleading statements, also known as "bullshit." This happens because these systems are trained to predict the next most likely word, not to tell the truth. Since they have no concept of what's true or false, they can produce convincing but inaccurate answers just as easily as correct ones. While that might be fine for creative tasks, it's a real problem when users expect reliable information.
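To make that concrete, here is a minimal sketch of what "predicting the next most likely word" means in practice. The prompt, the candidate words, and the probabilities are invented for illustration and not taken from any real model; the point is that the selection step only asks which continuation is likely, never which one is true.

```python
import random

# Invented next-token distribution for the prompt
# "The capital of Australia is ..." (numbers made up for illustration).
next_token_probs = {
    "Sydney": 0.55,     # frequently written, plausible, but factually wrong
    "Canberra": 0.35,   # correct
    "Melbourne": 0.10,  # plausible but wrong
}

# A language model either takes the most likely token or samples from the
# distribution. Neither step consults any notion of factual truth.
tokens, weights = zip(*next_token_probs.items())
print("greedy pick: ", max(next_token_probs, key=next_token_probs.get))
print("sampled pick:", random.choices(tokens, weights=weights, k=1)[0])
```

With numbers like these, the confidently wrong continuation wins most of the time, which is exactly the failure mode described here.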

OpenAI breaks down different types of hallucinations. Intrinsic hallucinations directly contradict the prompt, like answering "2" when asked "How many Ds are in DEEPSEEK?" (the correct answer is one). Extrinsic hallucinations go against real-world facts or the model's training data, such as inventing fake quotes or biographies. Then there are "arbitrary fact" hallucinations, which show up when the model tries to answer questions about things rarely or never seen in its training, like specific birthdays or dissertation titles. In those cases, the model just guesses.

To reduce hallucinations, OpenAI says it uses several strategies: reinforcement learning with human feedback, external tools like calculators and databases, and retrieval-augmented generation. Fact-checking subsystems add another layer. Over time, OpenAI wants to build a modular "system of systems" that makes models more reliable and predictable.
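As a rough illustration of the retrieval-augmented generation idea, here is a minimal sketch. It is not OpenAI's pipeline: the tiny corpus, the word-overlap retriever, and the call_language_model placeholder are assumptions made up for this example. The point is only that the model is asked to answer from retrieved evidence and to say "I don't know" when the evidence doesn't cover the question.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# Hypothetical throughout: corpus, retriever, and call_language_model()
# are stand-ins, not OpenAI's actual systems.

CORPUS = [
    "The Eiffel Tower was completed in 1889 and is about 330 metres tall.",
    "The Golden Gate Bridge opened to traffic in 1937.",
    "Mount Everest stands 8,849 metres above sea level.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        CORPUS,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def call_language_model(prompt: str) -> str:
    """Placeholder for a real model call (e.g. an API request)."""
    return f"[model answer grounded in a {len(prompt)}-character prompt]"

def answer(question: str) -> str:
    # Retrieved evidence is prepended to the prompt so the model answers
    # from the context instead of from memory alone.
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, say 'I don't know'.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_language_model(prompt)

print(answer("How tall is the Eiffel Tower?"))
```

A calculator or database tool call follows the same pattern: the model hands the hard part to a system that can actually check the result instead of relying on next-word prediction alone.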


Models should admit uncertainty

OpenAI says hallucinations will always be part of language models, but future versions should at least know when they're unsure—and say so. Instead of guessing, models should use outside tools, ask for help, or just stop responding.

That doesn't mean the model truly understands what's true or false, but it does mean it can signal when it's not confident about its own answers. The goal is to make model behavior feel more human. People don't have all the answers, and sometimes it's better to admit not knowing than to just guess. Of course, people guess from time to time, too.

The GPT-5 mini-thinking model is designed to admit uncertainty much more often than the o4-mini model. | Image: OpenAI

OpenAI points to a deeper issue with how large language models are evaluated. Most benchmarks use strict right-or-wrong scoring and don't give credit for answers like "I don't know." According to OpenAI, this setup encourages models to guess rather than admit uncertainty, which in turn leads to more hallucinations.

That means models that honestly say when they're unsure end up with lower scores than those that always guess, even if they're just making things up. Under pure accuracy scoring, even a blind guess has some chance of earning a point, while admitting uncertainty is guaranteed to score zero, so guessing is always the better bet. OpenAI calls this an "epidemic," where the system punishes uncertainty and rewards hallucination.

To address this, OpenAI suggests changing the way benchmarks work. Instead of rewarding models for always giving an answer, tasks could require models to respond only when they're confident. Wrong answers would be penalized, while saying "I don't know" wouldn't count against the model. OpenAI says these "confidence thresholds" could offer a more objective way to gauge responsible model behavior, rather than just favoring confident responses.
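Here's a small sketch of how such a scoring change flips the incentives. The question counts, the two model behaviors, and the size of the penalty are assumptions made up for the example, not OpenAI's exact benchmark design: under plain right-or-wrong scoring the always-guessing model comes out ahead, while under the penalized rule the model that admits uncertainty wins.

```python
# Toy comparison of two benchmark scoring rules.
# All numbers and behaviors below are invented for illustration.

IDK = "I don't know"
CORRECT = "right"
N_QUESTIONS = 10

def accuracy_score(answer):
    """Classic right-or-wrong scoring: abstaining earns nothing."""
    return 1.0 if answer == CORRECT else 0.0

def threshold_score(answer, penalty=1.0):
    """Confidence-aware scoring: wrong answers cost points, abstaining is free.
    The penalty of 1 point is an assumed value for this example."""
    if answer == IDK:
        return 0.0
    return 1.0 if answer == CORRECT else -penalty

def guesser(i):
    # Knows questions 0-2 for sure, guesses blindly on the rest, lands one lucky hit.
    return CORRECT if i in (0, 1, 2, 5) else "wrong guess"

def abstainer(i):
    # Knows questions 0-2 for sure and admits uncertainty on the rest.
    return CORRECT if i < 3 else IDK

for rule in (accuracy_score, threshold_score):
    g = sum(rule(guesser(i)) for i in range(N_QUESTIONS))
    a = sum(rule(abstainer(i)) for i in range(N_QUESTIONS))
    print(f"{rule.__name__}: guesser {g:+.1f}, abstainer {a:+.1f}")
```

Scored on accuracy alone, guessing looks better (4 vs. 3 points); with wrong answers penalized, the honest model leads (3 vs. -2), which is the behavior the proposed confidence thresholds are meant to reward.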


There's already some progress outside of OpenAI's own benchmarks. A Stanford math professor recently spent a year posing an unsolved problem to OpenAI's models. Earlier versions gave incorrect answers, but the latest model finally admitted it couldn't solve it. It also chose not to guess on the toughest problem from this year's International Mathematical Olympiad. OpenAI says these improvements should make their way into commercial models in the coming months.

Summary
  • OpenAI explains that systems like ChatGPT will continue to make errors because they generate text by predicting likely next words, not by truly understanding facts.
  • To reduce mistakes, OpenAI is using reinforcement learning with human feedback, adding external tools like calculators and databases, and creating dedicated fact-checking modules. It also plans a modular "system of systems" for more reliable, predictable answers in the future.
  • The models are now being trained to recognize when they cannot provide a dependable answer and to say so clearly. Early results indicate that models are getting better at showing uncertainty and declining tasks when they cannot answer confidently.