
A team of researchers at the University of Oxford has developed a method for identifying errors in LLM generations. They measure "semantic entropy" in the responses of large language models to identify potential confabulations.


In machine learning, entropy quantifies uncertainty: it describes how much randomness or unpredictability remains in the data or in a model's predictions. By estimating entropy, a model can assess how well it captures the underlying patterns and how much uncertainty remains in its outputs.

The "semantic entropy" now used by the Oxford researchers measures this uncertainty at the level of the meaning of sentences. It is designed to estimate when an LLM query might lead to correct but arbitrary or incorrect answers to the same question.

The researchers call this subset of AI hallucinations - sometimes described as LLM "soft bullshit" - "confabulations" and distinguish them from systematic or learned LLM errors. They emphasize that their method only targets these confabulations.


Language models are better at knowing what they don't know than previously assumed

The researchers generate several possible answers to a question and group them based on bidirectional entailment: if answer A implies that answer B is true and vice versa, a second language model assigns both to the same semantic cluster.
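To make the clustering step concrete, here is a minimal Python sketch. It assumes an entails(a, b) predicate (for example a natural language inference model or a second LLM acting as judge) that returns True if answer a implies answer b; the function and variable names are illustrative and not taken from the paper's code.

```python
def cluster_by_bidirectional_entailment(answers, entails):
    """Group answers that mutually entail each other into semantic clusters."""
    clusters = []  # each cluster is a list of answers sharing one meaning
    for answer in answers:
        for cluster in clusters:
            representative = cluster[0]
            # Same meaning only if entailment holds in both directions.
            if entails(answer, representative) and entails(representative, answer):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])  # no match found: start a new cluster
    return clusters
```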

By analyzing multiple possible answers to a question and grouping them, researchers calculate semantic entropy. A high semantic entropy indicates a high level of uncertainty and therefore possible confabulation, while a low value indicates consistent and more likely correct answers.
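The entropy itself can then be computed over the resulting clusters. The sketch below uses the fraction of sampled answers that fall into each cluster as the probability estimate, a simplification in the spirit of the paper's discrete variant; the full method can also weight clusters by the model's own sequence probabilities.

```python
import math

def semantic_entropy(clusters):
    """Entropy over semantic clusters: high values mean the sampled answers
    express many different meanings (possible confabulation), low values mean
    the model keeps expressing the same meaning."""
    total = sum(len(cluster) for cluster in clusters)
    probabilities = [len(cluster) / total for cluster in clusters]
    return -sum(p * math.log(p) for p in probabilities)
```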

Semantic entropy measures uncertainty in responses by clustering responses that are similar in meaning (a). Low values indicate the LLM's confidence in the meaning. In longer texts, it detects confabulation by high average entropy for questions about single facts (b). | Image: Farquhar, S., Kossen, J., Kuhn, L. et al.

By filtering out questions that are likely to lead to confabulation, the accuracy of the remaining answers can be increased. According to the researchers, this works across different language models and domains without training on domain-specific knowledge.
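Such filtering could look something like the following sketch, which reuses the two helpers above and abstains whenever the entropy exceeds a chosen threshold. The sample count and threshold are illustrative values, not figures from the paper.

```python
def answer_or_abstain(question, generate, entails, n_samples=5, threshold=1.0):
    """Answer only when semantic entropy is below the threshold, else abstain."""
    answers = [generate(question) for _ in range(n_samples)]
    clusters = cluster_by_bidirectional_entailment(answers, entails)
    if semantic_entropy(clusters) < threshold:
        # Return an answer from the most common meaning.
        return max(clusters, key=len)[0]
    return None  # likely confabulation: refuse or flag for review
```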

In tests across 30 task and model combinations, the method was able to distinguish between correct and incorrect AI answers about 79 percent of the time, outperforming existing methods by about ten percent.

This relative success of semantic entropy in error detection suggests that LLMs are better at "knowing what they don't know" than previously thought - "they just don’t know they know what they don’t know," the researchers write.


They emphasize that their method is not a comprehensive solution for all types of errors in LLMs, focusing specifically on the detection of confabulation. Further research is needed to address systematic errors and other uncertainties.

Higher LLM reliability is more expensive

In practice, model and AI service providers could build semantic entropy into their systems and allow users to see how certain a language model is that a suggested answer is correct. If it is not sure, it might not generate an answer or mark uncertain passages of text.

Image: Farquhar, S., Kossen, J., Kuhn, L. et al.

However, this would mean higher costs. According to co-author Sebastian Farquhar, the entropy check increases the cost per query by a factor of five to ten, because up to five additional answers have to be generated and evaluated for each question.

"For situations where reliability matters, the extra tenth of a penny is worth it," Farquhar writes.


It remains to be seen whether companies like OpenAI or Google, with hundreds of millions or even billions of chatbot queries per day, will come to a similar conclusion for a ten percent improvement in a particular segment of hallucinations, or whether they'll just move on and not care.

Summary
  • Researchers at the University of Oxford have developed a method for measuring "semantic entropy" in the responses of large language models to identify potential confabulations (arbitrary and incorrect responses).
  • The method generates multiple possible responses to a question, groups responses with similar meanings, and calculates the semantic entropy. A high entropy indicates uncertainty and possible confabulation, while a low entropy indicates consistent answers.
  • In tests, the method was able to distinguish between correct and incorrect AI answers 79 percent of the time, about ten percent better than previous methods. Incorporating it into language models could increase reliability, but at a higher cost.