A research team at the University of Oxford has introduced a new method called "Semantic Entropy Probes" (SEPs) to efficiently detect uncertainty and hallucinations in large language models. The method could make the practical application of AI systems safer.

Oxford University scientists have developed a new technique that can cost-effectively detect hallucinations and uncertainties in large language models such as GPT-4. The Semantic Entropy Probes (SEPs) build on previous work on detecting hallucinations, in which some of the authors were involved.

In a paper published in Nature, the team demonstrated that it is possible to measure the "semantic entropy" of the responses of several large language models to identify arbitrary or false answers. The method generates multiple possible answers to a question and groups those with similar meanings. High entropy indicates uncertainty and potential errors. In tests, the method was able to distinguish between correct and false AI responses in 79 percent of cases, about 10 percent better than previous methods. Integrating it into language models could increase reliability, but would come at a higher cost for providers.
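To make the idea concrete, here is a minimal Python sketch of how semantic entropy can be computed once sampled answers have been grouped by meaning. The meaning-based grouping itself is stubbed out as a hypothetical `cluster_fn`; the example data and labels are illustrative, not from the paper.

```python
import math
from collections import Counter

def semantic_entropy(answers, cluster_fn):
    """Estimate semantic entropy from sampled answers.

    answers:    list of generated answer strings for one question
    cluster_fn: maps each answer to a meaning-cluster label
                (hypothetical stand-in for the meaning-grouping step)
    """
    clusters = [cluster_fn(a) for a in answers]
    counts = Counter(clusters)
    total = len(answers)
    # Probability mass per meaning cluster, then Shannon entropy over clusters
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Illustrative example: 5 sampled answers, grouped into two meanings
answers = ["Paris", "It is Paris.", "Paris, France", "Lyon", "Paris"]
labels = {"Paris": 0, "It is Paris.": 0, "Paris, France": 0, "Lyon": 1}
print(semantic_entropy(answers, labels.get))  # low entropy: the answers mostly agree
```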

The new SEPs method solves a central problem of semantic entropy measurement: high computational effort. Instead of generating multiple model responses for each query, the researchers train linear probes on the hidden states of language models when answering questions. These hidden states are internal representations that the model generates during text processing. The linear probes are simple mathematical models that learn to predict semantic entropy from these internal states.
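As a rough illustration, not the authors' code, training such a probe could look like the sketch below. It assumes hidden states have already been extracted from a fixed model layer, that semantic entropy has been binarized into high/low labels with the expensive sampling-based method, and that a plain logistic-regression classifier serves as the linear probe; all data here is random placeholder data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder training data: one hidden-state vector per prompt (extracted
# beforehand from a chosen model layer) and a high/low semantic-entropy label
# obtained with the sampling-based method.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 4096))   # e.g. 4096-dimensional hidden states
high_entropy = rng.integers(0, 2, size=1000)    # 1 = high entropy (possible hallucination)

# The "probe" is just a linear classifier on the frozen hidden states.
probe = LogisticRegression(max_iter=1000)
probe.fit(hidden_states, high_entropy)

# At inference time, a single forward pass yields the hidden state; the probe
# then scores uncertainty without sampling multiple answers.
new_state = rng.normal(size=(1, 4096))
print(probe.predict_proba(new_state)[0, 1])     # estimated probability of high entropy
```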


In practice, this means that SEPs require only a single model response to estimate the model's uncertainty after training. This significantly reduces the computational effort for uncertainty quantification. The researchers show that SEPs are capable of accurately predicting semantic entropy as well as detecting hallucinations in model responses.
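To illustrate the single-pass setup, the snippet below shows one way to pull a hidden state from one forward pass with the Hugging Face transformers library. The model name and layer choice are placeholders, not the ones used in the paper; the resulting vector is what a trained probe would score.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM that exposes hidden states works the same way.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "What is the capital of France?"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Take a middle-to-late layer's hidden state at the last token position;
# a single vector like this is the probe's input.
layer = len(out.hidden_states) // 2 + 2
hidden = out.hidden_states[layer][0, -1]   # shape: (hidden_dim,)
print(hidden.shape)
```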

"Semantic entropy probes could be further improved with more training

The researchers examined the performance of SEPs across different model architectures, tasks, and model layers. They show that the hidden states in middle to late model layers capture semantic entropy best. SEPs can even predict semantic uncertainty before the model begins to generate a response.
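A rough sketch of such a layer-wise comparison might look like this; it uses random placeholder data in place of real hidden states and entropy labels, and AUROC per layer as one possible way to judge which layer's probe detects high semantic entropy best.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder data: hidden states per layer (layers x examples x dim) and
# binarized semantic-entropy labels from the sampling-based method.
rng = np.random.default_rng(1)
n_layers, n_examples, dim = 12, 500, 768
states = rng.normal(size=(n_layers, n_examples, dim))
labels = rng.integers(0, 2, size=n_examples)

# Train one probe per layer and compare how well each layer predicts
# high semantic entropy (the paper reports middle-to-late layers work best).
for layer in range(n_layers):
    X_tr, X_te, y_tr, y_te = train_test_split(states[layer], labels, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1])
    print(f"layer {layer:2d}: AUROC {auc:.2f}")
```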

While SEPs do not quite match the performance of more computationally intensive methods such as the direct calculation of semantic entropy, the team says they offer a balanced trade-off between accuracy and computational efficiency. This makes them a promising technique for practical use in scenarios where computational resources are limited. In the future, the team wants to further improve the performance of SEPs, for example by training them on larger datasets.

Summary
  • Researchers at the University of Oxford have developed an efficient method called "Semantic Entropy Probes" (SEPs) to detect uncertainties and errors in large language models. SEPs measure the "semantic entropy" from AI responses, with high entropy indicating potential hallucinations.
  • The new technique solves the problem of high computational cost when measuring semantic entropy. Instead of generating multiple model responses per query, as the earlier method does, SEPs use trained linear probes to predict uncertainty from a single response.
  • SEPs work across different model architectures and layers, with middle to late layers capturing semantic entropy most effectively. While not quite reaching the performance of more computationally intensive methods, SEPs offer a good trade-off between accuracy and efficiency for practical use. In the future, performance is expected to be further improved through larger training datasets.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.