
A new study reveals that large language models often possess accurate information internally, even when their outputs are incorrect. This finding could pave the way for more dependable AI systems.


Researchers from the Technion (Israel Institute of Technology), Google, and Apple have shown that large language models are more aware of their own errors than previously believed. Their study, "LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations," offers insights into how AI models process correct and incorrect information internally.

The research team developed a method for analyzing the internal representations of AI models in finer detail. They focused on "exact answer tokens" - the specific parts of a response that carry the crucial information. For instance, in answering "What is the capital of France?", the word "Paris" is the exact answer token in the response "The capital of France is Paris, a world-renowned city."
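To make the idea concrete, here is a minimal sketch of how one might locate the exact answer token in a response and read the model's hidden state at that position. This is not the authors' code: the model name, layer choice, and example sentence are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper probes larger instruction-tuned LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

response = "The capital of France is Paris, a world-renowned city."
exact_answer = "Paris"

# Tokenize the response and find which token covers the answer's first character.
enc = tokenizer(response, return_offsets_mapping=True, return_tensors="pt")
char_start = response.index(exact_answer)
token_idx = next(
    i for i, (s, e) in enumerate(enc["offset_mapping"][0].tolist())
    if s <= char_start < e
)

with torch.no_grad():
    out = model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"])

# Hidden state of the exact-answer token at a middle layer (a common probing choice).
layer = len(out.hidden_states) // 2
answer_vec = out.hidden_states[layer][0, token_idx]  # shape: (hidden_dim,)
print(answer_vec.shape)
```

Vectors like `answer_vec`, collected over many labeled question-answer pairs, are the raw material for the error-detection analysis described next.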

The study found that these tokens carry the most information about whether a response is accurate. Strikingly, the models sometimes "knew" the correct answer internally but still produced an incorrect output, suggesting that they encode more information than their responses reveal.
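A hedged sketch of how such a claim can be tested: train a simple linear probe on exact-answer hidden states to predict whether the answer was correct. The arrays below are random placeholders; real inputs would be hidden states collected as in the previous sketch, over a labeled QA set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_dim = 768                                 # illustrative; depends on the model
vectors = rng.normal(size=(1000, hidden_dim))    # exact-answer hidden states (placeholder)
labels = rng.integers(0, 2, size=1000)           # 1 = the model's answer was correct

X_train, X_test, y_train, y_test = train_test_split(
    vectors, labels, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out probe accuracy: {probe.score(X_test, y_test):.2f}")
```

If a probe like this predicts "correct" from the internal state while the model's generated output is wrong, that is precisely the "knows more than it shows" pattern the paper describes.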


New approaches to combat hallucinations?

The researchers also examined whether the models' error detection transfers across different tasks. They found that it transfers best between tasks that require similar skills, suggesting that the models' internal encoding of truthfulness is skill-specific rather than universal.
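A rough sketch of how such a transfer check can be run: train the error-detection probe on hidden states from one task, then evaluate it on another. The task names and placeholder features below are assumptions for illustration; the paper trains detectors on real hidden states from benchmark datasets.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Placeholder per-task data: (exact-answer hidden states, correctness labels).
tasks = {
    "trivia_qa": (rng.normal(size=(800, 768)), rng.integers(0, 2, size=800)),
    "math_word_problems": (rng.normal(size=(800, 768)), rng.integers(0, 2, size=800)),
}

# Train on one task, then score both in-task and cross-task.
X_src, y_src = tasks["trivia_qa"]
probe = LogisticRegression(max_iter=1000).fit(X_src, y_src)

for name, (X, y) in tasks.items():
    print(f"{name}: {probe.score(X, y):.2f}")
```

With real features, in-task accuracy well above cross-task accuracy would indicate the skill-specific encoding the researchers report.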

According to the research team, these findings could lead to new strategies for enhancing the reliability and accuracy of AI systems. In particular, the fact that models often "know" more internally than they show in their outputs opens up possibilities for improved error detection and correction mechanisms.
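One mechanism along these lines would be probe-guided answer selection: sample several candidate answers, score each candidate's exact-answer hidden state with the trained error-detection probe, and return the one the probe rates most likely correct. This is a hedged sketch, not the paper's implementation; the function name and toy data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_answer(candidates, hidden_vecs, probe):
    """candidates: list of answer strings; hidden_vecs: (n, hidden_dim) array."""
    p_correct = probe.predict_proba(hidden_vecs)[:, 1]  # probe's P(answer is correct)
    return candidates[int(np.argmax(p_correct))]

# Toy demonstration with a probe trained on placeholder data.
rng = np.random.default_rng(0)
probe = LogisticRegression(max_iter=1000).fit(
    rng.normal(size=(100, 768)), rng.integers(0, 2, size=100)
)
answers = ["Paris", "Lyon", "Marseille"]
vecs = rng.normal(size=(3, 768))  # would be real exact-answer hidden states
print(select_answer(answers, vecs, probe))
```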

Summary
  • A study by researchers from the Technion, Google, and Apple shows that large language models often know the right answer internally, even when they produce an incorrect output.
  • The researchers focused on the "exact answer tokens" in AI responses and found that these tokens carry most of the information about whether an answer is correct. The models sometimes "knew" the correct answer internally but still gave a wrong one.
  • These findings could lead to new approaches to improve the reliability and accuracy of AI systems. The fact that models regularly "know" more internally than they show in their outputs opens up possibilities for improved error detection and correction mechanisms.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.