A new study reveals that large language models often possess accurate information internally, even when their outputs are incorrect. This finding could pave the way for more dependable AI systems.
Researchers from the Technion, Google, and Apple have shown that large language models are more aware of their own errors than previously believed. Their study, titled "LLMs Know More Than They Show," offers insights into how these models internally encode correct and incorrect information.
The research team developed a method to analyze the models' internal representations in finer detail. They focused in particular on "exact answer tokens" - the specific tokens in a response that carry the actual answer. For instance, when answering "What is the capital of France?", the word "Paris" would be the exact answer token in the response "The capital of France is Paris, a world-renowned city."
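To make the idea concrete, here is a minimal sketch of how such a token might be located programmatically. The function name, the whitespace tokenization, and the punctuation handling are illustrative assumptions, not the researchers' actual code.

```python
# Hypothetical sketch: find the position of the "exact answer token"
# inside a model's response, given the expected answer string.

def find_exact_answer_span(response_tokens: list[str], answer: str):
    """Return (start, end) token indices covering the exact answer, or None."""
    answer_tokens = [t.lower() for t in answer.split()]
    for start in range(len(response_tokens) - len(answer_tokens) + 1):
        window = response_tokens[start:start + len(answer_tokens)]
        # Strip trailing punctuation so "Paris," still matches "Paris".
        if [t.strip(".,").lower() for t in window] == answer_tokens:
            return start, start + len(answer_tokens)
    return None

# Example from the article: "Paris" inside a longer response.
response = "The capital of France is Paris, a world-renowned city.".split()
print(find_exact_answer_span(response, "Paris"))  # -> (5, 6)
```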
The study found that these tokens carry the most information about whether a response is accurate. Strikingly, the models' internal representations sometimes encoded the correct answer even when the generated output was wrong, which suggests the models hold more information than they reveal in their responses.
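In broad strokes, such an internal signal can be read out with a small probing classifier trained on the model's hidden states at the exact answer token. The sketch below uses random placeholder activations and labels purely to illustrate the general probing idea; it is an assumption-based illustration, not the paper's actual experimental setup.

```python
# Minimal probing sketch (assumed setup, not the authors' code):
# X holds one hidden-state vector per response, taken at the exact answer token;
# y marks whether that response was factually correct (1) or not (0).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768))      # placeholder activations
y = rng.integers(0, 2, size=1000)     # placeholder correctness labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Probe accuracy: {probe.score(X_test, y_test):.2f}")
```

In a real experiment, the activations would come from the language model itself (for example via the output_hidden_states option in Hugging Face Transformers), and the labels from checking each generated answer against ground truth.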
New approaches to combat hallucinations?
The researchers also examined whether a model's error detection transfers across different tasks. They found that it transfers best between tasks requiring similar skills, indicating that models develop task-specific ways of handling certain kinds of information rather than a single, universal sense of truthfulness.
According to the research team, these findings could lead to new strategies for enhancing the reliability and accuracy of AI systems. In particular, the fact that models often "know" more internally than they show in their outputs opens up possibilities for improved error detection and correction mechanisms.