
Anthropic’s new "AI microscope" offers a limited view into the internal representations of its Claude 3.5 Haiku language model, revealing how it processes information and reasons through complex tasks.


One key finding, according to Anthropic, is that Claude appears to use a kind of language-independent internal representation—what the researchers call a "universal language of thought." For example, when asked to generate the opposite of the word "small" in multiple languages, the model first activates a shared concept before outputting the translated answer in the target language.

Flowchart: Multilingual processing of antonyms across English, Chinese, and French.
The overlapping multilingual paths show the conceptual connection between "small" and "big" in English, Chinese, and French. | Image: Anthropic

Anthropic reports that larger models like Claude 3.5 exhibit greater conceptual overlap across languages than smaller models. According to the researchers, this abstract representation may support more consistent multilingual reasoning.
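To make "conceptual overlap" concrete, the sketch below compares hypothetical activation vectors for the concept "big" captured while a model answers in three languages; high pairwise cosine similarity would point to one shared, language-independent representation. The vectors and the measurement are invented for illustration and are not Anthropic's actual data or method.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two activation vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented stand-ins for hidden-state activations of the concept "big",
# recorded while the model answers in English, Chinese, and French.
activations = {
    "en": np.array([0.91, 0.12, 0.80, 0.05]),
    "zh": np.array([0.88, 0.15, 0.77, 0.09]),
    "fr": np.array([0.90, 0.10, 0.82, 0.04]),
}

# Near-1.0 pairwise similarity would suggest a single shared concept
# rather than separate per-language representations.
langs = list(activations)
for i, first in enumerate(langs):
    for second in langs[i + 1:]:
        sim = cosine_similarity(activations[first], activations[second])
        print(f"{first} vs {second}: {sim:.3f}")
```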

The research also examined Claude’s response to questions requiring multiple steps of reasoning, such as: "What is the capital of the state in which Dallas is located?" According to Anthropic, the model activates representations for "Dallas is in Texas" and then links that to "the capital of Texas is Austin." This sequence indicates that Claude is not simply recalling facts but performing multi-step inference.

Tree diagram: Logical derivation of the capital of Texas (Austin) based on a fact about Dallas.
Starting with the fact about Dallas, the connection to Austin as the capital is derived step by step. | Image: Anthropic
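As a rough analogy for this two-hop behavior, the sketch below chains two single-hop facts rather than looking up the answer directly. It is a toy illustration of multi-step inference, not a description of Claude's internal mechanism; the fact table and function are invented for the example.

```python
# Toy knowledge base where every entry is a single-hop fact.
facts = {
    ("Dallas", "located_in"): "Texas",
    ("Texas", "capital"): "Austin",
}

def capital_of_state_containing(city: str) -> str:
    """Answer by chaining two hops instead of one direct lookup."""
    state = facts[(city, "located_in")]  # hop 1: "Dallas is in Texas"
    return facts[(state, "capital")]     # hop 2: "the capital of Texas is Austin"

print(capital_of_state_containing("Dallas"))  # Austin
```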

Detecting signs of planning

The researchers also discovered that Claude plans several words in advance when generating poetry. Rather than composing line by line, it begins by selecting appropriate rhyming words, then builds each line to lead toward those targets. If the target words are altered, the model produces an entirely different poem—an indication of deliberate planning rather than simple word-by-word prediction.
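A minimal sketch of this plan-then-generate idea: choose the rhyming endings first, then build each line toward its target. The templates and rhyme choices below are invented for illustration and say nothing about how Claude actually generates text.

```python
# Step 1: plan the rhyme -- choose the line endings before writing anything.
target_a, target_b = "night", "light"

# Step 2: generate each line *toward* its planned ending, rather than
# predicting one word at a time and hoping a rhyme appears.
def build_line(opening: str, ending: str) -> str:
    return f"{opening} {ending}"

couplet = [
    build_line("The city hums beneath the", target_a),
    build_line("A single window holds its", target_b),
]
print("\n".join(couplet))

# Swapping the targets (e.g. "sea"/"free") forces entirely different
# lines, mirroring the observation that altered targets yield a new poem.
```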

For mathematical tasks, Claude employs parallel processing paths: one for approximation and another for precise calculation. Yet when prompted to explain its reasoning, Claude describes a process different from the one it actually used, suggesting that it mimics human-style explanations rather than accurately reporting its internal computation. The researchers also note that when given a flawed hint, Claude often produces a coherent-sounding explanation that is nonetheless logically incorrect.
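The dual-path description can be illustrated with a two-digit addition (36 + 59 here): one route estimates the rough magnitude while the other computes the exact digits. The two functions below are an analogy, not the model's actual circuitry.

```python
def approximate_path(a: int, b: int) -> int:
    """Rough-magnitude route: round each operand to the nearest ten."""
    return round(a, -1) + round(b, -1)

def precise_path(a: int, b: int) -> int:
    """Exact route: digit-level arithmetic."""
    return a + b

a, b = 36, 59
print(approximate_path(a, b))  # 100 -- the ballpark
print(precise_path(a, b))      # 95  -- the exact answer

# In the analogy, both routes run in parallel: the estimate fixes the
# rough magnitude while the precise route supplies the final digits.
```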

Comparing AI and human language processing

Research from Google offers a parallel line of investigation. A recent study published in Nature Human Behaviour analyzed similarities between AI language models and human brain activity during conversation. Google's team found that internal representations from OpenAI's Whisper model aligned closely with neural activity patterns recorded from human subjects. In both cases, the systems appeared to predict upcoming words before they were spoken.

Video: Google

Despite these similarities, the researchers emphasize fundamental differences between the two systems. Unlike Transformer models, which can process hundreds or thousands of tokens simultaneously, the human brain processes language sequentially—word by word, over time, and with repeated loops. Google writes, "While the human brain and Transformer-based LLMs share basic computational principles in natural language processing, their underlying neural circuit architectures differ significantly."

Summary
  • Anthropic has developed a method to visualize the inner workings of its Claude 3.5 Haiku language model, revealing how it activates abstract concepts across languages before generating responses in the target language.
  • The study demonstrates that Claude forms logical connections incrementally when answering complex questions and strategically plans ahead in tasks like poetry generation, such as by first choosing rhyming words and then building the text around them.
  • In a separate study, Google Research found similarities between language models and the human brain, but also highlighted key differences: unlike AI, the human brain processes language sequentially and over longer time scales, suggesting a more complex and nuanced approach to language understanding.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.