A new study by researchers at MIT indicates that large language models (LLMs) could develop their own understanding of the world as their language competence increases, rather than just combining superficial statistics.

Researchers at the Massachusetts Institute of Technology (MIT) have found evidence that large language models (LLMs) may develop their own understanding of the world as their language abilities improve, rather than merely combining superficial statistics. The study contributes to the debate on whether LLMs are just "stochastic parrots" or can learn meaningful internal representations.

For their investigation, the researchers trained a language model on synthetic programs that navigate 2D grid-world environments. Although the model only saw input-output examples during training, never the intermediate states of the grid world, a probing classifier could extract increasingly accurate representations of those intermediate states from the LM's internal activations as training progressed. This suggests an emergent ability of the LM to interpret programs in a formal sense.
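
To make the probing setup concrete, here is a minimal sketch in Python. Everything in it is a hypothetical stand-in: the "hidden states" are synthetic vectors whose signal-to-noise ratio is raised by hand at each mock checkpoint, and the labels (the agent's facing direction) are invented for illustration. The actual study probes real activations recorded from the trained LM.

```python
# Minimal probing-classifier sketch (illustrative; not the authors' code).
# The "hidden states" below are synthetic stand-ins: at each mock training
# checkpoint they encode a hypothetical hidden grid-world variable (the
# agent's facing direction) with a higher signal-to-noise ratio, mimicking
# the finding that the probe decodes the program state more accurately as
# the LM's training progresses.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
N, HIDDEN = 2000, 128
facing = rng.integers(0, 4, size=N)  # hypothetical labels: 0=N, 1=E, 2=S, 3=W

for checkpoint, signal in [("early", 0.1), ("mid", 0.5), ("late", 2.0)]:
    # Fake activations: Gaussian noise plus an increasingly strong one-hot
    # encoding of the hidden state the LM was never explicitly shown.
    hidden = rng.normal(size=(N, HIDDEN))
    hidden[:, :4] += np.eye(4)[facing] * signal

    X_tr, X_te, y_tr, y_te = train_test_split(
        hidden, facing, test_size=0.2, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"{checkpoint} checkpoint: probe accuracy {probe.score(X_te, y_te):.2f}")
```

In this toy, probe accuracy rises only because the fake signal is dialed up; the study's point is that the same qualitative trend shows up in real activations as the LM's language abilities improve.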

Figure: Jin, Rinard et al.

The researchers also developed "semantic probing interventions" to separate what is represented by the LM from what is merely learned by the probing classifier. By intervening on the semantics while preserving the syntax, they showed that the LM's states track the original semantics rather than just encoding syntactic information.
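
The logic of such an intervention can be illustrated with a toy sketch. The instruction set, the "swapped" semantics and the fabricated activations below are all hypothetical; the point is only to show why a probe that decodes the original semantics better than an altered one, from identical inputs, points to the LM's states rather than the probe as the carrier of semantic information.

```python
# Toy sketch of a semantic probing intervention (illustrative only; the
# instruction set, the swapped semantics and the fake hidden states are
# hypothetical stand-ins, not the authors' setup). The same programs --
# identical syntax -- are interpreted under two semantics: the original one
# and an altered one in which "turn" and "move" exchange meaning. The fake
# hidden states encode only the original state, so a probe should decode
# original-semantics labels substantially better than altered-semantics ones.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
GRID, N, HIDDEN = 5, 3000, 64
DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W on a wraparound grid

def run(program, swapped=False):
    """Interpret a 'turn'/'move' program; return the final cell index."""
    x, y, facing = 0, 0, 0
    for instr in program:
        if swapped:  # altered semantics: the two instructions swap meaning
            instr = "move" if instr == "turn" else "turn"
        if instr == "turn":
            facing = (facing + 1) % 4
        else:
            dx, dy = DIRS[facing]
            x, y = (x + dx) % GRID, (y + dy) % GRID
    return x * GRID + y

programs = [rng.choice(["turn", "move"], size=6) for _ in range(N)]
y_orig = np.array([run(p) for p in programs])
y_alt = np.array([run(p, swapped=True) for p in programs])

# Stand-in for LM hidden states: noise plus an encoding of the ORIGINAL state.
hidden = rng.normal(size=(N, HIDDEN))
hidden[:, : GRID * GRID] += np.eye(GRID * GRID)[y_orig] * 3.0

for name, y in [("original semantics", y_orig), ("altered semantics", y_alt)]:
    X_tr, X_te, y_tr, y_te = train_test_split(hidden, y, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
    print(f"Probe accuracy under {name}: {probe.score(X_te, y_te):.2f}")
```

Because the syntax (the programs) is identical in both conditions, the gap between the two accuracies can only come from what the states themselves encode, which is the asymmetry the study reports for the real LM.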

Othello-GPT also showed meaningful internal representations

These findings are consistent with a separate experiment in which a GPT model was trained on Othello moves. There, researchers found evidence of an internal "world model" of the game within the model's representations. Altering this internal model changed the model's predictions, suggesting that it relies on the learned representation for decision-making.
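
Reduced to its mechanics, that causal test looks roughly like the sketch below: overwrite an internal activation during the forward pass and check whether the output shifts. The tiny untrained network and the specific edit are placeholders; in the Othello work the edited representation was the decoded board state and the observed output was the model's move predictions.

```python
# Minimal activation-intervention sketch in PyTorch (illustrative; not the
# Othello-GPT code). It shows the mechanic of the causal test: overwrite an
# internal representation with a forward hook and check whether the model's
# output changes as a result. A tiny untrained MLP stands in for the model.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(8, 16),  # stand-in for the early layers
    nn.ReLU(),
    nn.Linear(16, 4),  # stand-in "move head" over 4 hypothetical moves
)

x = torch.randn(1, 8)
baseline = model(x).softmax(dim=-1)

def edit_hidden(module, inputs, output):
    # Intervention: overwrite part of the hidden activation (the stand-in
    # "world model") during the forward pass.
    patched = output.clone()
    patched[:, :4] = 5.0  # hypothetical edit to the internal state
    return patched

handle = model[1].register_forward_hook(edit_hidden)
intervened = model(x).softmax(dim=-1)
handle.remove()

print("baseline move distribution: ", baseline.detach().numpy().round(3))
print("after intervention:         ", intervened.detach().numpy().round(3))
```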

Although these experiments were conducted in simplified domains, they offer a promising direction for understanding the capabilities and limitations of LLMs in capturing meaning. Martin Rinard, a senior author of the MIT study, notes, "This research directly targets a central question in modern artificial intelligence: are the surprising capabilities of large language models due simply to statistical correlations at scale, or do large language models develop a meaningful understanding of the reality that they are asked to work with? This research indicates that the LLM develops an internal model of the simulated reality, even though it was never trained to develop this model."

Summary
  • Researchers at MIT have found evidence that large language models (LLMs) may develop their own understanding of the world as their language abilities improve, rather than merely combining superficial statistics.
  • The researchers trained a language model on synthetic programs that navigate 2D grid-world environments and found that a probing classifier could extract increasingly accurate representations of the hidden grid-world states from the LM's internal activations, suggesting an emergent ability of the LM to interpret programs.
  • The findings are consistent with a separate experiment where a GPT model trained on Othello moves showed evidence of an internal "world model" of the game within the model's representations, offering a promising direction for understanding the capabilities and limitations of LLMs in capturing meaning.