Can language models like GPT-4 learn meaning, or are they stochastic parrots? A new research paper shows that the models learn more than some critics give them credit for.
In a new study, researchers at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (CSAIL) show that language models can learn meaning even if they have only been trained to predict the next token in a text - or, in this case, in a program. This contradicts the view that large language models are "stochastic parrots" that pick up only superficial statistics about syntax, not meaning or semantics.
With this work, the team aims to disprove two hypotheses:
LMs trained only to perform next token prediction on text are
- (H1) fundamentally limited to repeating the surface-level statistical correlations in their training corpora; and
- (H2) unable to assign meaning to the text that they consume and generate.
Stochastic code parrot or semantics in neural networks?
To clearly define the notion of meaning, the team used program synthesis, because "the meaning (and correctness) of a program is given exactly by the semantics of the programming language." Specifically, they trained a language model on programs written in Karel, an educational programming language used to navigate a "robot" through grid worlds with obstacles. Like any Transformer language model, the system simply learns to predict the next token of program code that will successfully guide the namesake robot, Karel, through those grid worlds.
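To make that setup concrete, here is a minimal sketch of what a next-token prediction step on a tokenized program looks like. The token ids, vocabulary size, and tiny placeholder model are illustrative assumptions, not the paper's actual architecture or code.

```python
# Minimal sketch of a next-token prediction step on a tokenized Karel program.
# The token ids, vocabulary, and placeholder model are illustrative only,
# not the setup used in the paper.
import torch
import torch.nn as nn

# Hypothetical token ids for a short program such as "move(); turnLeft(); move();"
vocab_size = 32
tokens = torch.tensor([[5, 12, 3, 5, 17, 3, 5, 12, 3]])  # shape: (batch=1, seq_len=9)

# Placeholder autoregressive model: embedding + linear head.
# The paper trains a Transformer; any causal LM uses the same objective.
embed = nn.Embedding(vocab_size, 64)
head = nn.Linear(64, vocab_size)

hidden = embed(tokens[:, :-1])   # states for positions 0..n-2
logits = head(hidden)            # predictions for the token at the *next* position

# Cross-entropy between the predictions and the shifted targets (positions 1..n-1).
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()  # gradients for a single optimization step
print(float(loss))
```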
After training, the team used a linear probe to read out the internal states of the language model as a Karel program ran. They found that they could extract abstractions of the program's current and future states from those internal states, even though the model had only been trained to predict tokens, not to learn the semantics of Karel. These semantic representations emerged in parallel with the language model's ability to generate correct programs.
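A linear probe in this sense is simply a linear classifier trained to read a semantic property (for example, the direction the robot is facing) out of the model's hidden states. The following is a minimal sketch of the idea, using randomly generated stand-in data in place of real activations and program traces.

```python
# Minimal sketch of a linear probe: a linear classifier trained to read a
# semantic property (here, a stand-in for the robot's facing direction)
# out of the LM's hidden states. The data below is random placeholder data,
# not actual activations from the paper's model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(2000, 512))      # one 512-d hidden state per program step
facing_direction = rng.integers(0, 4, size=2000)  # 0..3: north/east/south/west (placeholder labels)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, facing_direction, test_size=0.2, random_state=0
)

# The probe is deliberately simple (linear, no hidden layers), so any accuracy
# above chance reflects structure already present in the hidden states.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))  # ~chance (0.25) on random data
```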
To rule out the possibility that this semantic content was contributed by the linear probe itself rather than the language model, the team selectively altered internal states of the model. They were able to show a strong, statistically significant correlation between the accuracy of the probe and the model's ability to generate correct programs.
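The reported relationship is a standard correlation test between probe accuracy and generative accuracy across training checkpoints. A minimal sketch with made-up numbers, purely to illustrate the kind of statistic involved:

```python
# Sketch of the correlation test described above: probe accuracy vs. the
# model's generative accuracy across training checkpoints. The numbers are
# invented for illustration and are not results from the paper.
from scipy.stats import pearsonr

probe_accuracy =      [0.31, 0.42, 0.55, 0.63, 0.71, 0.78]  # one value per checkpoint
generative_accuracy = [0.10, 0.22, 0.38, 0.51, 0.64, 0.72]

r, p_value = pearsonr(probe_accuracy, generative_accuracy)
print(f"Pearson r = {r:.2f}, p = {p_value:.3f}")
```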
The team sees both hypotheses disproved - and is not alone
The language model also learns to write programs that are, on average, shorter than those in the training set. This suggests, they say, that the language model's output can deviate from the distribution of the training set in a semantically meaningful way.
In addition, the model's perplexity (a measure of uncertainty in predicting the next token) remained high for programs in the training set - even as the model's ability to synthesize correct programs improved. In other words, the model was not simply reproducing the surface statistics of its training data. According to the researchers, this rejects hypothesis (H1).
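Perplexity is the exponential of the average per-token negative log-likelihood, so a higher value means the model is less certain which token comes next. A minimal sketch of the computation, using made-up token probabilities rather than outputs from the trained model:

```python
# Perplexity = exp(mean negative log-likelihood of the observed tokens).
# The probabilities below are invented; in practice they come from the LM's
# softmax output at each position of a training-set program.
import math

token_probs = [0.40, 0.12, 0.55, 0.08, 0.30]  # model's probability for each actual next token
nll = [-math.log(p) for p in token_probs]
perplexity = math.exp(sum(nll) / len(nll))
print(f"perplexity = {perplexity:.2f}")  # higher = more uncertainty about the next token
```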
Overall, the results "indicate that the LM representations are, in fact, aligned with the original semantics (rather than just encoding some lexical and syntactic content)." This rejects hypothesis (H2), the team writes.
"The question of whether semantics can be learned from text has garnered considerable interest in recent years. This paper presents empirical support for the position that meaning is learnable from form" the paper states. The method could also be used in the future to further investigate meaning in language models - a question of both practical and philosophical importance, they write.
This work is part of a series of studies that seek to understand the processes that take place in language models. For example, OthelloGPT has shown that a language model can learn an internal model of a game board and use it to predict future moves. While it is not clear to what extent these results can be applied to large language models such as OpenAI's GPT-4, they seem to undermine the stochastic parrot narrative.