Researchers at the University of Copenhagen are taking another look at the "Othello world model" hypothesis, asking whether large language models can pick up the rules and board structure of Othello just by analyzing sequences of moves.
The Othello world model hypothesis suggests that language models trained only on move sequences can form an internal model of the game - including the board layout and game mechanics - without ever seeing the rules or a visual representation. In theory, these models should be able to predict valid next moves based solely on this internal map.
The idea that generative AI can build world models has become more prominent, especially after OpenAI's Sora and the "dead end" critique from Meta's Yann LeCun. But these questions go back further, including early experiments testing whether GPT-2 could learn an internal model of Othello. Those first studies had their limitations, especially in how they analyzed what the models were really doing, but they suggested that transformer networks can pick up structure and rules from simple data.
This challenged the widespread belief that large language models are just "stochastic parrots" blindly mimicking patterns. While those early results didn't fully generalize to today's much larger models or address every critique, they raised deeper questions about what LLMs might be capable of.
If the Othello world model hypothesis holds, it would mean language models can grasp relationships and structures far beyond what their critics typically assume.
Models build internal maps
In their latest study, the Copenhagen team trained seven different language models - GPT-2, T5, Bart, Flan-T5, Mistral, LLaMA-2, and Qwen2.5 - to predict the next move in Othello games. They used two datasets: one with about 140,000 real games and another with millions of synthetic games.
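To make the setup concrete, the sketch below shows roughly what next-move training on Othello transcripts looks like. It is a minimal illustration, not the authors' code: the data format (space-separated move strings like "d3 c5 f6"), the one-token-per-square vocabulary, and all hyperparameters are assumptions for the example.

```python
# Minimal sketch (not the authors' code): training a small GPT-2-style model to
# predict the next Othello move from the sequence of prior moves.
# Assumption: each game is a list of move strings ("d3", "c5", ...) and every
# board square is a single token in the vocabulary.

import torch
from torch.utils.data import Dataset, DataLoader
from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical vocabulary: one token per square (a1..h8) plus a padding token.
SQUARES = [f"{c}{r}" for c in "abcdefgh" for r in range(1, 9)]
VOCAB = {sq: i for i, sq in enumerate(SQUARES)}
PAD_ID = len(VOCAB)

class OthelloGames(Dataset):
    """Wraps a list of games (lists of move strings) as fixed-length ID tensors."""
    def __init__(self, games, max_len=60):
        self.games, self.max_len = games, max_len

    def __len__(self):
        return len(self.games)

    def __getitem__(self, idx):
        ids = [VOCAB[m] for m in self.games[idx]][: self.max_len]
        ids += [PAD_ID] * (self.max_len - len(ids))
        return torch.tensor(ids)

def train(games, epochs=1):
    config = GPT2Config(vocab_size=len(VOCAB) + 1, n_positions=60,
                        n_layer=8, n_head=8, n_embd=512)
    model = GPT2LMHeadModel(config)
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    loader = DataLoader(OthelloGames(games), batch_size=64, shuffle=True)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            labels = batch.clone()
            labels[batch == PAD_ID] = -100  # ignore padding in the loss
            # Causal LM objective: each position predicts the following move.
            out = model(input_ids=batch, labels=labels)
            out.loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model
```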
A key difference from earlier work is their use of "representation alignment tools." These let researchers directly compare the internal "maps" each model forms of the Othello board. The team says these tools overcome limitations found in previous studies, such as OthelloGPT.
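The article doesn't spell out which alignment measure was used, but one widely used way to compare representations across models is centered kernel alignment (CKA). The snippet below is a generic sketch of linear CKA for illustration only; the function name and inputs are assumptions, not the paper's tooling.

```python
# Hedged sketch: linear centered kernel alignment (CKA), a common measure for
# comparing internal representations across models. Illustrative only; not
# necessarily the specific alignment tool used in the study.

import numpy as np

def linear_cka(X, Y):
    """Similarity between two representation matrices.

    X: (n_samples, d1) hidden states from model A for the same move sequences.
    Y: (n_samples, d2) hidden states from model B for the same move sequences.
    Returns a value in [0, 1]; higher means more similar representations.
    """
    # Center the features.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)

    # Linear CKA: ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F).
    xty = np.linalg.norm(X.T @ Y, "fro") ** 2
    xtx = np.linalg.norm(X.T @ X, "fro")
    yty = np.linalg.norm(Y.T @ Y, "fro")
    return xty / (xtx * yty)
```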
The results show that the models learn not just how to play Othello, but also develop strikingly similar internal representations of the board's spatial structure. Even across different architectures, the ways these models "see" the board show what the researchers describe as "high similarity."
Model performance depended on both architecture and dataset size. On real game data, most models achieved error rates below six percent when trained on the full dataset. With synthetic data, error rates dropped sharply as the dataset grew - from about 50 percent with 2,000 games to less than 0.1 percent with the complete set.
Interestingly, models like Flan-T5 and LLaMA-2, which had been pretrained on general text, didn't consistently outperform models with no prior language training. This suggests that learning a world model of the Othello board from move sequences doesn't depend on previous language knowledge.
Implications for AI research
The study challenges a key assumption held by some LLM critics: that monomodal systems - trained only on one type of data, like text - can't solve problems that require understanding visual or spatial information. Since the Othello board is fundamentally visual, the fact that these models can reconstruct it from raw move sequences demonstrates a surprising capacity for abstraction.
The findings also address the long-standing symbol grounding problem in AI - the challenge of how abstract symbols (like "C3" in Othello) get linked to real-world meaning. Here, the models learn to associate symbols like "C3" with specific board locations and their spatial relationships, rather than treating them as generic tokens.
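One simple way to test this kind of grounding is a probing experiment: if a linear probe can recover a move token's board coordinates from the model's hidden state, the symbol is encoded spatially rather than as an arbitrary label. The sketch below is a generic illustration of that idea under assumed inputs (pre-collected hidden states and move labels), not the authors' exact setup.

```python
# Hedged sketch: probing whether a move token like "c3" is grounded in a board
# position. If a linear probe recovers (column, row) coordinates from the hidden
# state, the representation encodes spatial structure. Illustrative only.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def probe_spatial_grounding(hidden_states, move_labels):
    """
    hidden_states: (n_tokens, hidden_dim) activations collected for move tokens.
    move_labels:   move strings like "c3", aligned with hidden_states.
    """
    # Target: the 2D board coordinate each move token refers to.
    coords = np.array([[ord(m[0]) - ord("a"), int(m[1]) - 1] for m in move_labels])

    X_train, X_test, y_train, y_test = train_test_split(
        hidden_states, coords, test_size=0.2, random_state=0)

    probe = LinearRegression().fit(X_train, y_train)
    # An R^2 score near 1 suggests the hidden states linearly encode board positions.
    return probe.score(X_test, y_test)
```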
Yifei Yuan and Anders Søgaard, the authors of the study published at ICLR 2025, argue that their work provides much stronger evidence for the Othello world model hypothesis than previous research.