
Apple's recent research paper "The Illusion of Thinking" has reignited debate over whether large language models can really reason.


Apple's team put leading models to the test with classic logic puzzles like the Tower of Hanoi, finding that even advanced systems still struggle to carry out simple algorithms correctly and completely. Based on these results, the authors argue that LLMs lack true generalizable reasoning, instead acting as pattern matchers that overlook deeper structures.

Other research seems to back this up. A separate study reached similar conclusions, though it was less critical, noting there's still much to learn about how well LLMs can reason. And a Salesforce paper benchmarking LLM performance in CRM contexts found that their abilities took a nosedive in more complex, multi-turn scenarios.

Critics say the paper's argument is overly black-and-white

LLM skeptics see these papers as confirmation of their view that these systems are incapable of real reasoning, and worry that this limitation could stall progress toward advanced AI. But some AI experts argue that the paper's take is too simplistic.


Lawrence Chan from Metr offered a nuanced perspective on LessWrong. He argues that framing the debate as either real thinking or rote memorization ignores the complex middle ground where both human and machine reasoning actually operate.

For instance, people catch a thrown ball not by solving physics equations, but by relying on learned heuristics. These shortcuts aren't signs of ignorance, but practical strategies for solving problems with limited resources.

Language models, Chan notes, also depend on experience and abstraction under tight computational limits. He points out that generalization can be seen as an advanced form of memorization - starting from individual examples, moving through surface strategies, and eventually forming broader rules.

Chan points out that while LLMs may struggle to output all 32,000+ moves for the 15-disk Hanoi puzzle in the exact requested format, they can generate a Python script to solve the problem instantly. He argues that when LLMs explain their approach, suggest shortcuts, and offer practical solutions in code, it demonstrates a functional, if different, understanding of the task. For Chan, dismissing this as a lack of understanding misses the point.
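For scale, the 15-disk puzzle requires 2^15 - 1 = 32,767 moves, while the algorithm itself is only a few lines of recursion. Below is a minimal sketch of the kind of script Chan has in mind; the function and peg names are illustrative, not taken from either paper.

    # Classic recursive Tower of Hanoi: record the moves needed to shift
    # n disks from the source peg to the target peg using one spare peg.
    def hanoi(n, source, target, spare, moves):
        if n == 0:
            return
        hanoi(n - 1, source, spare, target, moves)  # clear the n-1 smaller disks out of the way
        moves.append((source, target))              # move the largest remaining disk
        hanoi(n - 1, spare, target, source, moves)  # restack the smaller disks on top of it

    moves = []
    hanoi(15, "A", "C", "B", moves)
    print(len(moves))  # 32767, i.e. 2**15 - 1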

The authors call it "counterintuitive" that language models use fewer tokens at high complexity, suggesting a "fundamental limitation." But this simply reflects models recognizing their limitations and seeking alternatives to manually executing thousands of possibly error-prone steps – if anything, evidence of good judgment on the part of the models!

Lawrence Chan

Chan also warns against using performance on theoretical puzzles as a basis for judging models' general abilities. The real question, he says, is whether their strategies can be applied to complex, real-world tasks.

Recommendation

While the Apple paper highlights specific weaknesses in today's LLMs, Chan believes it sidesteps the bigger issue: which kinds of "reasoning" matter for practical use cases, and how well do LLMs handle those?

Sure, an LLM might not be able to do "generalized reasoning" in the sense that the authors propose, but an LLM with a simple code interpreter definitely can. Here, the key question is why we must consider the LLM by itself, as opposed to an AI agent composed of an LLM and an agent scaffold – note that even chatbot-style apps such as ChatGPT provide the LLM with various tools such as a code interpreter and internet access. Why should we limit our discussion of AGIs to just the LLM component of an AI system, as opposed to the AI system as a whole?

Lawrence Chan

AI response paper was just a joke

The widely shared paper "The Illusion of the Illusion of Thinking," which circulated as a supposed response to Apple's critique and was partly written by Claude 4 Opus, was never intended as a genuine rebuttal. According to author Alex Lawsen, it was simply a joke filled with errors.

Lawsen was surprised by how quickly the joke paper went viral and how many people took it seriously, calling it his "first real taste of something I'd made going properly viral, and honestly? It was kind of scary."

Summary
  • Apple's research paper "The Illusion of Thinking" questions whether large language models truly understand simple puzzle tasks like the Tower of Hanoi, arguing that these systems act mainly as pattern matchers rather than thinkers.
  • Some AI researchers, including Lawrence Chan, disagree with Apple's position and point out that language models can show strategic and functional problem-solving—even when they use different methods, such as generating code.
  • Alex Lawsen, who had Claude 4 Opus write a widely shared response to Apple's paper, clarified that his paper was intended as a joke. Despite its many errors, it was nonetheless treated as a serious rebuttal on social media.