
What more proof is needed that large language models can't handle basic logic? Simple planning puzzles are all the rage right now.

Today's language models fail to consistently solve an extremely dumbed-down version of the classic wolf-goat-cabbage problem. In one example, the model must figure out how a farmer can transport two chickens across a river in a boat that can hold one person and two animals.

The LLM must logically associate "farmer" with the person and "chickens" with the animals, and then plan the minimum number of river crossings. Sometimes the models offer absurd solutions with five crossings instead of one.

"A farmer wants to cross a river with two chickens. His boat only has room for one person and two animals. What is the minimum number of crossings the farmer needs to get to the other side with his chickens?"

Tested LLMs often gave nonsensical answers, sometimes suggesting far more river crossings than necessary. Users have shared different versions of the puzzle on X, with some absurd results.


In one case, even when told that the farmer didn't need to cross the river at all, GPT-4o proposed a complex solution with nine crossings. It also ignored important constraints, such as not leaving the chickens alone with the wolves, even though these would have been easy to satisfy, since the farmer never needed to cross in the first place.

Picture: Alex Tabarrok via X

While LLMs can sometimes solve the puzzle when the prompt contains additional hints, critics argue that this dependence highlights their lack of consistent reasoning and common sense. Current research supports the view that LLMs struggle to reliably solve even the simplest logical tasks.

These findings add to ongoing debates about the limitations of current LLM-based AI systems when it comes to logical reasoning and real-world problem-solving. "The problem is that LLMs have no common sense, no understanding of the world, and no ability to plan (and reason)," says Yann LeCun, Meta's chief AI scientist.

The question is whether research can find methods to improve the ability of LLMs to reason, or whether entirely new architectures are needed to fundamentally advance AI.

Summary
  • Current language models are unable to reliably solve even highly simplified versions of the classic wolf-goat-cabbage problem, which requires the LLM to make logical connections between human roles and animal species and to plan the optimal number of river crossings.
  • In variations of the puzzle, the models sometimes offer absurd solutions with far more crossings than necessary, and in some cases fail to take the actual conditions of the task into account.
  • Even when the models produce correct answers for some prompts, critics argue that this inconsistency underscores their lack of a basic grasp of logic, planning, and real-world context.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.