Summary

Researchers have conducted a systematic analysis of large language models' capabilities for inductive and deductive reasoning. The study reveals both surprising strengths and clear limitations of these AI systems.


Scientists from the University of California, Los Angeles and Amazon carried out a new study examining the reasoning abilities of large language models (LLMs) in greater detail. For the first time, the researchers made a systematic distinction between inductive and deductive reasoning.

Inductive reasoning involves deriving general rules from specific observations, while deductive reasoning applies general rules to particular cases. The study aimed to determine which type of reasoning poses a greater challenge for LLMs.

To isolate inductive reasoning, the researchers developed a new method called "SolverLearner." In this approach, the model learns a function that maps inputs to outputs from just a few examples. External programs then apply this function, so the model never has to execute the learned rule itself and deductive reasoning stays out of the measurement.
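A minimal sketch of how such a setup separates rule induction from rule application might look like the following; the prompt wording, the `llm` callable, and the helper names are illustrative assumptions rather than the paper's actual implementation:

```python
# Sketch of the SolverLearner idea: the LLM only *induces* a rule as code,
# while the Python interpreter *applies* it. Prompt text, the `llm` callable,
# and function names are assumptions for illustration.
import re

def propose_function(llm, examples):
    """Ask the LLM to induce a Python function f from input-output examples."""
    shots = "\n".join(f"f({x!r}) -> {y!r}" for x, y in examples)
    prompt = (
        "Infer the rule behind these examples and write a Python function "
        f"named f that implements it:\n{shots}\nReturn only the code."
    )
    reply = llm(prompt)  # assumed: any callable that returns the model's text
    return re.sub(r"^```(?:python)?\s*|```\s*$", "", reply.strip(), flags=re.M)

def apply_function(code, test_inputs):
    """Execute the induced function with the interpreter, not the LLM, so
    deductive application of the rule stays out of the measurement."""
    namespace = {}
    exec(code, namespace)      # defines f
    f = namespace["f"]
    return [f(x) for x in test_inputs]
```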


The results show that LLMs like GPT-4 achieve near-perfect performance in inductive reasoning using SolverLearner, with 100 percent accuracy in most cases. However, the models struggle more with deductive reasoning, especially on "counterfactual" tasks that deviate from typical training data.

For instance, the models handled arithmetic tasks in the decimal system well, but had difficulties when calculating in other number systems. They also showed weaknesses when analyzing sentences with unusual word order and when reasoning about spatial orientation in modified coordinate systems.
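As an illustration of what such a counterfactual arithmetic task involves, the short sketch below adds two digit strings in an arbitrary base; base 9 is only an example, and the concrete bases and prompts used in the study may differ:

```python
# Counterfactual arithmetic sketch: the same digit strings yield different
# results depending on the assumed base. Base 9 here is an illustrative choice.
def to_base(n: int, base: int) -> str:
    """Convert a non-negative integer to a digit string in the given base."""
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(str(r))
        if n == 0:
            break
    return "".join(reversed(digits))

def add_in_base(a: str, b: str, base: int) -> str:
    """Add two numbers given as digit strings in the specified base (<= 10)."""
    return to_base(int(a, base) + int(b, base), base)

print(add_in_base("27", "15", 10))  # '42' -- familiar decimal arithmetic
print(add_in_base("27", "15", 9))   # '43' -- the counterfactual, base-9 result
```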

The researchers conclude that deductive reasoning presents a greater challenge for current LLMs. The ability to correctly apply given rules depends heavily on how often similar tasks appear in the training process.

Language models, grokking and architectural adaptations

The study confirms both the strengths and limitations of current AI language models. It demonstrates that these systems have impressive abilities in recognizing patterns and deriving rules. However, they still struggle to correctly apply learned rules to new situations.

For the tests, the team did not use prompting methods like chain-of-thought, which somewhat improve the models' ability to make deductive inferences but cannot raise it to a satisfactory level. The new OpenAI model o1 was not included in the testing.


A separate study by researchers from Ohio State University and Carnegie Mellon University also recently examined the logical reasoning capabilities of Transformer models. They analyzed whether the models can acquire the ability to draw implicit conclusions through "grokking," particularly in composition and comparison tasks.

The results indicate that the models acquire the ability to make implicit inferences in both types of tasks through extended training beyond the point of overfitting. However, they were only able to generalize to unseen examples in comparison tasks. The researchers attribute this difference to the internal structure of the learned circuits and recommend adjustments to the Transformer architecture; in an initial experiment, these adjustments already produced a qualitative improvement.
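As a rough illustration of the grokking setup, the sketch below keeps training a model long after training accuracy has saturated and monitors held-out accuracy for a late jump; the model, data loaders, and hyperparameters are placeholders, not the study's actual configuration:

```python
# Grokking-style training sketch: run far past overfitting and watch whether
# test accuracy eventually catches up. All components here are placeholders.
import torch
from torch import nn

def accuracy(model, loader):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(-1) == y).sum().item()
            total += y.numel()
    return correct / total

def train_past_overfitting(model, train_loader, test_loader, steps=100_000):
    # High weight decay and a very long schedule are typical in grokking studies.
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
    loss_fn = nn.CrossEntropyLoss()
    step = 0
    while step < steps:
        for x, y in train_loader:
            model.train()
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            step += 1
            if step % 1_000 == 0:
                print(step, accuracy(model, train_loader), accuracy(model, test_loader))
            if step >= steps:
                break
```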

Summary
  • Researchers at the University of California, Los Angeles and Amazon have investigated the reasoning abilities of large language models (LLMs), distinguishing between inductive and deductive reasoning.
  • The results show that LLMs such as GPT-4 typically achieve 100% accuracy in inductive reasoning using the new "SolverLearner" method, but have greater difficulty in deductive reasoning, especially in "counterfactual" tasks.
  • Another study by researchers at Ohio State University and Carnegie Mellon University examined the ability of Transformer models to make implicit inferences through prolonged training, with the models only able to generalize to unseen examples in comparison tasks.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.