Language models can reason better if they write down intermediate steps. A new study shows how such "System 2 Reasoning" can, at least partially, be trained into language models.
In recent years, AI methods such as Chain-of-Thought Prompting and Branch-Solve-Merge have demonstrated that large language models achieve better results when they are prompted to work through their answers in multiple intermediate steps.
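To make the idea concrete, the snippet below contrasts a direct prompt with a chain-of-thought prompt. It is only an illustrative sketch: the OpenAI client, model name, and prompt wording are assumptions chosen for demonstration and are not the setup used in the study.

```python
# Illustrative sketch only: client, model name, and prompt wording are
# assumptions for demonstration, not the setup used in the study.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

question = "A train travels 120 km in 1.5 hours. What is its average speed?"

# Direct ("System 1"-style) prompt: the model answers immediately.
direct = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": question}],
)

# Chain-of-thought ("System 2"-style) prompt: the model is asked to write
# down intermediate steps before committing to a final answer.
cot = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": question + " Let's think step by step, then state the final answer.",
    }],
)

print(direct.choices[0].message.content)
print(cot.choices[0].message.content)
```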
This multi-step approach can be seen as an expression of Daniel Kahneman's "System 2" reasoning, in which information is processed slowly and consciously. Its counterpart, "System 1", is a fast, unconscious, and automatic mode of thinking.
Researchers at Meta AI have now developed a method to "distill" the computationally intensive "System 2" reasoning of AI models into the parameters of a language model. Their results show that the resulting "System 1" model in some cases performs about as well as the original multi-step process, at significantly lower computational cost.
The process works as follows: First, a "System 2" method is applied to a large set of example inputs. The answers are then filtered, for example by keeping only self-consistent results. Finally, the filtered data is used to fine-tune the language model. In essence, the team generates synthetic training data with System 2 prompts and uses it to fine-tune LLMs to skip the intermediate steps and answer directly.
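A minimal sketch of this pipeline might look like the following. The `generate` function is a hypothetical stand-in for any LLM call, and the sampling count and agreement threshold are illustrative assumptions; only the overall structure (sample System 2 outputs, filter them for self-consistency, keep bare question-answer pairs for fine-tuning) follows the description above.

```python
# Minimal sketch of the System 2 distillation recipe described above.
# `generate`, the sampling count, and the agreement threshold are
# illustrative assumptions, not the paper's exact settings.
from collections import Counter

def generate(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical LLM call; replace with your model of choice."""
    raise NotImplementedError

def system2_answer(question: str) -> str:
    """Run a System 2 method (here: chain of thought) and extract the final answer."""
    reasoning = generate(f"{question}\nLet's think step by step.")
    final = generate(f"{reasoning}\nTherefore, the final answer is:")
    return final.strip()

def build_distillation_data(questions, n_samples=8, min_agreement=0.75):
    """Keep only questions where repeated System 2 runs agree (self-consistency),
    paired with the bare answer for supervised fine-tuning."""
    data = []
    for q in questions:
        answers = [system2_answer(q) for _ in range(n_samples)]
        answer, count = Counter(answers).most_common(1)[0]
        if count / n_samples >= min_agreement:
            # The training target is the direct answer only: the intermediate
            # reasoning steps are deliberately dropped.
            data.append({"prompt": q, "completion": answer})
    return data

# The resulting (prompt, completion) pairs are then used for standard
# supervised fine-tuning of the base model.
```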
Chain-of-thought distillation remains out of reach
The researchers applied the method to four different "System 2" approaches and five task types. They found that distillation works in many, but not all, cases.
For methods such as System 2 Attention, which is used to reduce bias, and Rephrase and Respond, which improves answer quality, the resulting "System 1" models delivered results comparable to their "System 2" counterparts while generating significantly fewer tokens.
However, distillation failed for complex mathematical reasoning with Chain-of-Thought Prompting. The researchers suspect that some tasks are simply too complex for "System 1" thinking, especially since the models still regularly fail at such reasoning tasks even with CoT.
Nevertheless, the researchers see their method as a promising way to build more capable AI systems that handle routine tasks directly and reserve methods like CoT for the genuinely challenging problems.