Researchers at Stanford University have developed a method called "Quiet-STaR" that enables AI systems to learn to think between the lines. This could pave the way for more versatile and efficient AI that can better solve complex tasks.
When humans write or speak, we often pause to think. We consider how best to phrase an argument, or what the other person is thinking.
This "thinking" is hidden between the lines of almost all texts - for example, in the intermediate steps of mathematical proofs that are not explicitly mentioned. So far, AI has struggled to capture such unspoken thought processes. But that could change.
Internal reasoning helps LLMs generate better answers
Quiet-STaR (Quiet Self-Taught Reasoner) teaches an LLM to think quietly before it speaks. At each point in a text, the AI generates possible reasons why the text continues one way rather than another.
Through trial and error, it learns which considerations make the actual continuation more likely. In other words, it thinks before it "speaks," that is, before it generates the next piece of text.
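The trial-and-error idea can be illustrated with a toy calculation. The following is a minimal sketch, not the researchers' code: the log-probabilities are invented, the function name `thought_rewards` is hypothetical, and comparing each thought against the average of all sampled thoughts is an assumption about how such a learning signal could be formed.

```python
from statistics import mean


def thought_rewards(logprobs_with_thought):
    """Score each sampled thought by how much it beats the average thought.

    Each entry is the log-probability the model assigns to the text that
    actually follows, after it has generated one candidate thought.
    """
    baseline = mean(logprobs_with_thought)
    return [lp - baseline for lp in logprobs_with_thought]


# Invented values: log-probability of the true continuation after three thoughts
logprobs = [-2.0, -1.0, -3.0]
rewards = thought_rewards(logprobs)

# Thoughts with a positive reward made the real continuation more likely
helpful = [i for i, r in enumerate(rewards) if r > 0]
print(rewards, helpful)
```

In this sketch only the second thought would be reinforced, because it is the only one that makes the real continuation more likely than the average thought does.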
The technology is based on the "Self-Taught Reasoner" (STaR), which teaches AI systems to derive reasons from a few examples and to learn from correct answers. However, while STaR only works for certain question-answer tasks, Quiet-STaR is designed to teach language models to infer implicit reasoning from any text.
This sounds simple, but it poses significant challenges: The AI has to learn how to generate "thoughts" and how to use them effectively. It is also computationally intensive to calculate and evaluate many continuations for each text passage.
The researchers are tackling this problem with sophisticated sampling algorithms and techniques such as "teacher forcing," in which the model is trained on the correct preceding text rather than on its own earlier predictions.
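For readers unfamiliar with the term, here is a minimal sketch of plain teacher forcing in Python. It shows the general technique only, not the specific variant used in Quiet-STaR, and the function name `teacher_forced_pairs` and the example tokens are made up for illustration.

```python
def teacher_forced_pairs(tokens):
    """Build (context, target) training pairs.

    The context is always the true prefix of the text, never the
    model's own earlier guesses. That is the core of teacher forcing.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


pairs = teacher_forced_pairs(["2", "+", "2", "=", "4"])
for context, target in pairs:
    print(context, "->", target)
```

Each training step sees the correct preceding tokens, so an early mistake by the model cannot derail the rest of the sequence during training.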
The results are impressive: without special training on specific tasks, the AI's accuracy on common AI benchmarks improved by up to about eleven percentage points (GSM8K math word problems from 5.9 percent to 10.9 percent, CommonsenseQA from 36.3 percent to 47.2 percent).
These improvements increased with the length of the generated thoughts, so the longer the AI "thought," the better the results. The thoughts were particularly helpful for difficult passages of text.
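The benchmark scores quoted above are accuracies in percent, so the gains can be read either as percentage points or as relative improvement. A short Python calculation, using only the figures from the text, makes the difference explicit (the helper name `gains` is just for illustration):

```python
# Accuracies in percent, as quoted in the text: (before, after)
results = {
    "GSM8K": (5.9, 10.9),
    "CommonsenseQA": (36.3, 47.2),
}


def gains(before, after):
    """Return (gain in percentage points, relative gain in percent)."""
    points = after - before
    relative = points / before * 100
    return round(points, 1), round(relative, 1)


summary = {name: gains(before, after) for name, (before, after) in results.items()}
for name, (points, relative) in summary.items():
    print(f"{name}: +{points} points, +{relative}% relative")
```

On these figures, GSM8K gains 5.0 points, which is a relative improvement of about 85 percent, while CommonsenseQA gains 10.9 points, or about 30 percent in relative terms.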
It is possible that by recognizing the logic between the lines in different textual data, the AI becomes more adaptable and better able to apply its knowledge to new problems. It learns to understand contexts instead of just memorizing them.
However, the technology still has limitations. It has only been tested on a relatively small LLM with 7 billion parameters. And the system has yet to learn how to dynamically decide when it is worth thinking about a passage of text. Without that ability, the extra thinking steps waste too much computing power. The researchers see this as a "natural extension" and believe that even greater improvements will be possible with larger models.
Quiet-STaR points the way to more intelligent and versatile AI systems. Instead of being trained only on narrowly defined tasks, they could learn to understand the logic behind texts and conversations on their own. They could understand arguments better, formulate theories, and use language more creatively and efficiently.
Does Quiet-STaR have anything to do with OpenAI's Q*?
There are interesting parallels between the Stanford researchers' Quiet-STaR method and the speculation surrounding OpenAI's mysterious Q* system, which was hailed as a major breakthrough in the fall of 2023.
Both methods aim to improve the reasoning and problem-solving capabilities of AI beyond what current language models such as GPT-4 can achieve.
While Quiet-STaR teaches language models to generate and learn from possible justifications for continuing at any point in a text, Q* aims to combine language models with planning algorithms. Both are approaches to teaching AI to "reason" or "think" step by step to arrive at better solutions.
Another common theme is the importance of test-time compute: The more time the AI has to think, the better the results, both in Quiet-STaR and presumably in Q*. This is reminiscent of chess programs like AlphaZero, which increase their performance if they are allowed to compute for longer.
And of course, the name: Quiet-STaR could be abbreviated to "Q*".