Microsoft introduces 'Everything of Thoughts', which integrates external domain knowledge to produce much more reliable reasoning in language models.
Complex prompt engineering methods generally aim to make large language models more reliable in their reasoning. From simpler techniques such as chain-of-thought prompting to more elaborate ones such as tree-of-thought prompting, they break problems down into so-called "thoughts". A thought is an intermediate reasoning step: a short statement describing a sub-problem or a partial result, together with an associated action, such as solving one of the sub-problems to reach a new result.
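To make the idea of a "thought" concrete, the following minimal Python sketch contrasts a direct prompt with a chain-of-thought prompt for a small arithmetic question; the `query_llm` helper and the exact wording are illustrative placeholders, not code from any of the papers mentioned here.

```python
# Illustrative sketch: decomposing a problem into "thoughts" via
# chain-of-thought prompting. `query_llm` is a hypothetical stand-in
# for any language model API call.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a language model."""
    raise NotImplementedError

problem = "A shop sells pens at 3 for $2. How much do 12 pens cost?"

# Direct prompt: the model has to produce the answer in a single step.
direct_prompt = f"{problem}\nAnswer:"

# Chain-of-thought prompt: the model is nudged to write intermediate
# thoughts (sub-problems and partial results) before the final answer,
# e.g. "12 pens is 4 groups of 3" and "4 groups cost 4 * $2 = $8".
cot_prompt = f"{problem}\nLet's think step by step."
```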
The new "Everything of Thoughts" (XOT) method, developed by researchers at Microsoft, Georgia Institute of Technology, and East China Normal University, aims to extend the capabilities of language models with an external module inspired by AlphaZero. XOT uses reinforcement learning and Monte Carlo Tree Search (MCTS) to integrate external domain knowledge into thoughts. This should allow language models to generalize efficiently to unknown problems, the researchers said.
AlphaZero-inspired method XOT outsources the search for thought structures
Specifically, XOT uses MCTS to search for thought structures that can help solve problems. During the training phase, MCTS explores possible solutions - thought structures - for a specific task, such as a puzzle. This process involves recording the states, values, and visit frequencies of thought nodes in the search. The recorded data is then used to train the model through reinforcement learning to predict likely successful solution paths - eliminating the need to search the entire solution tree for each problem - and ideally, the trained model then generalizes to new problem instances of the same task.
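As a rough illustration of that training-phase idea (not the paper's implementation), the sketch below runs MCTS on a deliberately tiny "reach the target sum" toy task, records visit counts and value estimates per thought node, and then distills the most-visited actions into a simple lookup policy. In XOT proper, a neural policy/value network trained with reinforcement learning plays that role, and the tasks are puzzles such as those listed further below.

```python
# Minimal sketch: run MCTS on a toy task, record visit counts and value
# estimates per thought node, then distill them into a policy. The toy game,
# the tabular policy, and all names here are illustrative; XOT itself trains
# a neural policy/value network on the recorded search data.
import math
import random

TARGET = 10          # win by reaching exactly this sum
ACTIONS = (1, 2, 3)  # each step ("thought") adds one of these numbers

def step(state, action):
    nxt = state + action
    if nxt == TARGET:
        return nxt, 1.0, True    # solved
    if nxt > TARGET:
        return nxt, 0.0, True    # overshot, failed
    return nxt, 0.0, False

class Node:
    def __init__(self, state):
        self.state = state
        self.visits = 0          # visit frequency of this thought node
        self.value = 0.0         # running mean of returns seen below it
        self.children = {}       # action -> Node

def rollout(state):
    """Random playout used to estimate the value of a new node."""
    done, reward = state >= TARGET, float(state == TARGET)
    while not done:
        state, reward, done = step(state, random.choice(ACTIONS))
    return reward

def mcts(root, iterations=2000, c=1.4):
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend with UCT while nodes are fully expanded.
        while node.state < TARGET and len(node.children) == len(ACTIONS):
            parent = node
            node = max(parent.children.values(),
                       key=lambda n: n.value + c * math.sqrt(
                           math.log(parent.visits + 1) / (n.visits + 1)))
            path.append(node)
        # Expansion + evaluation: try one untried action, estimate its value.
        if node.state < TARGET:
            action = random.choice([a for a in ACTIONS if a not in node.children])
            child_state, reward, done = step(node.state, action)
            child = Node(child_state)
            node.children[action] = child
            path.append(child)
            value = reward if done else rollout(child_state)
        else:
            value = float(node.state == TARGET)
        # Backpropagation: record visits and update mean values along the path.
        for n in path:
            n.visits += 1
            n.value += (value - n.value) / n.visits
    return root

def distill_policy(root):
    """Turn the recorded search statistics into a state -> action table."""
    policy, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node.children:
            policy[node.state] = max(node.children,
                                     key=lambda a: node.children[a].visits)
            stack.extend(node.children.values())
    return policy

print(distill_policy(mcts(Node(0))))  # a small state -> recommended-action table
```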
The team then links the trained model to the language model to provide it with thought structures that could solve a problem posed to the language model. In a collaborative process, the language model reviews the thoughts and thought structures and can request revisions to improve the quality of the solutions. With XOT, the language model no longer has to explore and evaluate thoughts itself. By using the external model, the demands on the language model are greatly reduced compared to other methods.
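The collaborative loop can be pictured roughly as follows; the helper names (`propose_thoughts`, `llm_review`, `revise_thoughts`) are hypothetical placeholders sketching the control flow, not the paper's actual interface.

```python
# Control-flow sketch of the collaborative process described above.
# The three helpers are hypothetical placeholders: in XOT, the trained
# MCTS/policy module proposes thoughts and the language model reviews them.

def propose_thoughts(problem, policy_module):
    """External MCTS/policy module: map a problem to a thought structure."""
    ...

def llm_review(problem, thoughts):
    """Ask the language model which thought steps (if any) look wrong."""
    ...

def revise_thoughts(problem, thoughts, flagged_steps, policy_module):
    """Re-run the search for the flagged steps and patch the structure."""
    ...

def solve_collaboratively(problem, policy_module, max_revisions=3):
    thoughts = propose_thoughts(problem, policy_module)
    for _ in range(max_revisions):
        flagged = llm_review(problem, thoughts)
        if not flagged:           # language model accepts the thought structure
            break
        thoughts = revise_thoughts(problem, thoughts, flagged, policy_module)
    return thoughts               # the final answer is derived from these thoughts
```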
XOT brings a leap in performance in tested scenarios
The researchers tested XOT on several challenging problem-solving tasks, including the Game of 24, the 8-Puzzle, and the Pocket Cube. The results showed that XOT significantly outperformed other approaches, even solving problems where other methods failed. However, XOT did not achieve 100% reliability.
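For context on one of those benchmarks: in the Game of 24, four given numbers must be combined with +, -, * and / so that the result is exactly 24. The small brute-force solver below is illustrative only, not XOT or the paper's code; it simply shows what a single problem instance involves.

```python
# Illustrative only: what a Game of 24 instance looks like. This brute-force
# search is not XOT; it just enumerates ways to combine the four numbers
# with +, -, *, / to reach 24.
def solve_24(items, target=24.0, eps=1e-6):
    """items: list of (value, expression) pairs. Returns a solving expression or None."""
    if len(items) == 1:
        value, expr = items[0]
        return expr if abs(value - target) < eps else None
    for i in range(len(items)):
        for j in range(len(items)):
            if i == j:
                continue
            (a, ea), (b, eb) = items[i], items[j]
            rest = [items[k] for k in range(len(items)) if k not in (i, j)]
            combos = [(a + b, f"({ea}+{eb})"), (a - b, f"({ea}-{eb})"),
                      (a * b, f"({ea}*{eb})")]
            if abs(b) > eps:
                combos.append((a / b, f"({ea}/{eb})"))
            for value, expr in combos:
                found = solve_24(rest + [(value, expr)], target, eps)
                if found:
                    return found
    return None

def game_of_24(numbers):
    return solve_24([(float(n), str(n)) for n in numbers])

print(game_of_24([4, 7, 8, 8]))  # prints one valid expression, or None if unsolvable
```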
Nevertheless, the team sees the XOT framework as a promising method for integrating external domain knowledge into language model inference. It improves performance, efficiency and flexibility at the same time - a combination that cannot be achieved with other methods, they say.
It is not yet known if and when Microsoft intends to use the method in its own products. It is possible that a similar approach will show up in Google's Gemini: Google DeepMind CEO Demis Hassabis revealed in an interview that the team would like to incorporate ideas from AlphaGo into Gemini.