
Integrating the Monte Carlo Tree Search (MCTS) algorithm into large language models could significantly enhance their ability to solve complex mathematical problems. Initial experiments show promising results.

While large language models like GPT-4 have made remarkable progress in language processing, they still struggle with tasks requiring strategic and logical thinking. Particularly in mathematics, the models tend to produce plausible-sounding but factually incorrect answers.

In a new paper, researchers from the Shanghai Artificial Intelligence Laboratory propose combining language models with the Monte Carlo Tree Search (MCTS) algorithm. MCTS is a decision-making tool used in artificial intelligence for scenarios that require strategic planning, such as games and complex problem-solving. One of the most well-known applications is AlphaGo and its successor systems like AlphaZero, which have consistently beaten humans in board games. The combination of language models and MCTS has long been considered promising and is being studied by many labs - likely including OpenAI with Q*.

The Chinese researchers combine the exploratory capabilities of MCTS with the language models' abilities for self-refinement and self-evaluation. Both are essentially prompting methods in which the language model checks its own outputs within its own context window. On their own, self-refinement and self-evaluation usually yield only minor improvements, but combined with the structured search of MCTS they become considerably more effective. The result is the MCT Self-Refine (MCTSr) algorithm, which aims to improve the performance of LLMs on complex mathematical tasks.
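To make the two prompting steps concrete, here is a minimal sketch of such a self-refine and self-evaluate loop in Python. The generic llm() completion function, the prompt wording, and the 0-100 scoring scale are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of self-refinement and self-evaluation via prompting.
# `llm(prompt)` stands in for any text-completion API (hypothetical).

def llm(prompt: str) -> str:
    raise NotImplementedError  # plug in any chat/completion backend here

def self_refine(question: str, answer: str) -> str:
    """Ask the model to critique its own answer, then rewrite it."""
    feedback = llm(f"Question: {question}\nAnswer: {answer}\n"
                   "Point out any errors or weaknesses in this answer.")
    return llm(f"Question: {question}\nAnswer: {answer}\n"
               f"Feedback: {feedback}\nWrite an improved answer.")

def self_evaluate(question: str, answer: str) -> float:
    """Have the model score its own answer; return a numeric reward."""
    reply = llm(f"Question: {question}\nAnswer: {answer}\n"
                "Rate this answer from 0 to 100. Reply with a number only.")
    try:
        return float(reply.strip())
    except ValueError:
        return 0.0  # fall back if the model does not return a clean number
```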


MCTSr combines tree search with self-evaluation and back-propagation

The MCTSr method abstracts the iterative refinement process of mathematical problem solutions into a search tree structure.

Image: Zhang et al.

The algorithm runs through several phases: selection of the most promising node for refinement; self-refinement, in which the LLM generates feedback and uses it to improve the selected answer; self-evaluation, in which the LLM assigns a reward to the refined answer; back-propagation, which passes that evaluation up to the parent nodes and updates the tree; and a UCT update that determines which nodes to explore next.
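The following sketch shows how these phases could fit together in code, reusing the hypothetical self_refine and self_evaluate helpers from the earlier sketch. The node structure, exploration constant, and running-mean reward update are simplifying assumptions, not the paper's exact formulation.

```python
import math

class Node:
    """One candidate answer in the search tree."""
    def __init__(self, answer, parent=None):
        self.answer = answer
        self.parent = parent
        self.children = []
        self.visits = 0
        self.q = 0.0  # running estimate of answer quality

def uct(node, c=1.4):
    """Upper Confidence bound applied to Trees: balance quality and novelty."""
    if node.visits == 0:
        return float("inf")
    return node.q + c * math.sqrt(math.log(node.parent.visits + 1) / node.visits)

def mctsr(question, initial_answer, iterations=8):
    root = Node(initial_answer)
    for _ in range(iterations):
        # 1. Selection: walk down the tree along the highest-UCT children.
        node = root
        while node.children:
            node = max(node.children, key=uct)
        # 2. Self-refinement: let the LLM improve the selected answer.
        child = Node(self_refine(question, node.answer), parent=node)
        node.children.append(child)
        # 3. Self-evaluation: the LLM scores the refined answer.
        reward = self_evaluate(question, child.answer)
        # 4. Back-propagation: update quality estimates up to the root.
        n = child
        while n is not None:
            n.visits += 1
            n.q += (reward - n.q) / n.visits  # running mean of rewards
            n = n.parent
    # Return the best-rated answer found anywhere in the tree.
    best, stack = root, [root]
    while stack:
        cur = stack.pop()
        if cur.q > best.q:
            best = cur
        stack.extend(cur.children)
    return best.answer
```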

MCTSr raises Llama-3-8B to GPT-4 level

The researchers tested MCTSr on various datasets, including GSM8K, MATH, and Math Olympiad problems. In all cases, the success rate increased significantly with the number of MCTSr iterations.

On the challenging AIME dataset, the solution rate rose from 2.36% in zero-shot mode to 11.79% with 8 MCTSr iterations. On the new GAIC Math Odyssey dataset, which according to the team has little overlap with the training data of the Llama-3 model used for the experiments, MCTSr achieved a solution rate of 49.36% compared to 17.22% in zero-shot mode. Overall, thanks to MCTSr, the Llama-3 model with only 8 billion parameters approached the performance of the significantly larger GPT-4.

The team now plans to test the method in other application areas.

Summary
  • Researchers at the Shanghai Artificial Intelligence Laboratory are combining the Monte Carlo Tree Search (MCTS) algorithm with large language models to improve their ability to solve complex mathematical problems.
  • The resulting MCT Self-Refine (MCTSr) algorithm consists of several phases: selection of the most promising node, self-refinement through feedback from the language model, self-evaluation of the refined answer, back-propagation of the evaluation, and updating of the search tree.
  • In tests on various datasets, MCTSr significantly improved the success rate of the Llama-3 model with 8 billion parameters, bringing it close to the performance of the much larger GPT-4. The team now plans to test the method in other application areas.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.