
The Japanese AI startup Sakana AI has developed a new method that lets multiple large language models, such as ChatGPT and Gemini, work together on the same problem. Early tests suggest this collaborative approach outperforms individual models working alone.


The technique, called AB-MCTS (Adaptive Branching Monte Carlo Tree Search), is an algorithm that enables several AI models to tackle a problem at the same time. The models exchange and refine their suggestions, working together much like a human team.

AB-MCTS combines two different search strategies: it can either refine an existing solution (depth search) or try out entirely new approaches (breadth search). A probability model continuously decides which direction to pursue next.
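
The article doesn't spell out how that probability model works, but a common way to implement this kind of adaptive choice is Thompson sampling over Beta distributions, where a virtual "generate something new" arm competes against the existing candidates. The Python sketch below is purely illustrative, assuming an evaluator that scores solutions between 0 and 1; none of the names come from Sakana AI's code:

```python
import random

class Node:
    """One candidate solution in the search tree, with Beta-posterior stats."""
    def __init__(self, solution):
        self.solution = solution
        self.children = []
        self.alpha, self.beta = 1.0, 1.0  # updated as evaluator scores come in

    def update(self, reward):
        """Fold an evaluator score in [0, 1] into this node's posterior."""
        self.alpha += reward
        self.beta += 1.0 - reward

def select_action(node):
    """Depth or breadth? Sample every arm's posterior and act on the winner."""
    best_child = None
    best_score = random.betavariate(1.0, 1.0)  # virtual 'fresh attempt' arm
    for child in node.children:
        score = random.betavariate(child.alpha, child.beta)
        if score > best_score:
            best_child, best_score = child, score
    if best_child is None:
        return "widen", None        # breadth: request an entirely new solution
    return "deepen", best_child     # depth: refine the most promising child
```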

In the multi-LLM version (Multi LLM AB-MCTS), the system dynamically picks which model - such as ChatGPT, Gemini, or DeepSeek - is best suited for the current task. This selection adapts on the fly depending on which model delivers the strongest results for a particular problem.
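
Model selection can be sketched the same way, as a bandit problem over the available LLMs: each model keeps a running score, and the sampler gradually routes more calls to whichever one is working on this particular problem. Again, this is a hypothetical illustration, not TreeQuest's actual API:

```python
import random

class ModelArm:
    """Running record of how well one LLM has done on this particular task."""
    def __init__(self, name):
        self.name = name
        self.alpha, self.beta = 1.0, 1.0  # uninformed Beta prior

    def update(self, reward):
        self.alpha += reward              # evaluator score in [0, 1]
        self.beta += 1.0 - reward

def pick_model(arms):
    """Thompson sampling over models: sample each posterior, take the max."""
    return max(arms, key=lambda a: random.betavariate(a.alpha, a.beta))

arms = [ModelArm("gpt"), ModelArm("gemini"), ModelArm("deepseek")]
chosen = pick_model(arms)   # call this model for the next search step ...
chosen.update(0.8)          # ... then feed back its evaluator score
```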

Multi-LLM combinations notably enhance AB-MCTS's Pass@k performance on ARC-AGI-2 as LLM calls increase, with o4-mini + Gemini-2.5-Pro + R1-0528 outperforming single models. | Image: Sakana AI

AB-MCTS gets results on ARC-AGI-2

During tests on the challenging ARC-AGI-2 benchmark, Multi-LLM AB-MCTS solved more problems than any single model working alone (Single-LLM AB-MCTS). In several cases, only the combination of different models led to the right answer.

There are still some limitations. When allowed unlimited guesses, the system finds a correct answer about 30 percent of the time. But in the official ARC-AGI-2 benchmark, where submissions are usually limited to one or two answers, the success rate drops significantly.

To address this, Sakana AI plans to develop new methods for automatically identifying and selecting the best suggestions. One idea is to use an additional AI model to evaluate the options. The approach could also be combined with systems where AI models discuss solutions with each other.
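
As a rough sketch of that first idea, an LLM-as-judge reranker could score every candidate and keep only the allowed number of submissions. The `judge` callable below is hypothetical; in practice it would wrap a prompt like "rate this solution from 0 to 10" and parse the numeric reply:

```python
def pick_final_answers(candidates, judge, k=2):
    """Rank candidate solutions with a judge model, keep the k allowed submissions."""
    ranked = sorted(candidates, key=judge, reverse=True)
    return ranked[:k]
```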

Sakana AI has released the algorithm as open-source software under the name TreeQuest, so other developers can apply the method to their own problems.

It's been a busy summer for the Tokyo startup. Sakana AI recently launched its self-evolving Darwin-Gödel Machine, an agent that rewrites its own Python code in rapid genetic cycles. Dozens of code variants are generated and tested on the SWE-bench and Polyglot suites, with only the top performers making the cut. After just 80 rounds, SWE-bench accuracy jumped from 20% to 50%, while Polyglot scores more than doubled to 30.7%—moving ahead of other leading open-source models.
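
Stripped of the LLM machinery, the loop described there is a classic generate-evaluate-select cycle. Here is a minimal sketch, with `mutate` and `evaluate` standing in for the code-rewriting LLM and the benchmark harness; the parameters are placeholders, not Darwin-Gödel's actual settings:

```python
import random

def evolve(seed_agent, mutate, evaluate, rounds=80, population=20, keep=5):
    """Generic evolutionary loop: breed variants, score them, keep the best.

    mutate(agent)   -> a modified copy (e.g. an LLM rewriting the agent's code)
    evaluate(agent) -> benchmark score (e.g. a SWE-bench pass rate)
    """
    pool = [seed_agent]
    for _ in range(rounds):
        parents = pool[:]
        while len(pool) < population:
            pool.append(mutate(random.choice(parents)))
        pool.sort(key=evaluate, reverse=True)
        pool = pool[:keep]              # only the top performers make the cut
    return pool[0]                      # best agent found across all rounds
```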


In June, the company's ALE agent finished in the top 21 at a live AtCoder Heuristic Contest, outperforming over 1,000 human participants. ALE combines Google's Gemini 2.5 Pro with classic optimization techniques like simulated annealing, beam search, and tabu lists, showing that LLM-based agents can handle industrial-grade optimization tasks.
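
Simulated annealing, the first of those techniques, is compact enough to show in full. This is the textbook form, not ALE's actual implementation; `neighbor` and `cost` are problem-specific stand-ins:

```python
import math
import random

def simulated_annealing(initial, neighbor, cost, t0=1.0, t_min=1e-3, cooling=0.995):
    """Minimize cost(x) by accepting occasional worse moves while 'hot'.

    neighbor(x) -> a small random modification of solution x
    cost(x)     -> objective value to minimize
    Worse moves are accepted with probability exp(-delta / T); as the
    temperature T cools, the search settles into a good region.
    """
    current, best, temp = initial, initial, t0
    while temp > t_min:
        candidate = neighbor(current)
        delta = cost(candidate) - cost(current)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temp *= cooling
    return best
```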

These advances build on January's Transformer² study, which tackled continual learning in large language models. Taken together, Darwin-Gödel, ALE, and Transformer² outline a clear direction: evolve code, iterate solutions, and let modular, nature-inspired agents tackle problems that once needed teams of engineers.

Summary
  • Japanese startup Sakana AI has introduced a new method, AB-MCTS, that enables multiple large language models like ChatGPT and Gemini to collaborate on the same task, resulting in better problem-solving compared to models working independently.
  • The AB-MCTS algorithm combines two search strategies—refining existing solutions and exploring new approaches—while dynamically choosing which AI model is best suited for each step, leading to improved results on the challenging ARC-AGI-2 benchmark.
  • Although the collaborative approach boosts performance, its success rate declines when answer attempts are limited, so Sakana AI plans to develop ways to automatically select the best suggestions and has released the algorithm as open source under the name TreeQuest.
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.