The performance of language models can be significantly improved by simply increasing the number of agents, according to a new paper.
The Tencent research team's paper, jokingly titled “More Agents Is All You Need,” examines the impact of adding more agents to a task. The title is an homage to the original Transformer paper, “Attention Is All You Need.”
The researchers introduce a “sampling-and-voting” method: the input task is fed multiple times into a language model, or into a cooperation framework with multiple language model agents, to produce a set of results. A majority vote over these results then determines the most reliable one. According to the paper, the approach does not depend on more complex techniques such as chain-of-thought prompting and can be used to improve existing methods.
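The basic idea can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `sample_and_vote` and `sample_fn` are hypothetical names, `sample_fn` stands in for a single call to a language model, and answers are compared by exact match.

```python
from collections import Counter
from typing import Callable, Hashable


def sample_and_vote(
    task: str,
    sample_fn: Callable[[str], Hashable],  # hypothetical stand-in for one LLM call
    n_agents: int,
) -> Hashable:
    """Query the model n_agents times and return the most frequent answer."""
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    # Sampling phase: the same task is sent to the model repeatedly.
    answers = [sample_fn(task) for _ in range(n_agents)]
    # Voting phase: the answer with the most votes wins.
    # Ties go to the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]
```

In practice `sample_fn` would wrap a model call with a non-zero sampling temperature, so that repeated queries can return different answers.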
More agents bring Llama2-13B to the level of Llama2-70B
Experiments across different datasets and tasks show that the performance of language models increases with the size of the ensemble, i.e., with the number of agents. The team also shows that smaller LLMs can match or even outperform their larger counterparts simply by scaling the number of agents, without elaborate prompt designs or complex collaboration frameworks. For example, on the GSM8K dataset, the Llama2-13B model with sampling-and-voting achieved 59% accuracy, outperforming the Llama2-70B model, which achieved 54% accuracy.
However, the study also shows the limits of the method. Performance gains initially grow as task difficulty increases, but then shrink again. This suggests that there is a complexity threshold beyond which simply adding more agents does not lead to further improvements. Furthermore, performance increases with the prior probability of the correct answer: a model that lacks a certain capability will not acquire it just by scaling the number of agents. Under the right conditions, however, performance increases with the number of reasoning steps and, of course, with the cost of running more agents.
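The role of the prior probability can be illustrated with a simplified calculation. The sketch below assumes a binary setting that is not taken from the paper: each agent independently returns the correct answer with probability `p` and a single wrong answer otherwise, and the number of agents `n` is odd so that ties cannot occur. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent agents is correct.

    Simplifying assumption: only two possible answers (right or wrong),
    each agent is correct with probability p, and n is odd.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    # Sum the binomial probabilities of all outcomes in which
    # more than half of the agents give the correct answer.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    for p in (0.4, 0.5, 0.6):
        row = [round(majority_vote_accuracy(p, n), 3) for n in (1, 3, 11, 41)]
        print(p, row)
```

Under these assumptions, voting only helps when `p` is above 0.5; below that, adding agents makes the majority more reliably wrong. With more than two possible answers the threshold is different, because the correct answer only needs to be the most frequent one, but the dependence on the prior probability remains.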
“Sampling-and-voting” can be combined with other methods
“More Agents” is not a silver bullet, but the results show that it helps. It is also orthogonal to existing optimization methods such as chain-of-thought prompting and can therefore be combined with them for further improvements.
Based on these findings, the researchers propose optimization strategies that make better use of additional agents. These include stepwise sampling-and-voting for tasks that require multiple reasoning steps, and a hierarchical approach for tasks with low prior probabilities, for example using different models for subtasks of different difficulty.
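The stepwise variant could look roughly like the following sketch. It is one possible reading of the idea, not the authors' code: `stepwise_sample_and_vote` and `step_fn` are hypothetical names, `step_fn` stands in for a model call that proposes the next reasoning step given the steps chosen so far, and the number of steps is fixed in advance for simplicity.

```python
from collections import Counter
from typing import Callable, List, Tuple


def stepwise_sample_and_vote(
    task: str,
    step_fn: Callable[[str, Tuple[str, ...]], str],  # hypothetical LLM call
    n_steps: int,
    n_agents: int,
) -> List[str]:
    """Build a solution step by step, voting on each step separately."""
    chosen: List[str] = []
    for _ in range(n_steps):
        # Every agent proposes the next step from the same agreed prefix.
        candidates = [step_fn(task, tuple(chosen)) for _ in range(n_agents)]
        # The most frequent proposal becomes part of the prefix.
        winner = Counter(candidates).most_common(1)[0][0]
        chosen.append(winner)
    return chosen
```

Voting after each step means that an error in one sampled chain is discarded before later steps build on it, instead of being carried through to the final answer.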