A new study finds that AI language models' performance on complex tasks is capped by their weakest individual skill.
Researchers from Meta AI and the University of Illinois Urbana-Champaign discovered that large language models (LLMs) follow a "Law of the Weakest Link" when tackling complex tasks. The team created a benchmark called CrossEval to assess both individual and combined skills of LLMs.
The study evaluated seven core abilities, including English, reasoning, and programming, as well as combinations of these skills. For example, the researchers tested programming and reasoning together, and Spanish combined with image recognition.
"Most notably, cross-capability performance is typically constrained by the weakest capability, following the 'Law of the Weakest Link' effect," the researchers explained. Out of 58 combinations tested, 38 scored below both individual skills, while 20 fell between the two but closer to the weaker skill.
This pattern held across different LLMs and evaluation methods. The study also found that LLMs generally performed worse on combined skills than on individual abilities. The researchers say this suggests that current models are heavily optimized for individual skills, while the ability to combine them has been overlooked.
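The classification behind these numbers is easy to state precisely. The Python sketch below is purely illustrative, not the paper's evaluation code, and the scores in the example are invented; it simply places a cross-capability score relative to the two individual-skill scores, mirroring the three outcomes the study reports:

```python
def classify(skill_a: float, skill_b: float, combined: float) -> str:
    """Place a cross-capability score relative to two individual-skill scores."""
    weaker, stronger = sorted([skill_a, skill_b])
    if combined < weaker:
        return "below both individual skills"        # 38 of 58 cases in the study
    if combined > stronger:
        return "above both individual skills"        # not observed in the study
    midpoint = (weaker + stronger) / 2
    return ("between, closer to the weaker skill"    # the remaining 20 cases
            if combined <= midpoint
            else "between, closer to the stronger skill")

# Hypothetical example: a model strong at reasoning but weak at coding.
print(classify(skill_a=72.0, skill_b=55.0, combined=52.3))
# -> "below both individual skills"
```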
Implications for AI development
The findings have important implications for future AI development. "Given that LLMs generally underperform in cross-capability tasks, identifying and enhancing these weak points should be a priority for future research and development," the study authors write.
They suggest that AI developers focus on strengthening a model's weakest skills, since this should boost overall performance on complex tasks. This approach may prove more effective than broadly improving all abilities, according to the paper.
More details and the benchmark are available on GitHub.