
Multi-agent AI systems can outperform solo agents, but it's often unclear whether they're truly working together or just running side by side. A new framework from Northeastern University sets out to measure real teamwork in agentic AI systems.


Developed by Christoph Riedl, the framework uses information theory to spot when groups of agents develop abilities that go beyond what each one can do alone. It gives developers a way to check whether their AI teams are genuinely collaborating or merely operating in parallel, a crucial question for complex tasks like software development and problem-solving.

The framework breaks down cooperation into three types: agents acting identically, complementing each other, or even working at cross-purposes. The key is whether they generate information that only appears when they collaborate.

At the core of the method are Partial Information Decomposition (PID) and Time-Delayed Mutual Information (TDMI). PID splits the information agents carry about an outcome into redundant, unique, and synergistic parts. TDMI measures how well an agent's current state predicts the system's future state. Combined, these tools let researchers quantify synergy: information that only emerges when agents interact.
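
To make these quantities concrete, here is a minimal Python sketch of how mutual information can be estimated from logged agent states. It uses the textbook XOR case, where neither input alone reveals anything about the output but the pair determines it completely, so all the information is synergistic; applying the same estimator to time-lagged pairs gives a simple TDMI. This is an illustration using a plug-in estimator, not Riedl's implementation, and a full PID would additionally require a redundancy measure that this sketch omits.

```python
import numpy as np
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * np.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

rng = np.random.default_rng(0)

# XOR: each agent's output alone carries no information about the target,
# but the pair determines it exactly -- pure synergy in PID terms.
a = rng.integers(0, 2, 10_000)
b = rng.integers(0, 2, 10_000)
target = a ^ b
print(mutual_information(a, target))                 # ~0 bits (nothing unique)
print(mutual_information(b, target))                 # ~0 bits
print(mutual_information(list(zip(a, b)), target))   # ~1 bit, all synergistic

# TDMI: the same estimator applied to time-lagged pairs of a persistent
# process, measuring how well the state at t predicts the state at t+1.
states = [0]
for _ in range(9_999):
    states.append(states[-1] if rng.random() < 0.9 else 1 - states[-1])
print(mutual_information(states[:-1], states[1:]))   # ~0.53 bits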

The information decomposition framework shows how multi-agent systems balance redundant, unique, and synergistic information; the figure also includes a schematic of the group guessing game with LLM agents and a heat map of success rates by group size and temperature. In the experiment, LLM agents tried to reach a target sum by guessing numbers without communicating. Smaller groups and higher temperature settings led to better results. | Image: Christoph Riedl

Personas and strategic thinking drive real teamwork

To put the framework to the test, Riedl set up a guessing game with groups of ten AI agents. The agents couldn't communicate directly. Their task: guess numbers that add up to a hidden target, with feedback limited to "too high" or "too low."
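
As a rough sketch of the protocol, the loop below replaces the LLMs with naive agents that all shrink their guess range in the same way after each shared "too high"/"too low" signal. The group size of ten and the feedback wording follow the article; the guess bounds, target range, and agent policy are assumptions. Because every agent reacts identically, this is exactly the redundant, non-synergistic behavior the framework is built to flag.

```python
import random

class NaiveAgent:
    """Stand-in for an LLM agent: shrinks its private guess range based on
    the shared feedback. All agents update identically (pure redundancy)."""

    def __init__(self, low=1, high=10):  # per-agent bounds are an assumption
        self.low, self.high = low, high
        self.last = None

    def guess(self):
        self.last = random.randint(self.low, self.high)
        return self.last

    def update(self, feedback):
        if feedback == "too high":
            self.high = max(self.low, self.last - 1)
        elif feedback == "too low":
            self.low = min(self.high, self.last + 1)

random.seed(0)
agents = [NaiveAgent() for _ in range(10)]   # ten agents, as in the study
target = random.randint(30, 70)              # hidden target sum (assumed range)

for round_no in range(1, 21):
    total = sum(agent.guess() for agent in agents)
    if total == target:
        print(f"round {round_no}: sum={total} (correct)")
        break
    feedback = "too high" if total > target else "too low"
    print(f"round {round_no}: sum={total}, feedback={feedback}")
    for agent in agents:
        agent.update(feedback)               # only the shared signal is visible
```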

Riedl tried three setups: a basic version with no special instructions, a version where each agent had a unique personality, and a third where agents were prompted to consider what the others might do.
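
The exact prompts are not reproduced in the article, so the wording below is a guess at what the three conditions could look like; only the structure (baseline, persona, strategic) follows the study.

```python
# Hypothetical system prompts for the three experimental conditions.
BASE_PROMPT = (
    "You are one of 10 agents. Each round, pick a number from 1 to 10. "
    "All guesses are summed and compared with a hidden target; the only "
    "feedback you receive is 'too high' or 'too low'."
)

# Condition 2: each agent gets a distinct persona.
PERSONA_PROMPT = BASE_PROMPT + (
    " You are a cautious statistician who favors conservative estimates."
)

# Condition 3: agents are prompted to reason about their teammates --
# the only setup that produced genuine division of labor.
STRATEGIC_PROMPT = BASE_PROMPT + (
    " Before guessing, think about what the other nine agents are likely "
    "to pick, and choose a number that complements their probable choices "
    "instead of duplicating them."
)
```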

Only the last setup produced real teamwork. When agents were prompted to consider each other's strategies, they took on specialized roles and divided up the problem. Their strategies started to complement each other.

For example, one agent justified its decision: "Because it's possible others might go for 4 or 5 (the absolute lower bound or just above the last 'too low'), and someone else might go for 7 or 8, I stick with the most efficient: 6." Another agent in the same group deliberately chose 8, explaining, "If anyone else in the group is feeling feisty and picks 9 or 10, my 8 will help cover the lower part safely," Riedl writes.

Riedl found that the most successful teams combined diverse, complementary strategies with a clear focus on shared goals. It’s this balance between creativity and alignment that led to the highest overall performance.


Team skills vary across language models

Not all language models are equally good at teamwork. GPT-4.1 agents consistently developed effective team strategies, while smaller Llama-3.1-8B models struggled: only about one in ten Llama teams solved the task. The smaller models sometimes managed to coordinate, but rarely showed real division of labor. This suggests that the ability to reason strategically about teammates is what enables strong AI collaboration.

In Riedl's research, larger models consistently outperformed smaller ones at team-based tasks, which runs counter to recent advice from Nvidia researchers who advocate for using many small models to save resources.

The study also shows the value of prompt engineering: assigning agents distinct personalities and prompting them to consider each other's actions leads to better teamwork.

With tools like OpenAI's AgentKit making multi-agent collaboration more accessible, this framework could help teams build more effective AI systems. For now, though, applying it in practice remains challenging.

Summary
  • Researcher Christoph Riedl has created a framework using information theory to assess whether multi-agent systems genuinely cooperate as teams or simply operate in parallel.
  • The findings indicate that AI agents develop real division of labor and synergy only when they take on specific roles and strategically respond to each other—basic coordination alone does not achieve this.
  • The framework offers developers new tools to encourage and analyze teamwork in AI systems.