Multi-agent AI systems can outperform solo agents, but it's often unclear whether they're truly working together or just running side by side. A new framework from Northeastern University sets out to measure real teamwork in agentic AI systems.
Developed by Christoph Riedl, the framework uses information theory to spot when groups of agents develop abilities that go beyond what each one can do alone. It gives developers a way to check whether their AI teams genuinely collaborate or merely operate in parallel, a crucial question for complex tasks like software development and problem-solving.
The framework breaks down cooperation into three types: agents acting identically, complementing each other, or even working at cross-purposes. The key is whether they generate information that only appears when they collaborate.
At the core of the method are Partial Information Decomposition (PID) and Time-Delayed Mutual Information (TDMI). PID splits information into redundant, unique, and synergistic parts. TDMI checks how well an agent's current state predicts the system's future. Combined, these tools let researchers measure synergy—information that only emerges when agents interact.
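To make the idea concrete, here is a minimal sketch, not Riedl's implementation, of how time-delayed mutual information and a simple synergy proxy could be estimated from discrete agent trajectories. The toy data, variable names, and the co-information shortcut are illustrative assumptions; a full PID would additionally separate redundant, unique, and synergistic terms.

```python
# Minimal sketch (illustrative, not the paper's code): estimating time-delayed
# mutual information and a rough synergy proxy from discrete agent trajectories.
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """I(X;Y) in bits, estimated from paired samples of discrete symbols."""
    n = len(xs)
    p_xy = Counter(zip(xs, ys))
    p_x, p_y = Counter(xs), Counter(ys)
    return sum((c / n) * log2((c / n) / ((p_x[x] / n) * (p_y[y] / n)))
               for (x, y), c in p_xy.items())

# Toy trajectories (assumed): each agent's guess at step t and the group-level
# feedback one step later.
agent_a = [6, 6, 7, 5, 6, 7, 6, 5]
agent_b = [8, 9, 8, 8, 9, 8, 7, 8]
feedback_next = ["low", "low", "high", "low", "low", "high", "low", "low"]

# Time-delayed mutual information: how well one agent's current guess
# predicts the system's state one step into the future.
tdmi_a = mutual_information(agent_a, feedback_next)

# Synergy proxy via co-information: information the pair carries about the
# outcome beyond what each agent carries alone. A positive value hints at
# synergy exceeding redundancy; a full PID separates the terms explicitly.
joint = list(zip(agent_a, agent_b))
synergy_proxy = (mutual_information(joint, feedback_next)
                 - mutual_information(agent_a, feedback_next)
                 - mutual_information(agent_b, feedback_next))

print(f"TDMI(agent A -> future feedback): {tdmi_a:.3f} bits")
print(f"Co-information synergy proxy:     {synergy_proxy:.3f} bits")
```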

Personas and strategic thinking drive real teamwork
To put the framework to the test, Riedl set up a guessing game with groups of ten AI agents. The agents couldn't communicate directly. Their task: guess numbers that add up to a hidden target, with feedback limited to "too high" or "too low."
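The setup can be pictured with a short simulation sketch. The agent policy below is only a placeholder for the LLM call, and the target value, guess range, and round limit are assumptions for illustration, not the study's exact parameters.

```python
# Hedged sketch of the guessing-game environment: ten agents, no direct
# communication, a hidden target for the summed guesses, and shared
# "too high" / "too low" feedback each round.
import random

NUM_AGENTS = 10
TARGET = 57          # hidden from the agents (illustrative value)
MAX_ROUNDS = 20

def agent_guess(agent_id, history):
    """Placeholder policy that nudges a random guess using the last feedback.
    In the actual experiments, this is an LLM prompted with or without a
    persona and instructions to reason about teammates."""
    guess = random.randint(1, 10)
    if history:
        guess = max(1, guess - 1) if history[-1] == "too high" else min(10, guess + 1)
    return guess

history = []
for round_idx in range(MAX_ROUNDS):
    guesses = [agent_guess(i, history) for i in range(NUM_AGENTS)]
    total = sum(guesses)
    if total == TARGET:
        print(f"Solved in round {round_idx + 1}: {guesses}")
        break
    feedback = "too high" if total > TARGET else "too low"
    history.append(feedback)   # the only signal the agents share
else:
    print("Target not reached within the round limit.")
```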
Riedl tried three setups: a basic version with no special instructions, a version where each agent had a unique personality, and a third where agents were prompted to consider what the others might do.
Only the last setup produced real teamwork. When agents were prompted to consider each other's strategies, they took on specialized roles and divided up the problem. Their strategies started to complement each other.
For example, one agent justified its decision: "Because it's possible others might go for 4 or 5 (the absolute lower bound or just above the last 'too low'), and someone else might go for 7 or 8, I stick with the most efficient: 6." Another agent in the same group deliberately chose 8, explaining, "If anyone else in the group is feeling feisty and picks 9 or 10, my 8 will help cover the lower part safely," Riedl writes.
Riedl found that the most successful teams combined diverse, complementary strategies with a clear focus on shared goals. It’s this balance between creativity and alignment that led to the highest overall performance.
Team skills vary across language models
Not all language models are equally good at teamwork. GPT-4.1 agents consistently developed effective team strategies, while smaller Llama-3.1-8B models struggled: only about one in ten Llama teams solved the task. The smaller models sometimes managed to coordinate but rarely showed a real division of labor, which underscores how much strong AI collaboration depends on strategic reasoning about teammates.
In Riedl's research, larger models consistently outperformed smaller ones at team-based tasks, which runs counter to recent advice from Nvidia researchers who advocate for using many small models to save resources.
The study also shows the value of prompt engineering: assigning agents distinct personalities and prompting them to consider each other's actions both lead to better teamwork.
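As a rough illustration of what those two ingredients might look like in practice, the wording below is invented for this example and is not taken from the paper.

```python
# Illustrative only: one way to combine a persona with a "reason about your
# teammates" instruction. The study's actual prompts are not reproduced here.
persona = "You are a cautious analyst who prefers conservative, efficient guesses."
teammate_reasoning = (
    "Before you answer, consider which numbers the other nine agents are likely "
    "to pick given the same feedback, and choose a value that complements theirs."
)
task = "The group's guesses must sum to a hidden target. Last feedback: 'too low'."

system_prompt = f"{persona}\n\n{teammate_reasoning}"
user_prompt = f"{task}\nReply with a single integer between 1 and 10."
```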
With tools like OpenAI's AgentKit making multi-agent collaboration more accessible, this framework could help teams build more effective AI systems. For now, though, applying it in practice remains challenging.