
Google DeepMind builds a multi-agent AlphaZero that beats the original system through diversity and improves its ability to generalize.

DeepMind's AlphaZero was a turning point in AI research, achieving superhuman capabilities through self-play reinforcement learning and mastering chess at a new level. However, difficult chess puzzles still baffle even the strongest chess AI systems, suggesting room for improvement. Researchers at Google DeepMind are now proposing AZdb, an ensemble system that combines multiple AlphaZero agents into a "league," to further improve the system's capabilities in chess and beyond.

AlphaZero agents are inspired by human collaboration

Using "behavioral diversity" and "response diversity" techniques, AZdb's agents are trained to play chess in different ways. According to Google Deepmind, behavioral diversity maximizes the difference in average piece positions between agents, while response diversity exposes agents to games against different opponents. In practice, this also means that AZdb's agent will get to see many more different positions, expanding this range of in-distribution data, which should allow the system to better generalize to unseen positions.

As inspiration for this approach, the team cites collaborative correspondence chess, in which clubs or large groups of players pooled their analysis against a single opponent, most famously "Kasparov versus The World," about which the famous chess player said he had "never expended as much effort on any other game" in his life. Chess grandmasters also often prepare for important games with a team of strong players with different styles.


Further experiments confirm that AZdb's agents develop distinct playing styles, for example preferring different openings and pawn structures and keeping individual pieces on the board at different rates.

AZdb outperforms AlphaZero

The researchers then examined whether this diversity provides a creative advantage when attempting to solve challenging chess puzzles collected from multiple sources, including puzzles specifically designed to trick chess engines. They found that, given ample time to think, AZdb solved twice as many of these very challenging puzzles as a single AlphaZero agent. This shows, they said, that AZdb's diverse team collectively considered more possibilities, with different agents specializing in certain puzzle types. Chess games also showed that the agents specialized in different openings.

The researchers exploited this specialization through "sub-additive planning," in which AZdb selects its strongest agent for each opening when playing against AlphaZero. This approach yielded a 50 Elo rating increase over AlphaZero's individual performance.
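As a rough illustration of this kind of per-opening routing, the sketch below picks the league member with the best estimated score for the opening being played. The opening labels, the score table, and the selection rule are assumptions made for illustration, not DeepMind's implementation.

```python
# Estimated score (e.g., expected points per game) of each league agent,
# keyed by opening; in practice these would come from evaluation games.
opening_scores = {
    "Sicilian Defence": {"agent_0": 0.54, "agent_1": 0.61, "agent_2": 0.49},
    "Queen's Gambit":   {"agent_0": 0.58, "agent_1": 0.52, "agent_2": 0.57},
    "Ruy Lopez":        {"agent_0": 0.50, "agent_1": 0.55, "agent_2": 0.60},
}

def pick_agent(opening: str, default: str = "agent_0") -> str:
    """Route the game to the agent with the best estimate for this opening."""
    scores = opening_scores.get(opening)
    if not scores:
        return default  # unseen opening: fall back to a default agent
    return max(scores, key=scores.get)

for opening in opening_scores:
    print(opening, "->", pick_agent(opening))
```

The intuition behind the "sub-additive" label is that no single agent needs to be best everywhere; the ensemble only has to route each position to whichever member handles it best.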

Overall, the team believes that while there is still a gap between human and machine thinking, the research suggests that "incorporating human-like creativity and diversity into AZ can improve its ability to generalize."

Summary
  • Google DeepMind develops AZdb, an ensemble system combining multiple AlphaZero agents into a "league" to enhance AI chess capabilities and improve generalization.
  • AZdb's agents use "behavioral diversity" and "response diversity" to create unique playing styles and better adapt to different opponents and unseen positions.
  • Testing showed that AZdb solved twice as many challenging chess puzzles as an individual AlphaZero and increased its Elo rating by 50 points.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.