Google DeepMind builds a multi-agent AlphaZero that uses diversity to beat the original system and generalize better.
DeepMind's AlphaZero was a turning point in AI research, achieving superhuman capabilities through self-play reinforcement learning and mastering chess at a new level. However, difficult chess puzzles still baffle even the strongest chess AI systems, suggesting room for improvement. Researchers at Google DeepMind now propose combining several different AlphaZero agents into a "league" called AZdb to further improve the system's capabilities in chess and beyond.
AlphaZero agents are inspired by human collaboration
Using "behavioral diversity" and "response diversity" techniques, AZdb's agents are trained to play chess in different ways. According to Google DeepMind, behavioral diversity maximizes the difference in average piece positions between agents, while response diversity exposes agents to games against different opponents. In practice, this also means that AZdb's agents get to see many more different positions, expanding the range of in-distribution data, which should allow the system to generalize better to unseen positions.
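To make the idea of "difference in average piece positions" more concrete, here is a minimal sketch of how such a difference could be measured. It is not DeepMind's training objective, which the article does not spell out. The position encoding (a dict from square index to piece symbol) and the function names `average_occupancy`, `occupancy_distance`, and `league_diversity` are assumptions made for illustration.

```python
from itertools import combinations

# Assumed encoding (hypothetical): a position is a dict mapping a square
# index (0-63) to a piece symbol such as "P" or "k".

def average_occupancy(positions):
    """Fraction of an agent's positions in which each (square, piece) pair occurs."""
    counts = {}
    for position in positions:
        for square, piece in position.items():
            counts[(square, piece)] = counts.get((square, piece), 0) + 1
    total = len(positions)
    return {key: count / total for key, count in counts.items()}

def occupancy_distance(a, b):
    """Euclidean distance between two average-occupancy maps."""
    keys = set(a) | set(b)
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys) ** 0.5

def league_diversity(positions_by_agent):
    """Mean pairwise distance between agents; expects at least two agents,
    each with at least one position."""
    profiles = {name: average_occupancy(p) for name, p in positions_by_agent.items()}
    pairs = list(combinations(profiles, 2))
    return sum(occupancy_distance(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)
```

A score of zero would mean every agent places its pieces identically on average. A diversity-seeking training signal would push this kind of number up.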
As inspiration for this approach, the team cites cases in which clubs used to collaborate and play against each other via correspondence chess, such as "Kasparov versus The World," about which the famous chess player said he had "never expended as much effort into any other game in his life." Chess grandmasters also often prepare for important games with a team of strong players with different styles.
Further experiments confirm that AZdb's agents develop unique playing styles: they prefer different openings and pawn structures, and their pieces survive at different rates.
AZdb outperforms AlphaZero
The researchers then examined whether this diversity provides a creative advantage when solving challenging chess puzzles collected from multiple sources, including puzzles specifically designed to trick chess engines. They found that, given ample time to think, AZdb solved twice as many of these very challenging puzzles as a single AlphaZero agent. This shows that AZdb's diverse team collectively considered more possibilities, they said, with different agents specializing in certain puzzle types. Chess games also showed that the agents specialized in different openings.
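One way a team can solve more puzzles than any single member is if a puzzle counts as solved whenever at least one agent finds the solution. The sketch below illustrates that scoring rule. The rule itself, the function name `solve_rates`, and the toy results are assumptions for illustration, not data or methods from the study.

```python
def solve_rates(results):
    """results maps puzzle id -> {agent name: True if that agent solved it}.

    Returns (per-agent solve rate, team solve rate), where the team is
    credited with a puzzle if any agent solved it (an assumed rule).
    """
    n = len(results)
    agents = sorted({agent for outcome in results.values() for agent in outcome})
    per_agent = {
        agent: sum(outcome.get(agent, False) for outcome in results.values()) / n
        for agent in agents
    }
    team = sum(any(outcome.values()) for outcome in results.values()) / n
    return per_agent, team

# Made-up outcomes for four puzzles and two agents.
toy_results = {
    "p1": {"agent_0": True, "agent_1": False},
    "p2": {"agent_0": False, "agent_1": True},
    "p3": {"agent_0": False, "agent_1": False},
    "p4": {"agent_0": True, "agent_1": False},
}
per_agent, team = solve_rates(toy_results)
```

In this toy data the agents solve different puzzles, so the team rate exceeds the best individual rate. If all agents solved the same puzzles, the team would gain nothing.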
The researchers exploited this specialization through "sub-additive planning," in which AZdb chooses its best agent for each opening when playing against AlphaZero. This approach resulted in a 50-point Elo rating increase over AlphaZero's individual performance.
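The selection step can be pictured as a simple lookup: for each opening, pick the agent with the best record against the baseline. The sketch below shows that idea, plus what a 50-point gap means under the standard logistic Elo formula. The function names and the score table are hypothetical, and the article does not describe how the researchers actually implemented the selection.

```python
def best_agent_per_opening(scores):
    """scores maps opening -> {agent name: average score vs. the baseline, 0 to 1}.

    Returns the top-scoring agent for each opening (ties go to the first listed).
    """
    return {
        opening: max(by_agent, key=by_agent.get)
        for opening, by_agent in scores.items()
    }

def elo_expected_score(rating_diff):
    """Expected score for the higher-rated side under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400))

# Made-up per-opening results, for illustration only.
toy_scores = {
    "Sicilian Defense": {"agent_0": 0.48, "agent_1": 0.56},
    "Queen's Gambit": {"agent_0": 0.55, "agent_1": 0.51},
}
selection = best_agent_per_opening(toy_scores)
expected = elo_expected_score(50)  # roughly 0.57
```

Under that formula, a 50-point Elo advantage corresponds to an expected score of about 57 percent, counting draws as half a point.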
Overall, the team believes that while there is still a gap between human and machine thinking, the research suggests that "incorporating human-like creativity and diversity into AZ can improve its ability to generalize."