Gemini models dominate new AI rankings for strategic board games

Google's Gemini models are outperforming the competition in game-based AI benchmarks. Google DeepMind and Kaggle have expanded their "Game Arena" platform with two new games: Werewolf and Poker. The platform tests AI models on strategic games that each measure different cognitive abilities: chess evaluates logical thinking, Werewolf tests social skills such as communication and detecting deception, and Poker assesses how models handle risk and incomplete information.

These games provide objective ways to measure skills such as planning and decision-making under uncertainty. Gemini 3 Pro and Gemini 3 Flash currently hold the top spots in all rankings. The Werewolf benchmark also serves security research: it tests whether models can detect manipulation in a setting without real-world consequences. According to Google DeepMind CEO Demis Hassabis, the AI industry needs more rigorous tests to properly evaluate the latest models.
