Date: 2026-02-03

Google's Gemini models are outperforming the competition in board game benchmarks. Google DeepMind and Kaggle have expanded their "Game Arena" platform with two new games: Werewolf and Poker. The platform tests AI models across strategic games that measure different cognitive abilities: chess evaluates logical thinking, Werewolf tests social skills like communication and detecting deception, and Poker assesses how models handle risk and incomplete information.

These games provide objective ways to measure skills like planning and decision-making under uncertainty. Gemini 3 Pro and Gemini 3 Flash currently hold the top spots in all rankings. The Werewolf benchmark also serves security research: it tests, without any real-world consequences, whether models can detect manipulation. According to Google DeepMind CEO Demis Hassabis, the AI industry needs more rigorous tests to properly evaluate the latest models.

