An independent project demonstrates how Google's Gemini 2.5 Pro language model can complete the classic Game Boy game Pokémon Blue, albeit with significant technical support.
Pokémon Blue, released in 1996, is known for its complex mechanics, strategic battles, and open-world exploration—all challenges for AI systems. To succeed, an AI needs long-term planning, goal tracking, and visual navigation, skills that are central to the development of general artificial intelligence.
A developer unaffiliated with Google put Gemini 2.5 Pro Experimental to the test, letting the model guide a character through Pokémon Blue largely autonomously. After several hundred hours, Gemini completed the game. The entire playthrough is publicly available on Twitch.
How Gemini navigates Pokémon Blue
The setup combines the mGBA emulator with Gemini 2.5 Pro. The emulator feeds Gemini screenshots and game data, like the character's position, the current party of Pokémon, and the layout of the map. Gemini responds with control commands—like pressing "A," "B," or moving in a direction.
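The exchange described above can be pictured as a simple observe-and-act loop. The Python sketch below is only an illustration: the `Observation` fields, the `parse_command` helper, and the `ask_model` callback are assumptions made for this example, not the project's code or mGBA's API.

```python
from dataclasses import dataclass, field

# The Game Boy's eight inputs; this exact vocabulary is an assumption.
VALID_BUTTONS = {"A", "B", "START", "SELECT", "UP", "DOWN", "LEFT", "RIGHT"}


@dataclass
class Observation:
    """Game state handed to the model each turn (field names are hypothetical)."""
    screenshot_png: bytes
    position: tuple[int, int]
    party: list[str] = field(default_factory=list)
    map_name: str = ""


def parse_command(reply: str) -> str:
    """Normalize a model reply into one button press, rejecting anything else."""
    button = reply.strip().strip('"').upper()
    if button not in VALID_BUTTONS:
        raise ValueError(f"unsupported command: {reply!r}")
    return button


def step(observation: Observation, ask_model) -> str:
    """One turn: show the model the observation, return the button to press."""
    return parse_command(ask_model(observation))
```

Validating the reply before it reaches the emulator keeps a malformed answer from being sent as input; how the real harness handles such cases is not described in the source.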
To assist Gemini's navigation, the game screen features a grid overlay. The model also receives select RAM data to enhance its environmental understanding. A text-based map tracks exploration progress, compensating for Gemini's lack of human-like spatial awareness.
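One way to picture such a text-based map is as a character grid in which explored tiles, unexplored tiles, and the player are drawn with different symbols. The sketch below illustrates that general idea only; the symbols and the `render_text_map` function are invented for this example and do not reflect the project's actual map format.

```python
def render_text_map(width, height, explored, player):
    """Draw a map as text: '.' explored, '?' unexplored, 'P' player.

    `explored` is a set of (x, y) tiles already seen; `player` is (x, y).
    All symbols are hypothetical choices for this sketch.
    """
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            if (x, y) == player:
                row.append("P")
            elif (x, y) in explored:
                row.append(".")
            else:
                row.append("?")
        rows.append("".join(row))
    return "\n".join(rows)


# Example: a 4x3 area with three visited tiles and the player at (1, 1).
print(render_text_map(4, 3, {(0, 0), (1, 0), (1, 1)}, (1, 1)))
```

A representation like this lets a language model read exploration progress as plain text instead of inferring it from pixels.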
Based on this information, Gemini decides on its next move or delegates complex tasks to specialized sub-agents. A "Pathfinder" agent plans routes through tricky areas, while a "Boulder Puzzle Strategist" solves specific rock-moving puzzles. Both of these agents are also instances of Gemini.
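The delegation step can be sketched as a small dispatcher. In the project both sub-agents are Gemini instances; in this hypothetical Python example they are ordinary callables so the code runs on its own, and the `dispatch` function and its parameters are assumptions.

```python
from typing import Callable, Dict, Optional

# A sub-agent takes a task description and returns a plan as text.
SubAgent = Callable[[str], str]


def dispatch(
    task: str,
    agents: Dict[str, SubAgent],
    handle_directly: SubAgent,
    requested: Optional[str] = None,
) -> str:
    """Route a task to the sub-agent the main model asked for, if any."""
    if requested is None:
        # Routine situation: the main model acts without help.
        return handle_directly(task)
    if requested not in agents:
        raise KeyError(f"unknown sub-agent: {requested}")
    return agents[requested](task)


# Stand-ins for the two agents named in the article.
agents = {
    "pathfinder": lambda task: f"route for: {task}",
    "boulder_puzzle_strategist": lambda task: f"push sequence for: {task}",
}
```

The choice of `requested` is left to the caller here because, according to the article, the main model itself decides when to bring in a sub-agent.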
Gemini chooses when to bring in one of these agents, suggesting it can at least tell the difference between routine and more complex game situations. To manage memory, the system periodically summarizes earlier messages—about every 100 actions—to stay within token limits.
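The periodic summarization can be sketched as a message history that collapses itself after a fixed number of actions. This is a minimal illustration under stated assumptions: the `RollingHistory` class and the `summarize` callback are hypothetical, and in a real setup the summary would presumably come from the language model itself.

```python
class RollingHistory:
    """Keeps recent messages and replaces them with a summary every N actions."""

    def __init__(self, summarize, every=100):
        self.summarize = summarize  # callable: list of messages -> summary text
        self.every = every          # the article cites roughly 100 actions
        self.messages = []
        self.actions_since_summary = 0

    def record(self, message):
        """Store one action's message and summarize when the interval is reached."""
        self.messages.append(message)
        self.actions_since_summary += 1
        if self.actions_since_summary >= self.every:
            # Collapse everything so far, including any earlier summary.
            self.messages = [self.summarize(self.messages)]
            self.actions_since_summary = 0
```

Because the previous summary is folded into the next one, the context stays bounded at the cost of losing detail with each pass.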
No AGI, but a well-coordinated AI model
Even with its impressive run, Gemini's performance doesn't amount to general intelligence. The developer still steps in at times, for example by limiting escape item use or correcting glitches. According to the developer, there are no direct hints or walk-throughs, except for a single case involving a known game bug.
The project relies on a stack of support tools: grid overlays, specialized agent instances, and regular memory updates. Without these, the system wouldn't work nearly as well.
It remains unclear whether Gemini could pull this off without so much guidance. Still, managing a complex game like Pokémon Blue under controlled conditions shows how far large language models can go with the right setup.
Development is ongoing. The roadmap includes better memory management, integrated note-taking, a fully uninterrupted playthrough, and possible viewer interactions (but without additional help). Runs using alternative language models like Claude or o3 are also in the works.