
PokéLLMon is a language model-based AI agent that can beat humans at Pokémon.

PokéLLMon combines large language models, wiki entries, and a form of reinforcement learning to play at a level comparable to human players.

The Georgia Institute of Technology team sees the project as a test bed for developing agents that can behave like humans in virtual worlds. According to the team, tactical combat games, especially Pokémon Battles, provide a suitable format because they offer measurable victory rates, and consistent opponents, such as AI or human players, are always available.

Pokémon Battles are strategically challenging, requiring players to consider a wide range of factors, from the characteristics of the Pokémon to the environmental conditions of the game.


PokéLLMon reads Pokédex and learns in battle

Without assistance, even the best language models, such as GPT-4, fall far short of the human level. So the team developed a method based on three key elements:

In-Context Reinforcement Learning (ICRL)

In ICRL, PokéLLMon iteratively improves its strategy based on text-based feedback from previous battles. This feedback serves as a kind of "reward" and includes information about changes in a Pokémon's HP, the effectiveness of attacks, and the order in which moves are executed. According to the team, this allows the agent to continually refine its strategies and correct mistakes.
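To make the idea of text-based feedback more concrete, here is a minimal Python sketch of how such a "reward" could be written out as plain text and prepended to the next prompt. It is an illustration under stated assumptions, not the team's implementation: the function names `turn_feedback` and `build_prompt`, the wording of the feedback, and the choice to keep only the last few entries are all hypothetical.

```python
def turn_feedback(move, opp_hp_before, opp_hp_after,
                  own_hp_before, own_hp_after, multiplier, moved_first):
    """Describe the outcome of one turn as plain text (hypothetical wording)."""
    if multiplier == 0:
        effect = "had no effect"
    elif multiplier > 1:
        effect = "was super effective"
    elif multiplier < 1:
        effect = "was not very effective"
    else:
        effect = "was normally effective"
    order = "before" if moved_first else "after"
    return (
        f"{move} {effect}: opponent HP went from {opp_hp_before}% to {opp_hp_after}%. "
        f"Your HP went from {own_hp_before}% to {own_hp_after}%. "
        f"You moved {order} the opponent."
    )


def build_prompt(state_description, feedback_history, max_turns=3):
    """Put the most recent feedback lines in front of the current state.

    Keeping only the last `max_turns` entries is an assumption made here
    to keep the prompt short.
    """
    recent = feedback_history[-max_turns:] if max_turns > 0 else []
    parts = ["Feedback from previous turns:"]
    parts.extend(f"- {line}" for line in recent)
    parts.append("Current state:")
    parts.append(state_description)
    parts.append("Choose the next action.")
    return "\n".join(parts)


history = [turn_feedback("Thunderbolt", 100, 40, 80, 65, 2.0, True)]
print(build_prompt("Your Pikachu faces the opponent's Gyarados.", history))
```

The language model itself is never retrained in this sketch. The only thing that changes from one decision to the next is the feedback text placed in its context.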

Knowledge Augmented Generation (KAG)

KAG allows PokéLLMon to incorporate external knowledge, such as type advantages and the effects of moves or abilities, into its decision-making. This knowledge comes from the Pokédex, an encyclopedia of Pokémon. The team believes that KAG reduces the problem of hallucinations.
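As a rough illustration of how looked-up knowledge can replace guesswork, the following Python sketch annotates each available move with its type multiplier against the opposing Pokémon, so that the fact appears in the prompt instead of being recalled by the model. The table is only a small excerpt of the type chart, and `TYPE_CHART`, `type_multiplier`, and `knowledge_lines` are hypothetical names chosen for this example, not the team's code.

```python
# Small excerpt of the Pokémon type chart: (attacking type, defending type) -> multiplier.
# Assumption for this sketch only: pairs that are not listed are treated as 1.0,
# so a real agent would need the complete chart.
TYPE_CHART = {
    ("Fire", "Grass"): 2.0,
    ("Fire", "Water"): 0.5,
    ("Water", "Fire"): 2.0,
    ("Water", "Grass"): 0.5,
    ("Grass", "Water"): 2.0,
    ("Grass", "Fire"): 0.5,
    ("Electric", "Water"): 2.0,
    ("Electric", "Ground"): 0.0,
    ("Ground", "Electric"): 2.0,
}


def type_multiplier(attack_type, defender_types):
    """Multiply the chart entries for each of the defender's types."""
    multiplier = 1.0
    for defender_type in defender_types:
        multiplier *= TYPE_CHART.get((attack_type, defender_type), 1.0)
    return multiplier


def knowledge_lines(moves, defender_types):
    """Turn looked-up multipliers into sentences that can be added to a prompt.

    `moves` maps a move name to its type, e.g. {"Flamethrower": "Fire"}.
    """
    target = "/".join(defender_types)
    lines = []
    for name, move_type in moves.items():
        multiplier = type_multiplier(move_type, defender_types)
        lines.append(f"{name} ({move_type}) deals {multiplier:g}x damage to a {target} target.")
    return lines


for line in knowledge_lines({"Flamethrower": "Fire", "Surf": "Water"}, ["Grass"]):
    print(line)
```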

Image: Hu et al.

Consistent Action Generation (CAG)

CAG mitigates the phenomenon of "panic switching": when facing a strong opponent, the agent tends to generate inconsistent actions because it wants to avoid fighting. Selecting the most consistent action as the final output ensures that the agent does not act rashly in a state of "panic."
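One simple way to pick the most consistent action is to sample several candidate actions for the same state and keep the one that occurs most often. The Python sketch below shows that voting step. It assumes a caller-supplied `generate_action` function standing in for a language model call, and the scripted generator exists only so the example runs without a model. Both names are hypothetical, and the sketch is not claimed to match the team's implementation.

```python
from collections import Counter


def consistent_action(generate_action, state, n_samples=3):
    """Sample several candidate actions and return the most frequent one.

    Ties go to the candidate that was generated first, because
    Counter.most_common keeps first-seen order among equal counts.
    """
    votes = Counter(generate_action(state) for _ in range(n_samples))
    action, _count = votes.most_common(1)[0]
    return action


def make_scripted_generator(outputs):
    """Stand-in for a language model: returns the scripted outputs one by one."""
    iterator = iter(outputs)

    def generate(_state):
        return next(iterator)

    return generate


generator = make_scripted_generator(
    ["switch to Snorlax", "use Thunderbolt", "use Thunderbolt"]
)
print(consistent_action(generator, state="opponent is at full HP"))
```

In this run a single "panicked" switch is outvoted by two identical attack choices, so the isolated outlier does not become the agent's action.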

PokéLLMon beats humans, but is inferior to good players

In online battles against human players, PokéLLMon has a 49% win rate in ladder battles and a 56% win rate in one-on-one battles. This puts the Pokémon agent on par with human players on average.

Although PokéLLMon is on par with human players in many areas, it still has weaknesses. According to the researchers, it tends to favour actions that offer short-term advantages and is susceptible to the long-term strategies of human players. It can also be tricked into unfavourable actions by the deceptive manoeuvres of experienced players. The team is now working to address these weaknesses.

  • PokéLLMon is an AI agent that relies on large language models, wiki entries, and reinforcement learning to compete against human players in Pokémon battles.
  • In online battles against human players, PokéLLMon achieves a win rate of 49% in ladder battles and 56% in one-on-one battles, which is about human level, although it still has weaknesses in long-term strategies and deceptive manoeuvres.
  • The project serves as a test bed for the development of AI agents that behave similarly to humans in virtual worlds.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.