Summary

Using a game manual, an AI learned an old Atari game several thousand times faster than with older methods. This approach could be useful in other areas as well.

In March 2020, DeepMind scientists unveiled Agent57, the first deep reinforcement learning (RL) agent to outperform the human benchmark on all 57 Atari 2600 games.

For the Atari game Skiing, which is considered particularly difficult and requires the AI agent to avoid trees on a ski slope, Agent57 needed a full 80 billion training frames - at 30 frames per second, that would take a human nearly 85 years.

AI learns to game 6,000 times faster

In a new paper, "Read and Reap the Rewards," researchers from Carnegie Mellon University, Ariel University, and Microsoft Research show how this training time can be reduced to as little as 13 million frames - or about five days of play at 30 frames per second.
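The time estimates above follow directly from the frame counts. As a quick check, here is a minimal Python sketch that converts frames to playing time, assuming the 30 frames per second used in the article and a 365.25-day year (the variable and function names are chosen for this example only):

```python
FPS = 30  # frame rate assumed in the article's estimates
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumption: average year length


def frames_to_seconds(frames: int, fps: int = FPS) -> float:
    """Convert a number of game frames to seconds of real-time play."""
    return frames / fps


agent57_frames = 80_000_000_000  # frames reported for Agent57 on Skiing
manual_frames = 13_000_000       # frames reported for the manual-based approach

agent57_years = frames_to_seconds(agent57_frames) / SECONDS_PER_YEAR
manual_days = frames_to_seconds(manual_frames) / SECONDS_PER_DAY
speedup = agent57_frames / manual_frames

print(f"Agent57: {agent57_years:.1f} years")    # Agent57: 84.5 years
print(f"With manual: {manual_days:.1f} days")   # With manual: 5.0 days
print(f"Frame reduction: {speedup:.0f}x")       # Frame reduction: 6154x
```

The ratio of the two frame counts is roughly 6,000, which matches the figure in the heading.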

Image: Wu et al.

The Read and Reward framework uses human-written game instructions, such as the game manual, as a source of information for the AI agent. According to the team, the approach is promising and could significantly improve the performance of RL algorithms on Atari games.

Extracting information, making inferences

The researchers cite the length of the instructions, which are often redundant, as a challenge. In addition, they say, much of the important information in the instructions is often implicit and only makes sense if it can be related to the game. An AI agent that uses instructions must therefore be able to process and reason about the information.

The framework therefore consists of two main components: the QA Extraction module and the Reasoning module. The QA Extraction module extracts and groups relevant information from the instructions by asking questions and extracting the answers from the text. The Reasoning module then evaluates interactions between objects and the agent based on this information and assigns auxiliary rewards for events it detects in the game.
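To make the two-stage idea more concrete, here is a heavily simplified Python sketch. It is not the authors' implementation: the manual excerpt is invented for this example, sentence matching stands in for the QA Extraction module, and a keyword lookup stands in for the Reasoning module. All names (`extract_answers`, `reason`, `build_reward_table`, the cue lists) are hypothetical.

```python
import re

# Hypothetical manual excerpt, written for this example (not quoted from a real manual).
MANUAL = (
    "You control a skier racing downhill. "
    "Ski through the gates as fast as you can. "
    "Avoid the trees and moguls, because hitting them slows you down."
)


def extract_answers(manual: str, objects: list[str]) -> dict[str, str]:
    """Toy stand-in for QA extraction: first sentence that mentions each object."""
    sentences = re.split(r"(?<=[.!?])\s+", manual.strip())
    answers = {}
    for obj in objects:
        for sentence in sentences:
            if obj in sentence.lower():
                answers[obj] = sentence
                break
    return answers


# Assumed cue words; a real system would reason with a language model instead.
NEGATIVE_CUES = ("avoid", "hitting", "slows")
POSITIVE_CUES = ("through", "collect", "reach")


def reason(answer: str) -> float:
    """Toy stand-in for reasoning: is touching the object good (+1) or bad (-1)?"""
    text = answer.lower()
    if any(cue in text for cue in NEGATIVE_CUES):
        return -1.0
    if any(cue in text for cue in POSITIVE_CUES):
        return 1.0
    return 0.0


def build_reward_table(manual: str, objects: list[str]) -> dict[str, float]:
    """Map each object to the extra reward given when the agent interacts with it."""
    answers = extract_answers(manual, objects)
    return {obj: reason(answer) for obj, answer in answers.items()}


reward_table = build_reward_table(MANUAL, ["gates", "trees"])
print(reward_table)  # {'gates': 1.0, 'trees': -1.0}
```

The point of the sketch is the division of labor: one step pulls the relevant statement out of a long, redundant text, and a second step turns that statement into a signal the agent can learn from.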

These auxiliary rewards are then passed to an A2C (Advantage Actor-Critic) RL agent, which improved its performance in four Atari games with sparse rewards. Such games often require a long sequence of complex behavior before the player receives any reward - the rewards are "sparse", and an RL agent that proceeds only by trial and error does not receive a good learning signal.
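Why the manual-derived rewards help with sparse rewards can be illustrated with a toy episode. The Python sketch below is an assumption-laden illustration, not the paper's training code: the episode, the reward table, the `scale` factor, and the way the two reward sources are added together are all made up for this example. The advantage uses the standard one-step estimate, r + gamma * V(s') - V(s).

```python
from typing import Optional

# Hypothetical per-object rewards, as a manual-reading module might produce them.
AUX_TABLE = {"gates": 1.0, "trees": -1.0}


def shaped_rewards(
    env_rewards: list[float],
    events: list[Optional[str]],
    aux_table: dict[str, float],
    scale: float = 0.1,  # assumed weight of the extra reward
) -> list[float]:
    """Add a scaled extra reward to each step where an interaction was detected."""
    shaped = []
    for reward, event in zip(env_rewards, events):
        bonus = aux_table.get(event, 0.0) if event is not None else 0.0
        shaped.append(reward + scale * bonus)
    return shaped


def one_step_advantage(
    reward: float, value: float, next_value: float, gamma: float = 0.99
) -> float:
    """One-step advantage estimate: r + gamma * V(s') - V(s)."""
    return reward + gamma * next_value - value


# Toy episode: the game itself only pays out on the final step (sparse reward).
env_rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
events = [None, "trees", None, "gates", None]  # detected agent-object interactions

shaped = shaped_rewards(env_rewards, events, AUX_TABLE)
print(shaped)  # [0.0, -0.1, 0.0, 0.1, 1.0]

# With an untrained critic (all value estimates 0), the advantage equals the reward,
# so only steps with a non-zero reward produce a learning signal.
sparse_signal = [one_step_advantage(r, 0.0, 0.0) for r in env_rewards]
shaped_signal = [one_step_advantage(r, 0.0, 0.0) for r in shaped]
print(sum(a != 0.0 for a in sparse_signal))  # 1
print(sum(a != 0.0 for a in shaped_signal))  # 3
```

In this toy setting the agent gets feedback on three of the five steps instead of one, which is the intuition behind the faster learning.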

Reap the rewards outside of Atari Skiing

By using the instructions, the number of training frames required can be reduced by a factor of 1,000, the authors write. In an interview with New Scientist, first author Yue Wu even speaks of a speed-up by a factor of 6,000. Whether the manual comes from the developers themselves or from Wikipedia is irrelevant.


According to the researchers, one of the biggest challenges is object recognition in Atari games. In modern games, however, this is not a problem because they provide object ground truth, they write. In addition, recent advances in multimodal video-language models suggest that more reliable solutions will soon be available that could replace the object recognition part of the current framework. In the real world, advanced computer vision algorithms could help.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
  • DeepMind's Agent57 mastered all 57 Atari games in 2020 - but it had to be trained on billions of video game frames.
  • The Read and Reward framework from the paper "Read and Reap the Rewards" takes a different approach: the AI agent can study the game manual and learn on average 1,000 times faster.
  • This is the first time an AI agent has learned from a manual, and the approach could lead to more efficient and powerful reinforcement learning.
Jonathan works as a technology journalist who focuses primarily on how easily AI can already be used today and how it can support daily life.