AI speedruns by reading the game manual

Using a game manual, an AI learned an old Atari game several thousand times faster than with older methods. This approach could be useful in other areas as well.

In March 2020, DeepMind scientists unveiled Agent57, the first deep reinforcement learning (RL)-trained model to outperform humans in all 57 Atari 2600 games.

For the Atari game Skiing, which is considered particularly difficult and requires the AI agent to avoid trees on a ski slope, Agent57 needed a full 80 billion training frames - at 30 frames per second, that would take a human nearly 85 years.

AI learns to game 6,000 times faster

In a new paper, "Read and Reap the Rewards," researchers from Carnegie Mellon University, Ariel University, and Microsoft Research show how this training time can be reduced to as little as 13 million frames - or five days.

The Read and Reward Framework uses human-written game instructions like the game manual as a source of information for the AI agent. According to the team, the approach is promising and could significantly improve the performance of RL algorithms on Atari games.

Extracting information, making inferences

The researchers cite the length of the instructions, which are often redundant, as a challenge. In addition, they say, much of the important information in the instructions is often implicit and only makes sense if it can be related to the game. An AI agent that uses instructions must therefore be able to process and reason about the information.

The framework, therefore, consists of two main components: the QA Extraction module and the Reasoning module. The QA Extraction module extracts and groups relevant information from the instructions by asking questions and extracting answers from the text. The Reasoning module then evaluates object-agent interactions based on this information and assigns help rewards for recognized events in the game.

These help rewards are then passed to an A2C-RL (Advantage Actor Critic) agent, which was able to improve its performance in four games in the Atari environment with sparse rewards. Such games often require complex behavior until the player is rewarded - so the rewards are "sparse", and an RL agent that proceeds only by trial and error does not receive a good learning signal.

Reap the rewards outside of Atari Skiing

By using the instructions, the number of training frames required can be reduced by a factor of 1,000, the authors write. In an interview with New Scientist, first author Yue Wu even speaks of a speed-up by a factor of 6,000. Whether the manual comes from the developers themselves or from Wikipedia is irrelevant.

Recommendation

AI research

New Othello experiment supports the world model hypothesis for large language models

According to the researchers, one of the biggest challenges is object recognition in Atari games. In modern games, however, this is not a problem because they provide object ground truth, they write. In addition, recent advances in multimodal video-language models suggest that more reliable solutions will soon be available that could replace the object recognition part of the current framework. In the real world, advanced computer vision algorithms could help.

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

AI speedruns by reading the game manual

AI learns to game 6,000 times faster

Extracting information, making inferences

Reap the rewards outside of Atari Skiing

New Othello experiment supports the world model hypothesis for large language models

AI system StreamDiT generates livestream videos from text at 16 fps 512p

Researchers used 1,600 YouTube fail videos to show AI models struggle with surprises

AI coding can make developers slower even if they feel faster

Kimi-K2 is the next open-weight AI milestone from China after Deepseek

New Energy-Based Transformer architecture aims to bring better "System 2 thinking" to AI models

Musk unveils Grok 4 as xAI’s new AI model that beats OpenAI and Google on major benchmarks

AI speedruns by reading the game manual

AI learns to game 6,000 times faster

Extracting information, making inferences

Reap the rewards outside of Atari Skiing

Share

Bank details