
The AI algorithm "Bigger, Better, Faster" masters 26 Atari games in just two hours, matching human efficiency.

Reinforcement learning is one of Google DeepMind's core research areas, and could one day solve many real-world problems with AI. One big problem, however, is that it tends to be very inefficient: RL algorithms require a lot of training data and a lot of computing power. In their latest work, Google DeepMind and researchers from Mila and the Université de Montréal show that it can be done differently.

Bigger, Better, Faster learns Atari games in two hours

The Bigger, Better, Faster (BBF) model achieves superhuman performance on average on the Atari benchmark. That in itself is nothing new - other reinforcement learning agents have beaten humans at Atari games before.

However, BBF learns with only two hours of gameplay, the same amount of practice time that human testers are allowed in the benchmark. The model-free learning algorithm thus achieves human-level learning efficiency and requires significantly less computing power than older methods. Model-free agents learn directly from the rewards and punishments they receive through their interactions with the game world, without explicitly building a model of that world.
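
To illustrate the principle of model-free learning (not BBF's actual implementation, which uses a large neural network and many additional techniques), here is a minimal tabular Q-learning sketch; all sizes and hyperparameters are illustrative assumptions:

```python
import numpy as np

# Minimal model-free sketch: the agent updates action values directly
# from observed rewards and never learns the environment's dynamics.
n_states, n_actions = 16, 4          # toy sizes, chosen for illustration
Q = np.zeros((n_states, n_actions))  # estimated value of each (state, action)
alpha, gamma, epsilon = 0.1, 0.99, 0.1

def act(state: int) -> int:
    """Epsilon-greedy action selection."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def update(state: int, action: int, reward: float, next_state: int) -> None:
    """One model-free update: nudge Q toward the bootstrapped target."""
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
```

The key point is that `update` consumes only the observed transition and reward; nothing in the loop predicts what the game world will do next.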

The team achieved this by using a much larger network, self-supervised training objectives, and other efficiency measures. BBF can be trained on a single Nvidia A100 GPU, for example, whereas other approaches require much more computing power.
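
BBF's self-supervised component is in the spirit of self-predictive representations: the network learns to predict its own latent encoding of future observations, with no reward signal involved. The PyTorch fragment below is a rough, hypothetical sketch of such an auxiliary loss, not BBF's actual architecture; the layer sizes, shapes, and names are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

# Toy encoder and latent transition head; real agents use deep conv nets.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(84 * 84, 256))
transition = torch.nn.Linear(256 + 1, 256)  # predicts next latent from (latent, action)

def self_predictive_loss(obs, action, next_obs):
    """Cosine-similarity loss between predicted and actual next latents."""
    z = encoder(obs)
    z_next_pred = transition(torch.cat([z, action], dim=-1))
    with torch.no_grad():                      # stop-gradient target branch
        z_next = encoder(next_obs)
    return -F.cosine_similarity(z_next_pred, z_next, dim=-1).mean()
```

Minimizing this loss alongside the usual RL objective gives the network a dense training signal from every frame, which is one way small amounts of gameplay can be stretched further.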

Further improvements are possible, Atari still a good benchmark

The team points out that BBF does not yet beat humans in every game in the benchmark, which omits 29 of the 55 games typically used to evaluate RL agents. However, comparing BBF to other models across all 55 games shows that the efficient algorithm is roughly on par with systems trained on 500 times more data.

The team also sees this as an indication that Atari is still a useful benchmark for RL, one that remains affordable for smaller research teams.

Previous efficient RL algorithms have also shown weaknesses in scaling, whereas BBF does not share this limitation and continues to gain performance with more training data.

"Overall, we hope that our work inspires other researchers to continue pushing the frontier of sample efficiency in deep RL forward, to ultimately reach human-level performance across all tasks with human-level or superhuman efficiency," the team concludes.

More efficient RL algorithms could re-establish the method in an AI landscape currently dominated by self-supervised models.

Summary
  • The "Bigger, Better, Faster" (BBF) model from Google Deepmind, Mila and the Université de Montréal achieves human-like learning efficiency in Atari games.
  • Despite requiring less computation, BBF is on par with systems trained with 500 times more data.
  • The team hopes their work will inspire other researchers to further improve sample efficiency in deep RL.