The AI algorithm "Bigger, Better, Faster" masters 26 Atari games in just two hours, matching human efficiency.
Reinforcement learning is one of Google DeepMind's core research areas, and could one day solve many real-world problems with AI. One big problem, however, is that it tends to be very inefficient: RL algorithms require a lot of training data and a lot of computing power. In their latest work, Google DeepMind and researchers from Mila and Université de Montréal show that it can be done far more efficiently.
Bigger, Better, Faster learns Atari games in two hours
The Bigger, Better, Faster (BBF) model achieves superhuman performance on average across the Atari benchmark. That in itself is nothing new - other reinforcement learning agents have beaten humans at Atari games before.
However, BBF learns from only two hours of gameplay - the same amount of practice time that human testers are allowed in the benchmark. The model-free learning algorithm thus matches human learning efficiency while requiring significantly less computing power than earlier methods. Model-free agents learn directly from the rewards and penalties they receive by interacting with the game, without explicitly building a model of the game world.
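For intuition only, here is a minimal, self-contained sketch of model-free value learning - tabular Q-learning on a made-up toy environment, not BBF's actual architecture. All sizes and the random "environment" are assumptions; the point is that the agent improves its value estimates directly from observed rewards, with no dynamics model anywhere.

```python
# Minimal model-free learning sketch (tabular Q-learning), for illustration only.
import numpy as np

n_states, n_actions = 16, 4              # hypothetical toy environment sizes
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration
Q = np.zeros((n_states, n_actions))      # value estimates learned purely from experience

def act(state, rng):
    """Epsilon-greedy action selection from the current value estimates."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state, done):
    """One model-free update: nudge Q(s, a) toward the observed reward plus the
    discounted value of the next state. No transition model is ever built."""
    target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
    Q[state, action] += alpha * (target - Q[state, action])

# Toy interaction loop with a made-up random environment, just to show the data flow.
rng = np.random.default_rng(0)
state = 0
for _ in range(1000):
    action = act(state, rng)
    next_state = int(rng.integers(n_states))   # stand-in for the game's response
    reward = float(rng.random() < 0.05)        # sparse, made-up reward signal
    update(state, action, reward, next_state, done=False)
    state = next_state
```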
The team achieved this by using a much larger network, self-supervised training objectives, and other techniques to improve sample efficiency. For example, BBF can be trained on a single Nvidia A100 GPU, whereas other approaches require much more computing power.
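To illustrate the general idea of pairing reward-driven learning with a self-supervised objective, the sketch below combines a standard TD loss with a simple latent-consistency loss, in which a predicted next-step representation is pushed toward the encoding of the actual next observation. This is a simplified, hypothetical illustration, not DeepMind's released implementation; all module names, layer sizes, and the input format are assumptions.

```python
# Hypothetical sketch: TD loss + self-supervised latent-consistency loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, n_actions = 256, 18          # assumed sizes, for illustration only

encoder = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, latent_dim))
transition = nn.Linear(latent_dim + n_actions, latent_dim)   # predicts the next latent
q_head = nn.Linear(latent_dim, n_actions)                    # standard value head

def loss_fn(obs, action, reward, next_obs, gamma=0.99):
    """Combine a reward-driven TD loss with a self-supervised consistency term
    that asks the predicted next latent to match the encoded next observation,
    giving the network extra learning signal from every frame of experience."""
    z, z_next = encoder(obs), encoder(next_obs)
    q = q_head(z).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        td_target = reward + gamma * q_head(z_next).max(dim=1).values
    td_loss = F.mse_loss(q, td_target)

    a_onehot = F.one_hot(action, n_actions).float()
    z_pred = transition(torch.cat([z, a_onehot], dim=1))
    ssl_loss = 1.0 - F.cosine_similarity(z_pred, z_next.detach(), dim=1).mean()
    return td_loss + ssl_loss

# Usage with random stand-in data (flattened observations, made-up batch of 32).
obs = torch.randn(32, 512)
action = torch.randint(0, n_actions, (32,))
reward = torch.randn(32)
next_obs = torch.randn(32, 512)
print(loss_fn(obs, action, reward, next_obs))
```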
Further improvements are possible, Atari still a good benchmark
The team points out that BBF is not yet superior to humans in all games in the benchmark, which omits 29 of the 55 games typically used for RL agents. However, comparing BBF to other models across all 55 games shows that the efficient algorithm is roughly on par with systems trained on 500 times more data.
The team also sees this as an indication that Atari remains a valuable benchmark for RL, one that keeps such research affordable for smaller research teams.
Previous sample-efficient RL algorithms have also shown weaknesses in scaling, whereas BBF shows no such limitation and continues to gain performance as more training data becomes available.
"Overall, we hope that our work inspires other researchers to continue pushing the frontier of sample efficiency in deep RL forward, to ultimately reach human-level performance across all tasks with human-level or superhuman efficiency," the team concludes.
More efficient RL algorithms could re-establish the method in an AI landscape currently dominated by self-supervised models.