DeepMind's DreamerV3 is a general and scalable reinforcement learning algorithm that can collect diamonds in Minecraft without human help.
Before AI systems from DeepMind beat Go world champions, the company began its triumphant march in reinforcement learning with Atari classics. Today, AI researchers continue to work on new reinforcement learning models that play video games. However, the focus has shifted to more complex games with open worlds and numerous challenges.
A prime example is Minecraft: the game offers sparse reward signals, requires exploration of open environments, and has long time horizons.
Many people don’t understand how challenging Minecraft is for AI agents.
Let me put it this way. AlphaGo solves a board game with only 1 task, countably many states, and full observability.
Minecraft has infinite tasks, infinite gameplay, and tons of hidden world knowledge. 🧵
— Jim Fan (@DrJimFan) January 11, 2023
Researchers from DeepMind tackle this challenge with DreamerV3, the first algorithm to collect diamonds in Minecraft without data from human experts or hand-crafted curricula. DreamerV3 can also be applied in numerous other RL domains.
DeepMind's DreamerV3 is a general algorithm for reinforcement learning
Current algorithms can already solve many tasks in different domains - but they must be laboriously adapted to each task. This runs counter to the ideal of general intelligence, which could handle entirely different tasks without modification.
DreamerV3 differs from other RL algorithms: it is a general and scalable algorithm with fixed hyperparameters. This reduces the expertise and computational resources required to apply reinforcement learning to a problem, the researchers say.
DreamerV3 is applicable to many domains, including those with "continuous and discrete actions, visual and low-dimensional inputs, 2D and 3D worlds, different data budgets, reward frequencies, and reward scales."
For example, DreamerV3 can play 55 Atari games, manipulate objects with robotic arms in simulations, or explore and complete tasks in virtual worlds - such as Minecraft.
The algorithm relies on three neural networks: the World Model, which learns representations of sensory input and predicts future representations and rewards for potential actions; the Critic, which judges the value of each situation; and the Actor, which learns to reach situations that maximize that value.
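To make this three-network structure concrete, here is a minimal sketch in PyTorch with made-up layer sizes. It is not DeamerV3's actual architecture: the real world model is a recurrent state-space model with categorical latents, and all training losses are omitted. The key idea it illustrates is the imagination rollout, where the Actor acts purely inside the World Model's latent space.

```python
# A minimal sketch of DreamerV3's three-network layout (assumption: PyTorch,
# toy sizes; the real model uses an RSSM with categorical latents).
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, LATENT_DIM = 64, 4, 32  # hypothetical sizes

class WorldModel(nn.Module):
    """Encodes observations; predicts the next latent state and its reward."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.SiLU(),
                                     nn.Linear(128, LATENT_DIM))
        self.dynamics = nn.GRUCell(LATENT_DIM + ACT_DIM, LATENT_DIM)
        self.reward_head = nn.Linear(LATENT_DIM, 1)

    def imagine(self, latent, action):
        # Roll the latent state forward one step without touching the environment.
        next_latent = self.dynamics(torch.cat([latent, action], -1), latent)
        return next_latent, self.reward_head(next_latent)

class Critic(nn.Module):
    """Estimates the value of a latent state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.SiLU(),
                                 nn.Linear(128, 1))
    def forward(self, latent):
        return self.net(latent)

class Actor(nn.Module):
    """Outputs an action distribution aimed at high-value latent states."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.SiLU(),
                                 nn.Linear(128, ACT_DIM))
    def forward(self, latent):
        return torch.softmax(self.net(latent), -1)

# Imagination rollout: the Actor acts entirely inside the World Model's
# latent space, and the Critic scores where the trajectory ends up.
world_model, actor, critic = WorldModel(), Actor(), Critic()
latent = world_model.encoder(torch.randn(1, OBS_DIM))
for _ in range(15):  # imagination horizon (hypothetical value)
    action = actor(latent)
    latent, reward = world_model.imagine(latent, action)
value = critic(latent)
```

Learning from imagined rather than real interactions is what keeps the approach data-efficient: the environment only needs to be queried to keep the World Model accurate.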
DreamerV3 is efficient
DeepMind tested DreamerV3 on more than 150 tasks across seven domains against the best available algorithms in each, many of which are specifically designed for those challenges. Despite its fixed hyperparameters, DreamerV3 performed strongly in all tests and surpassed the previous leader in four domains. Its predecessor DreamerV2 performed noticeably weaker; the team documents the differences between the two versions in the paper.
In Minecraft, DreamerV3 was able to mine diamonds. This is notable because the algorithm must first complete numerous intermediate steps, such as collecting resources and crafting pickaxes at a crafting table.
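For context, the benchmark rewards a fixed chain of milestones leading up to the diamond. A rough Python sketch of that tech tree and its reward logic (a simplification; the paper's setup pays +1 the first time each item is obtained):

```python
# The chain of milestones leading from a bare-handed start to a diamond,
# roughly as rewarded in the Minecraft diamond benchmark (simplified sketch).
MILESTONES = [
    "log", "plank", "stick", "crafting_table", "wooden_pickaxe",
    "cobblestone", "stone_pickaxe", "iron_ore", "furnace",
    "iron_ingot", "iron_pickaxe", "diamond",
]

def milestone_reward(inventory: set, collected: set) -> int:
    """+1 for each milestone item that newly appears in the inventory."""
    new = {m for m in MILESTONES if m in inventory} - collected
    collected |= new
    return len(new)
```

These sparse, sequential rewards are exactly what makes the task so hard: the agent sees almost no feedback until it has already discovered long stretches of the tech tree on its own.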
Other AI models have managed this before; OpenAI's VPT was even able to craft a diamond pickaxe. However, VPT required more than 70,000 hours of Minecraft gameplay videos and was trained for nine days on 720 Nvidia V100 GPUs. DreamerV3 learned to collect diamonds in 17 days on a single V100, without any human data.
The algorithm also scales well, the team says: larger versions deliver better performance across benchmarks as well as higher data efficiency.
"Applied out of the box, DreamerV3 is the first algorithm to collect diamonds in Minecraft from scratch without human data or curricula, a long-standing challenge in artificial intelligence. Our general algorithm makes reinforcement learning broadly applicable and allows scaling to hard decision making problems."

From the paper.
More information is available on DreamerV3's project page. The code should also be available there shortly.