Deepmind's "DreamerV3" collects Minecraft diamonds without human data

Deepmind's DreamerV3 is a general and scalable reinforcement learning algorithm - and can collect diamonds in Minecraft without human help.

Before AI systems from Deepmind beat Go world champions, the company began its triumphant march in reinforcement learning with Atari classics. Today, AI researchers continue to work on new reinforcement learning models that play video games. However, the focus has shifted to more complex games with open worlds and numerous challenges.

A prime example is Minecraft: the game offers sparse reward signals, requires exploration of open environments, and has long time horizons.

Many people don’t understand how challenging Minecraft is for AI agents.

Let me put it this way. AlphaGo solves a board game with only 1 task, countably many states, and full observability.

Minecraft has infinite tasks, infinite gameplay, and tons of hidden world knowledge. 🧵 pic.twitter.com/ybBkP35SZY

— Jim Fan (@DrJimFan) January 11, 2023

Researchers from Deepmind tackle this challenge with DreamerV3, the first algorithm to collect diamonds in Minecraft without data from human experts or hand-crafted curricula. DreamerV3 can also be used in numerous other RL domains.

Deepmind's DreamerV3 is a general algorithm for reinforcement learning

Current algorithms can already solve many tasks in different domains - but they need to be elaborately adapted for each task. This runs against the ideal of general intelligence, which can perform entirely different tasks without modifications.

DreamerV3 differs from other RL algorithms: It is a general and scalable algorithm with fixed hyperparameters. This reduces the amount of expertise and computational resources required to apply reinforcement learning to a problem, the researchers said.

DreamerV3 is applicable to many domains, including those with "continuous and discrete actions, visual and low-dimensional inputs, 2D and 3D worlds, different data budgets, reward frequencies, and reward scales."

For example, DreamerV3 can play 55 Atari games, manipulate objects with robotic arms in simulations, or explore and complete tasks in virtual worlds - such as Minecraft.

The algorithm relies on three neural networks: one is the World Model, which learns representations of sensor input and predicts future representations and rewards for potential actions. The other two are the Critic, which evaluates the value of each situation, and the Actor, which learns to reach situations that maximize that value.
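The division of labor between the three components can be illustrated with a toy sketch. It is not DreamerV3's actual code: the one-dimensional world, the class and function names, and the update rule are assumptions made for illustration, and the actor here simply picks the best predicted action instead of being a trained neural network.

```python
from dataclasses import dataclass, field


@dataclass
class WorldModel:
    """Encodes observations and predicts the next state and reward."""
    goal: int = 5  # toy world (assumption): positions 0..goal on a line

    def encode(self, observation):
        # A real world model learns a compact representation; this one is the identity.
        return observation

    def predict(self, state, action):
        next_state = max(0, min(self.goal, state + action))
        reward = 1.0 if next_state == self.goal else 0.0
        return next_state, reward


@dataclass
class Critic:
    """Estimates the value of each state from imagined outcomes."""
    values: dict = field(default_factory=dict)

    def value(self, state):
        return self.values.get(state, 0.0)

    def update(self, state, target, lr=0.5):
        # Move the stored estimate a step toward the bootstrapped target.
        self.values[state] = self.value(state) + lr * (target - self.value(state))


class Actor:
    """Picks the action whose predicted outcome the critic rates highest."""
    actions = (-1, 1)

    def act(self, state, world_model, critic, gamma=0.9):
        def score(action):
            next_state, reward = world_model.predict(state, action)
            return reward + gamma * critic.value(next_state)
        return max(self.actions, key=score)


def train_in_imagination(world_model, critic, actor, sweeps=20, horizon=10, gamma=0.9):
    """Improve the critic using only rollouts predicted by the world model."""
    for _ in range(sweeps):
        for observation in range(world_model.goal + 1):
            state = world_model.encode(observation)
            for _ in range(horizon):
                action = actor.act(state, world_model, critic, gamma)
                next_state, reward = world_model.predict(state, action)
                critic.update(state, reward + gamma * critic.value(next_state))
                state = next_state


world_model, critic, actor = WorldModel(), Critic(), Actor()
train_in_imagination(world_model, critic, actor)
print([actor.act(s, world_model, critic) for s in range(world_model.goal)])
```

The point of the sketch is that the critic and actor never touch the real environment during training: every transition they learn from is a prediction of the world model.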

DreamerV3 is efficient

Deepmind tested DreamerV3 on more than 150 tasks across seven domains against the best available algorithms in each, many of which are specifically designed for those challenges. It achieved strong performance in all tests and was ahead of the previous leader in four areas - despite fixed hyperparameters. Its predecessor, DreamerV2, performed worse; the team documents the differences between the two versions in the paper.

In Minecraft, DreamerV3 was able to mine diamonds. This is notable because the algorithm must first complete numerous intermediate steps, such as collecting resources or crafting pickaxes at a workbench.

Other AI models have managed to do this before: OpenAI's VPT was even able to create a diamond pickaxe. However, VPT required more than 70,000 hours of Minecraft gameplay videos and was trained on 720 Nvidia V100 GPUs for nine days. DreamerV3 learned to collect diamonds in 17 days on a single V100 without human data.
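To put the two training budgets side by side, the figures quoted above can be converted into GPU-days. This is only a rough back-of-the-envelope comparison: the variable names are made up for illustration, the two systems reached different goals (a diamond pickaxe versus collecting diamonds), and GPU-days do not capture the human gameplay videos VPT relied on.

```python
# Figures as quoted in the article; both runs used Nvidia V100 GPUs.
vpt_gpus, vpt_days = 720, 9
dreamer_gpus, dreamer_days = 1, 17

vpt_gpu_days = vpt_gpus * vpt_days              # 720 * 9 = 6480
dreamer_gpu_days = dreamer_gpus * dreamer_days  # 1 * 17 = 17
ratio = vpt_gpu_days / dreamer_gpu_days

print(f"VPT: {vpt_gpu_days} GPU-days")
print(f"DreamerV3: {dreamer_gpu_days} GPU-days")
print(f"VPT used roughly {ratio:.0f} times more GPU time")
```

By this measure, VPT's training run consumed several hundred times more GPU time than DreamerV3's.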

The algorithm also scales well, the team says: larger models achieve better performance in various benchmarks and higher data efficiency.

Applied out of the box, DreamerV3 is the first algorithm to collect diamonds in Minecraft from scratch without human data or curricula, a long-standing challenge in artificial intelligence. Our general algorithm makes reinforcement learning broadly applicable and allows scaling to hard decision making problems.

From the paper.

More information is available on DreamerV3's project page. The code should also be available there shortly.
