
Minecraft bot Voyager programs itself using GPT-4

Maximilian Schreiner

Midjourney prompted by THE DECODER

Voyager uses GPT-4 to guide a learning Minecraft agent through the pixel world. Instead of reinforcement learning, Voyager relies on code generation.

Researchers from Nvidia, Caltech, UT Austin, Stanford, and ASU introduce Voyager, the first lifelong learning agent that plays Minecraft. Unlike other Minecraft agents, which typically rely on classic reinforcement learning techniques, Voyager uses GPT-4 to continuously improve itself. It does this by writing, improving, and transferring code stored in an external skill library.

This results in small programs that help navigate, open doors, mine resources, craft a pickaxe, or fight a zombie. "GPT-4 unlocks a new paradigm," says Nvidia researcher Jim Fan, who advised the project. In this paradigm, "training" is the execution of code and the "trained model" is the code base of skills that Voyager iteratively assembles.

Voyager consists of three main components:

  1. An iterative prompting mechanism that incorporates feedback from the game, execution errors, and self-checking to refine programs.
  2. A skill library with code for storing and retrieving complex behaviors.
  3. An automated curriculum to maximize exploration.
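The skill library from the list above can be pictured as a store that maps a description of a behavior to the program that implements it, with retrieval by similarity. The following is only a minimal sketch under stated assumptions: `SkillLibrary`, `embed`, and `cosine` are hypothetical names, and the bag-of-words embedding is a toy stand-in for whatever embedding model and vector database the team actually uses.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SkillLibrary:
    """Hypothetical skill store: description -> program source."""

    def __init__(self) -> None:
        self._skills = []  # list of (description, embedding, code)

    def add(self, description: str, code: str) -> None:
        """Store a program together with the embedding of its description."""
        self._skills.append((description, embed(description), code))

    def retrieve(self, query: str, k: int = 1) -> list:
        """Return the code of the k skills whose descriptions best match the query."""
        q = embed(query)
        ranked = sorted(self._skills, key=lambda s: cosine(q, s[1]), reverse=True)
        return [code for _, _, code in ranked[:k]]


library = SkillLibrary()
# The stored programs are placeholders, not code from the Voyager repository.
library.add("craft a wooden pickaxe", "async function craftWoodenPickaxe(bot) { /* ... */ }")
library.add("fight a zombie", "async function fightZombie(bot) { /* ... */ }")

# A new, related task pulls up the closest existing skill as a starting point.
top = library.retrieve("craft a stone pickaxe", k=1)
```

Retrieving by description rather than by exact name is what would let a new task reuse an older, related skill.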

Video: Wang, Xie, Jiang, Mandlekar et al.

Voyager Minecraft agent learns in context

The Minecraft agent learns iteratively: Voyager uses GPT-4 to write a program that achieves a goal, then feeds feedback from the game environment and any JavaScript errors back to GPT-4 to refine the program. In this way, Voyager gradually builds a library of skills and stores successful programs in a vector database. Complex skills are built from simpler skills.
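The write, run, and refine cycle described here might look roughly like the loop below. This is a sketch, not the project's implementation: `refine_until_success`, `Feedback`, `fake_generate`, and `fake_execute` are hypothetical names, and the two stand-in functions replace the GPT-4 call and the game so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Feedback:
    success: bool
    message: str  # game feedback or an execution error, as text


def refine_until_success(
    task: str,
    generate: Callable[[str, Optional[str], Optional[Feedback]], str],
    execute: Callable[[str], Feedback],
    max_rounds: int = 4,
) -> Optional[str]:
    """Ask the model for a program, run it, and feed the result back in."""
    program: Optional[str] = None
    feedback: Optional[Feedback] = None
    for _ in range(max_rounds):
        program = generate(task, program, feedback)
        feedback = execute(program)
        if feedback.success:
            return program  # a caller could now store this in a skill library
    return None  # give up after max_rounds failed attempts


# Stand-ins (assumptions) so the sketch runs without a model or a game.
def fake_generate(task, previous, feedback):
    if feedback is None:
        return "mineBlock('iron_ore')"  # first attempt
    return "craft('stone_pickaxe'); mineBlock('iron_ore')"  # revised attempt


def fake_execute(program):
    if "stone_pickaxe" in program:
        return Feedback(True, "mined 1 iron_ore")
    return Feedback(False, "cannot mine iron_ore without a stone pickaxe")


result = refine_until_success("mine iron", fake_generate, fake_execute)
```

In this toy run the first program fails, the error text goes back into the next generation step, and the second program succeeds. The success check is folded into `execute` here for brevity, whereas the article describes self-checking as a separate source of feedback.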

Video: Wang, Xie, Jiang, Mandlekar et al.

To explore the diverse world of Minecraft, the team uses an automated curriculum that suggests appropriate exploration tasks based on the agent's current skills and the current state of the world. For example, the agent learns to collect sand and cactus in a desert before digging for iron.
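One way to picture how such a curriculum step could be driven by the agent's state is a prompt assembled from that state and sent to the language model. The sketch below is purely illustrative: `AgentState` and `build_curriculum_prompt` are hypothetical names, and the prompt wording is invented for this example rather than taken from Voyager.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    biome: str
    inventory: dict = field(default_factory=dict)  # item name -> count
    completed: list = field(default_factory=list)  # tasks already solved
    failed: list = field(default_factory=list)     # tasks that did not work yet


def build_curriculum_prompt(state: AgentState) -> str:
    """Turn the agent's current situation into a request for the next task."""
    inventory = ", ".join(
        f"{name} x{count}" for name, count in sorted(state.inventory.items())
    ) or "empty"
    return "\n".join([
        "Propose the next task for a Minecraft agent.",
        "Goal: discover as many different things as possible.",
        f"Biome: {state.biome}",
        f"Inventory: {inventory}",
        f"Completed tasks: {', '.join(state.completed) or 'none'}",
        f"Failed tasks: {', '.join(state.failed) or 'none'}",
        "Reply with one task that is achievable from this state.",
    ])


state = AgentState(biome="desert", inventory={"sand": 3}, completed=["collect sand"])
prompt = build_curriculum_prompt(state)
```

Because the biome and inventory are part of the prompt, a model answering it has the context needed to suggest desert tasks before underground ones.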

Voyager uses information about the environment to plan new tasks with GPT-4. | Image: Wang, Xie, Jiang, Mandlekar et al.

Together, this creates an agent that is constantly learning and can perform a variety of tasks. The team runs all experiments in the MineDojo environment.

Voyager can currently only build houses with human feedback.

The team compares Voyager to other language model-based agents such as ReAct, Reflexion, and Auto-GPT in Minecraft. Voyager discovered 63 different objects within 160 prompt iterations, 3.3 times more than the next best approach, the team says.

Image: Wang, Xie, Jiang, Mandlekar et al.

The automated search for previously unknown objects causes Voyager to travel extensively: overall, the Minecraft agent covers more than twice the distance of the other agents and visits more biomes. Auto-GPT and the other methods, on the other hand, often get stuck in their local area.

Image: Wang, Xie, Jiang, Mandlekar et al.

The skill library built by Voyager is also compatible with Auto-GPT: The AI agent in Minecraft achieves significantly better results with it, but still lags behind Voyager.

Currently, Voyager is purely text-based and cannot see what is happening in the block world, so it cannot build houses on its own. However, in an early experiment, the team had humans give the agent visual feedback. With that help, Voyager can learn to build houses and Nether portals, for example.

Video: Wang, Xie, Jiang, Mandlekar et al.

More information and examples are available on the Voyager project page. The code is available on GitHub.
