Summary

Voyager uses GPT-4 to guide a learning Minecraft agent through the pixel world. Instead of reinforcement learning, Voyager relies on code generation.

Researchers from Nvidia, Caltech, UT Austin, Stanford, and ASU introduce Voyager, which they describe as the first LLM-powered lifelong learning agent in Minecraft. Unlike other Minecraft agents, such as those based on classic reinforcement learning, Voyager uses GPT-4 to continuously improve itself. It does this by writing, improving, and transferring code stored in an external skill library.

This results in small programs that help navigate, open doors, mine resources, craft a pickaxe, or fight a zombie. "GPT-4 unlocks a new paradigm," says Nvidia researcher Jim Fan, who advised the project. In this paradigm, "training" is the execution of code and the "trained model" is the code base of skills that Voyager iteratively assembles.

Voyager consists of three main components:

  1. An iterative prompting mechanism that incorporates feedback from the game, execution errors, and self-checking to refine programs.
  2. A skill library with code for storing and retrieving complex behaviors.
  3. An automated curriculum to maximize exploration.
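
To make the second component more concrete, here is a minimal Python sketch of a skill library that stores programs under a text description and retrieves the closest match for a new task. It is an illustration under stated assumptions, not Voyager's code: the class name `SkillLibrary`, the toy word-count embedding, and the stored JavaScript snippets are all hypothetical stand-ins, and a real system would use learned embeddings and a vector database.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a learned text embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SkillLibrary:
    """Hypothetical skill store: description plus program source, retrieved by similarity."""

    def __init__(self) -> None:
        self._skills: list[tuple[str, Counter, str]] = []

    def add(self, description: str, code: str) -> None:
        """Store a program together with the embedding of its description."""
        self._skills.append((description, embed(description), code))

    def retrieve(self, task: str, k: int = 1) -> list[str]:
        """Return the descriptions of the k stored skills most similar to the task."""
        query = embed(task)
        ranked = sorted(self._skills, key=lambda s: cosine(query, s[1]), reverse=True)
        return [description for description, _, _ in ranked[:k]]


# The stored strings are placeholder programs with made-up function names.
library = SkillLibrary()
library.add("craft a wooden pickaxe", "async function craftWoodenPickaxe(bot) {}")
library.add("fight a zombie", "async function fightZombie(bot) {}")
library.add("mine iron ore", "async function mineIronOre(bot) {}")

best = library.retrieve("craft a stone pickaxe")
```

In this toy setup, a request to craft a stone pickaxe retrieves the wooden pickaxe skill, which is the kind of reuse that lets complex skills build on simpler ones.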

Video: Wang, Xie, Jiang, Mandlekar et al.

Voyager Minecraft agent learns in context

The Minecraft agent learns in an iterative fashion: Voyager writes a program with GPT-4 to achieve a goal and uses feedback from the game environment and possible JavaScript errors to refine the program with GPT-4. In this way, Voyager gradually builds a library of skills and stores successful programs in a vector database. Complex skills are built from simpler skills.
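
The refinement loop described above can be sketched in a few lines of Python. This is a simplified illustration, not the team's implementation: `fake_model` is a hypothetical stand-in for the GPT-4 call, the candidate programs are Python rather than JavaScript, and the game environment is reduced to a single inventory check. It only shows the control flow: generate, execute, collect errors or a failed self-check as feedback, and try again.

```python
from typing import Callable, Optional


def run_program(source: str) -> tuple[bool, str]:
    """Execute a candidate program and return (success, feedback)."""
    scope: dict = {}
    try:
        exec(source, scope)          # define the candidate's act() function
        inventory = scope["act"]()   # run it in the (simulated) environment
    except Exception as error:
        # Execution errors are turned into feedback for the next round.
        return False, f"{type(error).__name__}: {error}"
    if "pickaxe" not in inventory:
        # Stand-in for the self-check that the goal was actually reached.
        return False, f"goal not reached, inventory is {inventory}"
    return True, "ok"


def fake_model(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for GPT-4: buggy first draft, fixed once it sees feedback."""
    if feedback is None:
        return "def act():\n    return inventry\n"  # misspelled name raises NameError
    return "def act():\n    return ['planks', 'pickaxe']\n"


def refine(
    task: str,
    model: Callable[[str, Optional[str]], str],
    max_rounds: int = 4,
) -> tuple[Optional[str], list[str]]:
    """Ask the model for programs until one succeeds or the round budget runs out."""
    feedback: Optional[str] = None
    history: list[str] = []
    for _ in range(max_rounds):
        source = model(task, feedback)
        success, feedback = run_program(source)
        history.append(feedback)
        if success:
            return source, history
    return None, history


program, history = refine("craft a pickaxe", fake_model)
```

Here the first draft fails with a `NameError`, the error text is passed back to the model, and the second draft passes the check. A successful program like this one is what would then be stored in the skill library.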

To explore the diverse world of Minecraft, the team uses an automated curriculum that suggests appropriate exploration tasks based on the agent's current skills and the current state of the world. For example, the agent learns to collect sand and cactus in a desert before digging for iron.
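
As a rough illustration of how such a curriculum could choose tasks, the Python sketch below picks the next task from the agent's current biome and learned skills. In Voyager the proposal comes from GPT-4. The fixed task table, the prerequisite sets, and the names `AgentState` and `propose_next_task` are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentState:
    biome: str
    skills: set[str] = field(default_factory=set)


# Hypothetical task table: task -> (biome it is tied to, or None; prerequisite skills).
TASKS: dict[str, tuple[Optional[str], set[str]]] = {
    "collect sand": ("desert", set()),
    "collect cactus": ("desert", set()),
    "chop wood": ("forest", set()),
    "craft wooden pickaxe": (None, {"chop wood"}),
    "craft stone pickaxe": (None, {"craft wooden pickaxe"}),
    "mine iron": (None, {"craft stone pickaxe"}),
}


def propose_next_task(state: AgentState) -> Optional[str]:
    """Suggest an unlearned task that fits the current biome and learned skills."""
    open_tasks = [
        (name, biome)
        for name, (biome, needs) in TASKS.items()
        if name not in state.skills
        and needs <= state.skills            # all prerequisites already learned
        and biome in (None, state.biome)     # feasible where the agent is now
    ]
    # Prefer tasks tied to the current biome; the sort is stable, so table order breaks ties.
    open_tasks.sort(key=lambda item: item[1] != state.biome)
    return open_tasks[0][0] if open_tasks else None


state = AgentState(biome="desert")
first = propose_next_task(state)
state.skills.add(first)
second = propose_next_task(state)
```

Starting in a desert with no skills, this sketch proposes collecting sand and then cactus. Mining iron only becomes a candidate after the pickaxe skills have been learned, mirroring the ordering in the example above.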

Voyager uses information about the environment to plan new tasks with GPT-4. | Image: Wang, Xie, Jiang, Mandlekar et al.

Together, this creates an agent that is constantly learning and can perform a variety of tasks. The team runs all experiments in the MineDojo environment.

Voyager can currently only build houses with human feedback.

The team compares Voyager with other language-model-based agents in Minecraft, such as ReAct, Reflexion, and Auto-GPT. Within 160 prompt iterations, Voyager discovered 63 unique items, 3.3 times more than the next best approach, the team says.

The automated search for previously unknown objects causes Voyager to travel extensively: Overall, the Minecraft agent travels more than twice the distance and visits more biomes. Auto-GPT and other methods, on the other hand, often get stuck in their local area.

The skill library built by Voyager is also compatible with Auto-GPT: The AI agent in Minecraft achieves significantly better results with it, but still lags behind Voyager.

Currently, Voyager is purely text-based and cannot see what is happening in the block world, so it cannot build houses on its own. In an early experiment, however, the team had humans give the agent visual feedback, which allowed Voyager to learn to build houses and Nether portals, for example.

More information and examples are available on the Voyager project page. The code is available on GitHub.

Summary
  • The Voyager AI agent uses GPT-4 for "lifelong learning" in Minecraft. One of the researchers involved calls it a "new paradigm."
  • The agent improves itself by writing and rewriting code and storing successful behaviors in an external library.
  • Voyager outperforms other language-model-based approaches, but is still purely text-based and thus currently fails at visual tasks such as building houses without human assistance.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.