STEVE-1 is a chatbot that plays Minecraft


STEVE-1 is a generative AI model that can perform tasks in Minecraft using text instructions.

AI models that can respond to natural language instructions have become incredibly popular, but creating models that can follow instructions for complex sequential tasks remains a challenge. Researchers have now introduced STEVE-1, an AI assistant that can follow a wide range of short-horizon text and visual instructions in Minecraft.

STEVE-1 builds on two existing AI models - VPT, a foundation model pre-trained on 70,000 hours of Minecraft gameplay, and MineCLIP, which aligns text captions with Minecraft videos. Using an approach inspired by DALL-E 2's unCLIP method, the researchers fine-tuned VPT to follow visual goals encoded by MineCLIP, and then trained a module to translate text prompts into MineCLIP visual embeddings.

This two-step model allows STEVE-1 to follow both text and visual instructions in Minecraft with only $60 of computation and 2,000 labeled examples.
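The two-step design described above can be pictured as a small data flow: a text prompt is encoded, a learned module maps that encoding to a visual-goal embedding, and a goal-conditioned policy turns pixels plus the goal into low-level actions. The Python sketch below only illustrates that flow. Every name in it (`toy_text_encoder`, `toy_prior`, `toy_policy`, `InstructableAgent`) is a hypothetical stand-in using toy arithmetic, not the real MineCLIP, VPT, or STEVE-1 code.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Embedding = List[float]


def toy_text_encoder(prompt: str, dim: int = 4) -> Embedding:
    """Hypothetical stand-in for a MineCLIP-style text encoder."""
    # Deterministic toy features: character codes folded into `dim` buckets.
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def toy_prior(text_emb: Embedding) -> Embedding:
    """Hypothetical stand-in for the module that maps text embeddings
    to visual-goal embeddings (the real one is trained, this is fixed)."""
    return [0.5 * x + 0.1 for x in text_emb]


def toy_policy(frame: Sequence[float], goal: Embedding) -> str:
    """Hypothetical stand-in for the goal-conditioned policy:
    raw observation + goal embedding -> one low-level action."""
    score = sum(f * g for f, g in zip(frame, goal))
    return "attack" if score > 0.5 else "forward"


@dataclass
class InstructableAgent:
    text_encoder: Callable[[str], Embedding]
    prior: Callable[[Embedding], Embedding]
    policy: Callable[[Sequence[float], Embedding], str]

    def goal_from_text(self, prompt: str) -> Embedding:
        # Step 2 of the design: text -> visual-goal embedding.
        return self.prior(self.text_encoder(prompt))

    def act(self, frame: Sequence[float], goal: Embedding) -> str:
        # Step 1 of the design: the policy only ever sees a goal embedding.
        return self.policy(frame, goal)


agent = InstructableAgent(toy_text_encoder, toy_prior, toy_policy)
goal = agent.goal_from_text("chop a tree")
action = agent.act([1.0, 1.0, 1.0, 1.0], goal)
```

Because the policy consumes only a goal embedding, a visual instruction can in principle be passed to `act` directly, while a text instruction goes through `goal_from_text` first. That separation is what lets one policy serve both kinds of prompt.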

STEVE-1 outclasses previous AI agents in Minecraft

In the researchers' tests, STEVE-1 significantly outperformed previous AI agents in Minecraft when given relevant instructions, gathering far more resources and exploring farther. When prompted with text or images, it can perform a variety of short-horizon tasks such as chopping trees, gathering resources, and exploring.

The researchers found that chaining prompts improved performance on longer-term tasks, such as crafting items or building structures, from near zero to a success rate of 50 to 70 percent. The team also showed STEVE-1 responding to human instructions in real time, demonstrating its potential as an interactive assistant.
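Prompt chaining, as described above, amounts to feeding the agent a sequence of short instructions one after another instead of a single long one. The sketch below shows one simple way to do that in Python. The fixed step budget per prompt, the example prompts, and the `toy_step` function are all assumptions made for illustration; the article does not say how the researchers decide when to switch prompts.

```python
from typing import Callable, List, Tuple


def run_prompt_chain(
    chain: List[Tuple[str, int]],
    step: Callable[[str, int], str],
) -> List[Tuple[str, str]]:
    """Run each (prompt, step_budget) pair in order.

    Returns a log of (prompt, action) pairs, one per environment step.
    """
    log: List[Tuple[str, str]] = []
    t = 0  # global step counter across the whole chain
    for prompt, budget in chain:
        for _ in range(budget):
            log.append((prompt, step(prompt, t)))
            t += 1
    return log


def toy_step(prompt: str, t: int) -> str:
    # Hypothetical stand-in for "the agent takes one action under this prompt".
    return "attack" if "chop" in prompt else "craft"


# Illustrative chain: a short gathering prompt followed by a crafting prompt.
chain = [("chop a tree", 3), ("craft planks", 2)]
log = run_prompt_chain(chain, toy_step)
```

A real setup might switch prompts on a condition, such as an inventory check, rather than a fixed number of steps; the fixed budget here just keeps the example small.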

STEVE-1 is a blueprint for "instructable agents in domains beyond Minecraft"

As with image generation, switching to a longer, more specific prompt dramatically improves STEVE-1's performance on long-horizon tasks. But this kind of prompt engineering is also unintuitive and time-consuming, and more work needs to be done, the paper states.

Because STEVE-1 works directly from raw pixel input and low-level mouse and keyboard actions, the approach could be applied more broadly to create instructable agents in domains beyond Minecraft, the team said. Future work will focus on improving STEVE-1's ability to handle longer, more complex instructions by incorporating large language models to help the agent plan and execute multistep tasks.

More information and the code are available on the STEVE-1 project page.
