
Researchers investigate whether large language models can exhibit effective exploratory behavior, considered a key requirement for useful AI agents.

A team of researchers from Microsoft Research and Carnegie Mellon University investigated the ability of large language models to perform exploration, a key aspect of reinforcement learning and decision-making. They found that common models such as GPT-3.5, GPT-4, and Llama 2 lack robust exploration capabilities without significant external intervention.

In this work, the language models act as decision-making agents in simple multi-armed bandit (MAB) environments presented entirely within their attention window, i.e. in context. Their core task in these scenarios is to balance exploration and exploitation. Exploration here means gathering information to evaluate alternatives and reduce uncertainty, through decisions that may be suboptimal in the short term but yield valuable data in the long term. Exploitation means choosing the option that looks best given the information gathered so far, in order to maximize immediate reward. Both capabilities are important for the practical use of language-model-based AI agents.
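
To make the trade-off concrete, here is a minimal sketch, not taken from the paper, of a Bernoulli multi-armed bandit played by a classic epsilon-greedy agent: with probability epsilon it explores a random arm, otherwise it exploits the arm that currently looks best. All names and parameters are illustrative.

```python
import random

def run_epsilon_greedy(arm_probs, horizon=500, epsilon=0.1, seed=0):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy agent.

    With probability `epsilon` the agent explores (picks a random arm);
    otherwise it exploits the arm with the highest empirical mean reward.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms     # how often each arm was pulled
    sums = [0.0] * n_arms     # total reward observed per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: random arm
        else:
            # exploit: arm with best empirical mean (unpulled arms first)
            means = [s / c if c else float("inf") for s, c in zip(sums, counts)]
            arm = max(range(n_arms), key=means.__getitem__)
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return total_reward, counts

reward, pulls = run_epsilon_greedy([0.4, 0.5, 0.6])
print(f"total reward: {reward}, pulls per arm: {pulls}")
```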

Specifically, the researchers examined whether and how well the models could balance these two core aspects of reinforcement learning in an environment that is fully described within the model prompt. The experiments covered different prompt configurations and evaluated the models' ability to navigate MAB environments without additional training or intervention.
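
"Fully described within the model prompt" means the environment's rules and the round-by-round history are rendered as text and resent to the model each turn. A rough sketch of such a prompt builder, assuming a simple bandit with 0/1 rewards (the study's actual templates differ):

```python
def bandit_prompt(history, n_arms):
    """Render a bandit episode as an in-context prompt.

    `history` is a list of (arm, reward) pairs. The wording is
    illustrative only, not the study's actual prompt.
    """
    lines = [
        f"You are choosing between {n_arms} slot machines (arms 0 to {n_arms - 1}).",
        "Each round you pick one arm and observe a reward of 0 or 1.",
        "Your goal is to maximize the total reward.",
        "History so far:",
    ]
    for t, (arm, reward) in enumerate(history, start=1):
        lines.append(f"Round {t}: arm {arm} -> reward {reward}")
    lines.append("Which arm do you pick next? Answer with the arm number only.")
    return "\n".join(lines)

print(bandit_prompt([(0, 1), (2, 0), (0, 1)], n_arms=3))
```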


GPT-4 best with cheat sheet - new methods needed, says team

In most cases, however, the models did not show robust exploration behavior: either they locked onto a single option and never picked the best one again, or they spread their choices almost uniformly across all options without ever ruling out the worst ones.
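
Both failure modes can be operationalized as simple checks over the sequence of choices. The hypothetical helpers below sketch the idea; the names and the `cutoff`/`slack` thresholds are illustrative, and the paper's own definitions differ in detail.

```python
def suffix_failure(choices, best_arm, cutoff=0.5):
    """Lock-in failure: the best arm is never chosen in the final part
    of the run. `choices` is the round-by-round list of arms picked."""
    start = int(len(choices) * cutoff)
    return best_arm not in choices[start:]

def uniform_failure(choices, n_arms, slack=0.25):
    """Uniform failure: every arm, including the worst, keeps being
    played at roughly the uniform rate of 1/n_arms."""
    share = 1 / n_arms
    return all(
        abs(choices.count(arm) / len(choices) - share) <= slack * share
        for arm in range(n_arms)
    )
```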

Only a single GPT-4 configuration with a specially designed prompt showed successful exploration behavior, comparable to that of two reference algorithms. This prompt gave the model an explicit hint that exploration was needed, summarized the interaction history instead of replaying it in full, and elicited chain-of-thought reasoning.
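
A loose sketch of what a prompt combining these three ingredients could look like; the function name and wording are assumptions, not the study's actual template:

```python
def summarized_history_prompt(counts, sums, n_arms):
    """Combine the three described ingredients: per-arm summary
    statistics instead of a raw transcript, an explicit exploration
    hint, and a chain-of-thought instruction."""
    lines = [
        f"You are choosing between {n_arms} arms to maximize total reward.",
        "Per-arm statistics so far:",
    ]
    for arm in range(n_arms):
        n = counts[arm]
        mean = sums[arm] / n if n else 0.0
        lines.append(f"Arm {arm}: pulled {n} times, average reward {mean:.2f}")
    lines += [
        "Arms with few pulls are still uncertain, so some exploration "
        "now may pay off later.",                           # exploration hint
        "Think step by step, then state the arm you pick.",  # chain of thought
    ]
    return "\n".join(lines)

print(summarized_history_prompt(counts=[5, 1, 0], sums=[3.0, 1.0, 0.0], n_arms=3))
```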

However, according to the team, the results indicate that language models lack the capabilities needed for complex decision-making without significant intervention, and are therefore not yet suitable as autonomous AI agents. Simpler problems, such as the multi-armed bandits tested here, can be partially solved, but more sophisticated applications will likely require additional fine-tuning or specialized datasets.

The team thus provides a theoretical underpinning for a phenomenon that can already be observed in practice: AI agent frameworks such as AutoGPT drew a great deal of attention at the start of the latest AI wave, but such agents have rarely made it into productive use.

Companies like OpenAI have been working on better AI agents for some time now, and the implementation of reinforcement learning in the Q* project is likely to play an important role.

Summary
  • Researchers at Microsoft Research and Carnegie Mellon University investigated whether common language models such as GPT-3.5, GPT-4, and Llama 2 are capable of effective exploratory behavior. This is important for reinforcement learning and thus for language model-based AI agents.
  • In most cases, the models did not show robust exploratory behavior. Only GPT-4, with a special prompt design that included additional exploration cues, a summary of interaction history, and chain-of-thought reasoning, showed successful exploration behavior.
  • The results suggest that language models lack the necessary capabilities for complex decision-making without significant intervention. More sophisticated applications would require additional fine-tuning or specialized data sets.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.