
Researchers are giving AI agents access to a "world knowledge model". This should enable them to solve tasks more effectively, without getting stuck in dead ends or generating nonsense.


Agents based on large language models such as GPT-4 have repeatedly shown potential in planning and solving complex tasks. However, they often operate on a trial-and-error basis and hallucinate unrealistic actions. In a new study, researchers from Zhejiang University and Alibaba investigate whether an external, learned "World Knowledge Model" (WKM), which provides AI agents with additional knowledge, can improve the models' performance.

Similar to how humans build mental models of the world, AI agents should be able to use such a model.

Image: Qiao, Fang et al.

The researchers distinguish between global "task knowledge" and local "state knowledge". Task knowledge gives the agent an overview of the steps required to solve a task before it starts, keeping it from taking wrong paths. State knowledge records what the agent knows about the current state of the world at each step, which is meant to prevent contradictory actions.
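The two knowledge types could be represented roughly as follows. This is a minimal sketch of the concept, not the paper's implementation; all class and field names are illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class TaskKnowledge:
    """Global guideline, generated once before planning starts."""
    task: str       # e.g. "put a clean mug on the desk"
    guideline: str  # high-level steps the agent should follow


@dataclass
class StateKnowledge:
    """Local summary of what the agent knows at one step."""
    step: int
    summary: str    # e.g. "mug is in hand, sink already visited"


@dataclass
class AgentContext:
    """Bundles the global guideline with a per-step state history."""
    task_knowledge: TaskKnowledge
    state_history: list = field(default_factory=list)

    def add_state(self, summary: str) -> None:
        # Append a new state record, numbered by its position.
        self.state_history.append(StateKnowledge(len(self.state_history), summary))
```

The point of the split is that `TaskKnowledge` is fixed for the whole episode, while `StateKnowledge` grows with every step the agent takes.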


To train the external model, the researchers have the agent extract knowledge from successful and unsuccessful problem-solving attempts, both human-generated and its own. The agent then uses this to generate the relevant task and state knowledge.
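The extraction step can be sketched as prompting the agent model to contrast a successful and a failed trajectory for the same task. The function name, the `llm` callable, and the prompt wording below are all assumptions for illustration, not taken from the paper:

```python
def synthesize_task_knowledge(llm, task: str, success_traj: str, failure_traj: str) -> str:
    """Distill a task guideline by comparing a successful and a failed
    attempt at the same task.

    `llm` is any text-in, text-out callable (e.g. a wrapper around a
    local open-source model); the prompt below is illustrative only.
    """
    prompt = (
        f"Task: {task}\n\n"
        f"Successful attempt:\n{success_traj}\n\n"
        f"Failed attempt:\n{failure_traj}\n\n"
        "Compare the two attempts and write a short, general guideline "
        "explaining what makes an attempt at this task succeed."
    )
    return llm(prompt)
```

The same contrastive idea can be applied per step to produce state knowledge instead of a global guideline.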

External model improves performance on new tasks

In the planning phase, the WKM first provides the task knowledge as a guideline. It then generates a description of the current state at each step and searches a knowledge base for similar past states and the actions that followed them. The next action is selected by combining the probabilities of these retrieved actions with those of the agent model.
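Assuming the similar-state retrieval has already produced a probability distribution over candidate next actions, the final selection step could look like this. The linear mixing weight `alpha` and the function shape are illustrative; the paper's exact weighting scheme may differ:

```python
def select_action(agent_probs: dict, retrieved_probs: dict, alpha: float = 0.5) -> str:
    """Pick the next action by mixing the agent model's action
    probabilities with probabilities derived from similar past states
    in the knowledge base.

    agent_probs:     action -> probability from the agent model
    retrieved_probs: action -> probability from retrieved similar states
    alpha:           weight on the agent model (illustrative default)
    """
    actions = set(agent_probs) | set(retrieved_probs)
    combined = {
        a: alpha * agent_probs.get(a, 0.0) + (1 - alpha) * retrieved_probs.get(a, 0.0)
        for a in actions
    }
    # Greedily take the highest-scoring action.
    return max(combined, key=combined.get)
```

With this kind of mixing, the knowledge base can veto an action the agent model favors whenever similar past states show it rarely led to success.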

Experiments show that AI agents with WKM perform significantly better than those without. Knowledge of the world pays off, especially for new, unknown tasks.

Specifically, the team tested the agents on three complex, realistically simulated datasets: ALFWorld, WebShop and ScienceWorld.

In ALFWorld, the agents have to perform tasks in a virtual household situation, such as picking up objects and interacting with household appliances. WebShop simulates a shopping experience where the agents have to find and buy certain items in a virtual store. ScienceWorld requires agents to perform scientific experiments in a virtual laboratory environment.


The team trained WKMs for open-source LLMs (Mistral-7B, Gemma-7B and Llama-3-8B) and compared their performance with that of GPT-4. The experiments showed that the WKMs significantly improved the agents' performance, and on some tasks the agents even outperformed GPT-4. In a separate experiment, the team also showed that the smaller models' world knowledge can be used to guide GPT-4 and significantly improve its performance.

Next, the researchers want to train a unified world knowledge model that can support different agents in different tasks.

Summary
  • Researchers at Zhejiang University and Alibaba are equipping AI agents with an external "World Knowledge Model" (WKM) that provides them with additional knowledge to solve complex tasks more effectively.
  • The WKM distinguishes between global "task knowledge", which gives the agent an overview of the steps required to solve a task in advance, and local "state knowledge", which records what the agent knows about the current state of the world at each step.
  • In experiments with three datasets (ALFWorld, WebShop and ScienceWorld), AI agents with WKM performed significantly better than those without, especially on new, unfamiliar tasks, and in some cases even outperformed GPT-4.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.