The power of RTFM: large language models use online documentation to master tool usage

Midjourney prompted THE DECODER

Language models that read documentation successfully learn to use tools - and in some cases even invent new methods, according to a new research paper.

Large language models such as ChatGPT can make rudimentary use of tools or APIs. Traditionally, language models are taught with a few demonstrations of the tools in use. For more complex tools, however, such demonstrations are rare or nonexistent. A team of researchers from the University of Washington, National Taiwan University, and Google has a different idea: Just read the manual - often abbreviated RTFM on the web.

Documentation, such as an API reference, describes exactly what a tool does. It is more general than a demonstration of how to use the tool for a particular task, and it is readily available for most software tools via README files or API references. The team therefore assumed that documentation would not only scale better than demonstrations but also produce better results, because models learn about tools in a more general and flexible way.
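To make the contrast concrete, here is a minimal sketch of the two ways a prompt for tool use could be assembled: one from documentation only, one from demonstrations only. Everything in it is an assumption for illustration: the `Tool` class, the `docs_prompt` and `demos_prompt` helpers, and the made-up `crop_image` tool are hypothetical and do not reproduce the prompt format used in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tool:
    """Hypothetical tool description, invented for this sketch."""
    name: str
    doc: str  # text as it might appear in a README or API reference
    demos: list[tuple[str, str]] = field(default_factory=list)  # (task, tool call) pairs


def docs_prompt(tools: list[Tool], task: str) -> str:
    """Zero-shot prompt: the model only sees what each tool does."""
    lines = ["You can use the following tools:"]
    for tool in tools:
        lines.append(f"- {tool.name}: {tool.doc}")
    lines.append(f"Task: {task}")
    lines.append("Tool call:")
    return "\n".join(lines)


def demos_prompt(tools: list[Tool], task: str) -> str:
    """Few-shot prompt: the model only sees worked examples of tool calls."""
    lines = ["Here are examples of solving tasks with tools:"]
    for tool in tools:
        for demo_task, demo_call in tool.demos:
            lines.append(f"Task: {demo_task}")
            lines.append(f"Tool call: {demo_call}")
    lines.append(f"Task: {task}")
    lines.append("Tool call:")
    return "\n".join(lines)


# A made-up tool; neither the name nor the signature comes from the paper.
crop = Tool(
    name="crop_image",
    doc=(
        "crop_image(path, left, top, right, bottom) returns the region "
        "of the image inside the given pixel box."
    ),
    demos=[
        (
            "Cut out the top-left 100x100 pixels of cat.png",
            "crop_image('cat.png', 0, 0, 100, 100)",
        )
    ],
)

if __name__ == "__main__":
    task = "Keep only the right half of the 400x300 image dog.png"
    print(docs_prompt([crop], task))
    print()
    print(demos_prompt([crop], task))
```

The documentation prompt tells the model what every parameter means, so it can in principle handle a task no example covers. The demonstration prompt only shows one specific call, and the model has to guess the general rule from it.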

Training with documentation enables zero-shot tool use

The team trained several models on six different tasks using both documentation and demonstrations and compared their performance. With documentation alone, zero-shot performance was equal to or better than that of models that learned only from demonstrations. Then, after scaling to a dataset of 200 tools, the documentation-based model significantly outperformed the demonstration-based one.

For vision tasks, the model was able to perform complex image processing and video tracking without further demonstrations, learning only from the documentation of new, state-of-the-art modules. The team highlights as particularly noteworthy that the model was able to reproduce recently released techniques such as image processing with Grounded-SAM and video tracking with Track Anything, demonstrating the potential of the method for automatic knowledge discovery.

"Overall, we shed light on a new perspective of tool usage with LLMs by focusing on their internal planning and reasoning capabilities with docs, rather than explicitly guiding their behaviors with demos," the paper states.
