Language models that read documentation successfully learn to use tools - and in some cases even invent new methods, according to a new research paper.
Large language models such as ChatGPT can make rudimentary use of tools or APIs. Traditionally, a model is taught a tool by prompting it with a few demonstrations of the tool in use. For more complex tools, however, such demonstrations are rare or nonexistent. A team of researchers from the University of Washington, National Taiwan University, and Google has a different idea: Just read the manual - often abbreviated RTFM on the web.
Such documentation - an API reference, for example - describes exactly what a tool does. It is more general than a demonstration of how to use the tool for one particular task, and it is readily available for most software via README files or API references. The team therefore hypothesized that documentation would not only scale better than demonstrations but also produce better results, because it lets models learn about tools in a more general and flexible way.
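To make the contrast concrete, here is a minimal sketch of the two prompting styles. The `image_segmenter` tool, its docstring, and the prompt templates are hypothetical placeholders for illustration, not the paper's actual code or prompts:

```python
# Sketch: prompting an LLM with tool documentation (zero-shot)
# versus usage demonstrations (few-shot). All names are illustrative.

TOOL_DOCUMENTATION = """
Tool: image_segmenter
Description: Segments all objects in an image that match a text query.
Usage: image_segmenter(image_path: str, query: str) -> list of masks
"""

FEW_SHOT_DEMONSTRATIONS = """
Task: "Find the dog in photo.jpg"
Call: image_segmenter("photo.jpg", "dog")
"""

def build_doc_prompt(task: str) -> str:
    # Zero-shot: the model sees only what the tool does,
    # not how it was used on earlier tasks.
    return f"{TOOL_DOCUMENTATION}\nSolve this task using the tool above:\n{task}"

def build_demo_prompt(task: str) -> str:
    # Few-shot: the model imitates prior usage examples.
    return f"{FEW_SHOT_DEMONSTRATIONS}\nTask: \"{task}\"\nCall:"

print(build_doc_prompt("Segment every car in street.png"))
```

The documentation prompt carries no task-specific examples, which is what makes the setup zero-shot: the model must plan the tool call from the description alone.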
Prompting with documentation enables zero-shot tool use
The team evaluated several models on six different tasks, prompting them with either documentation or demonstrations, and compared their performance. With documentation alone, zero-shot performance was equal to or better than that of models given only demonstrations. After scaling the evaluation to a set of 200 tools, the documentation-prompted models significantly outperformed the demonstration-prompted ones.
In the vision domain, the model performed complex image editing and video tracking functions without any demonstrations by learning from the documentation of new, state-of-the-art vision modules. The team highlights as particularly noteworthy that the model was able to reproduce recently released techniques such as grounded image segmentation with Grounded-SAM and video tracking with Track Anything, demonstrating the method's potential for automatic knowledge discovery.
"Overall, we shed light on a new perspective of tool usage with LLMs by focusing on their internal planning and reasoning capabilities with docs, rather than explicitly guiding their behaviors with demos," the paper states.