
The ToolLLaMA language model, which specializes in API calls, can use more than 16,000 APIs and matches ChatGPT's performance.

Chinese researchers have presented ToolLLM, a framework that brings open-source models up to ChatGPT's quality at using APIs, an area in which these models have lagged far behind commercial offerings.

ToolLLM builds on Meta's open-source LLaMA model. The team fine-tuned it on ToolBench, a high-quality dataset automatically generated with ChatGPT, to create the specialized ToolLLaMA. ToolBench contains instructions paired with corresponding API calls across 49 categories.

An example of such a request might be, "I am organizing a movie night and need some movie suggestions. Can you find me the best romantic movies from the U.S. and also a suitable venue near me?" To resolve such a request, the model must correctly call the relevant APIs, for example, a movie search API and a hotel search API.
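Conceptually, the model has to turn such a request into structured calls. The snippet below is a purely illustrative sketch of that decomposition; the API names, parameter names, and schema are assumptions and do not reflect the actual ToolBench format.

```python
# Purely illustrative decomposition of the movie-night request into structured API
# calls. The API names, parameters, and schema here are assumptions, not the
# actual ToolBench format.
movie_night_request = {
    "instruction": (
        "I am organizing a movie night and need some movie suggestions. "
        "Can you find me the best romantic movies from the U.S. "
        "and also a suitable venue near me?"
    ),
    "api_calls": [
        {"api": "movie_search",
         "parameters": {"genre": "romance", "country": "US", "sort_by": "rating"}},
        {"api": "hotel_search",
         "parameters": {"location": "near_user", "suitable_for": "movie night"}},
    ],
}
```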


Decision trees help with dataset creation

To build the ToolBench dataset, the team also uses a technique called Depth-First Search Decision Tree (DFSDT), which lets language models like GPT-4 explore multiple search paths to find the best solution to an API request. According to the researchers, DFSDT shows a clear advantage on difficult tasks in their experiments compared with prompting the model directly or with other methods such as chain-of-thought reasoning.
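As a rough illustration of the idea, a depth-first search over sequences of tool calls expands one candidate call at a time and backtracks when a branch leads nowhere. The sketch below uses hypothetical `propose_next_calls` and `is_solution` stand-ins for the language model and the task check; it is not the paper's implementation.

```python
# Minimal sketch of a depth-first search over sequences of tool calls (DFSDT-style).
# The two helpers below are placeholders for the language model proposing candidate
# API calls and for checking whether the instruction has been solved; they are
# illustrative assumptions, not the paper's implementation.

def propose_next_calls(state):
    # Placeholder: in ToolLLM this would be the LLM suggesting candidate API calls
    # given the instruction and the calls made so far.
    return []

def is_solution(state):
    # Placeholder: true once the collected API results are enough to answer.
    return False

def dfs_solve(state, depth=0, max_depth=5):
    """Extend the call sequence depth-first; backtrack when a branch fails."""
    if is_solution(state):
        return state                      # a call sequence that fulfils the instruction
    if depth >= max_depth:
        return None                       # branch exhausted, backtrack
    for call in propose_next_calls(state):
        result = dfs_solve(state + [call], depth + 1, max_depth)
        if result is not None:
            return result                 # keep the first successful path
    return None                           # all children failed; the caller tries a sibling
```

The advantage over a single chain-of-thought pass is that a dead end does not terminate the whole attempt; the search simply backtracks and tries a different call.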

To further enhance ToolLLaMA's capabilities, the researchers also trained a neural API retriever that automatically recommends relevant APIs for each statement from a pool of more than 16,000 APIs.
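The retrieval step itself can be pictured as a similarity search over API descriptions. The sketch below uses the sentence-transformers library with an off-the-shelf embedding model purely as an assumption for illustration; ToolLLM trains its own retriever, and the API descriptions here are made up.

```python
# Sketch of dense retrieval over API descriptions: embed the instruction and all API
# docs, then recommend the most similar APIs. The library, model name, and API
# descriptions are illustrative assumptions, not ToolLLM's trained retriever.
from sentence_transformers import SentenceTransformer, util

api_docs = {
    "movie_search": "Search movies by genre, country and rating.",
    "hotel_search": "Find hotels or other venues near a given location.",
    "weather_forecast": "Get the weather forecast for a given city and date.",
}

model = SentenceTransformer("all-MiniLM-L6-v2")        # illustrative embedding model
doc_embeddings = model.encode(list(api_docs.values()))  # one vector per API description

def recommend_apis(instruction, top_k=2):
    """Return the top-k API names whose descriptions best match the instruction."""
    query_embedding = model.encode(instruction)
    scores = util.cos_sim(query_embedding, doc_embeddings)[0]  # cosine similarity per API
    ranked = sorted(zip(api_docs.keys(), scores.tolist()), key=lambda x: -x[1])
    return [name for name, _ in ranked[:top_k]]

print(recommend_apis("Find romantic movies and a venue nearby for a movie night"))
```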

Image: Qin, Liang et al.

Integrating the retriever with ToolLLaMA creates an automated pipeline for using complex tools without the need for manual API selection.

ToolLLaMA reaches ChatGPT quality for API calls

To evaluate ToolLLaMA's capabilities, the team also introduces an automated model evaluator called ToolEval. It measures two key indicators: success rate (the ability to complete an instruction successfully) and win rate (how the quality of a solution compares with that of existing methods).
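Put simply, the two numbers boil down to the fractions sketched below; in ToolEval itself, the win-rate comparisons are judged automatically by an evaluator model rather than supplied as data.

```python
# Simplified view of the two ToolEval-style metrics. In ToolEval the pairwise
# preference is produced by an automatic LLM judge; here it is just given as input.

def success_rate(outcomes):
    """Fraction of instructions the model completed successfully (True/False per task)."""
    return sum(outcomes) / len(outcomes)

def win_rate(preferences):
    """Fraction of pairwise comparisons in which the model's solution was preferred."""
    wins = sum(1 for p in preferences if p == "model")
    return wins / len(preferences)

print(success_rate([True, True, False, True]))            # 0.75
print(win_rate(["model", "baseline", "model", "model"]))  # 0.75
```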

In the ToolEval comparison, the ToolLLaMA model achieves a success rate comparable to ChatGPT, even though it was trained with significantly fewer examples. ToolLLaMA can also successfully deal with previously unknown APIs by reading their documentation. A recently published study by Google also shows that studying such documentation can be useful.


More information and code are available on GitHub.

Summary
  • Chinese researchers present ToolLLaMA, an open-source language model capable of calling over 16,000 APIs.
  • ToolLLaMA has been trained on a high-quality dataset called ToolBench, which has also been published.
  • ToolLLaMA achieves a success rate comparable to ChatGPT when calling APIs and can handle unknown APIs by reading their documentation.