Open-source ToolLLaMA can call over 16,000 APIs

ToolLLaMA, a language model specialized in API calls, can use more than 16,000 APIs and matches ChatGPT's performance on API calls.

Chinese researchers have presented ToolLLM, a framework that brings open-source models up to ChatGPT's level in using APIs, an area in which these models have lagged far behind commercial offerings.

ToolLLM is based on the open-source LLaMA model. The team fine-tuned Meta's model on ToolBench, a high-quality dataset generated automatically with ChatGPT, to create the specialized ToolLLaMA. ToolBench contains instructions with corresponding API calls from 49 categories.

An example of such a request might be, "I am organizing a movie night and need some movie suggestions. Can you find me the best romantic movies from the U.S. and also a suitable venue near me?" To resolve such a request, the model must correctly call the relevant APIs, for example, a movie search API and a hotel search API.

Decision trees help with dataset creation

To build the ToolBench dataset, the team also uses a technique called Depth-First Search Decision Tree (DFSDT), which lets language models like GPT-4 explore multiple search paths, and abandon unpromising ones, to find the best solution to an API request. In the researchers' experiments, DFSDT shows a clear advantage on difficult tasks over the model on its own and over other methods such as chain-of-thought reasoning.
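The idea of following several search paths and backtracking can be illustrated with a minimal sketch. This is not the researchers' implementation: the function `dfs_decision_tree` and the toy helpers `toy_propose` and `toy_is_solved` are hypothetical names, and the helpers stand in for a language model that would normally propose the next API call and judge whether the request is resolved.

```python
from typing import Callable, List, Optional, Sequence

Path = List[str]  # a sequence of API call names tried so far


def dfs_decision_tree(
    path: Path,
    propose: Callable[[Path], Sequence[str]],
    is_solved: Callable[[Path], bool],
    max_depth: int,
) -> Optional[Path]:
    """Depth-first search: return the first path that solves the task, or None."""
    if is_solved(path):
        return path
    if len(path) >= max_depth:
        return None  # branch is too deep: give up here and backtrack
    for action in propose(path):
        result = dfs_decision_tree(path + [action], propose, is_solved, max_depth)
        if result is not None:
            return result
    return None  # every child failed: backtrack to the parent node


# Toy stand-ins for the language model (assumptions, for illustration only).
def toy_propose(path: Path) -> List[str]:
    if not path:
        return ["search_hotels", "search_movies"]
    if path == ["search_movies"]:
        return ["give_up", "search_venues"]
    return []  # no further ideas on this branch


def toy_is_solved(path: Path) -> bool:
    return path == ["search_movies", "search_venues"]


solution = dfs_decision_tree([], toy_propose, toy_is_solved, max_depth=3)
print(solution)
```

In this toy run the search first tries a dead-end branch, backtracks, and then finds a path that resolves the request. A single linear chain of reasoning has no such way to recover from an early wrong step.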

To further enhance ToolLLaMA's capabilities, the researchers also trained a neural API retriever that automatically recommends relevant APIs for each instruction from a pool of more than 16,000 APIs.
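The retrieval step can be sketched as ranking API descriptions by their similarity to the instruction. The sketch below makes strong assumptions: it replaces the trained neural encoder with simple word counts, and the API names and descriptions in `api_docs` are invented for illustration. It shows only the ranking idea, not the researchers' retriever.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a neural encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(instruction: str, api_docs: Dict[str, str], top_k: int = 2) -> List[str]:
    """Return the names of the top_k APIs whose descriptions best match the instruction."""
    query = embed(instruction)
    ranked = sorted(
        api_docs,
        key=lambda name: cosine(query, embed(api_docs[name])),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical API pool; a real pool would hold thousands of entries.
api_docs = {
    "movie_search": "search movies by genre country and rating",
    "venue_search": "find a venue or event location near a place",
    "weather_forecast": "daily temperature forecast for cities",
}

recommended = retrieve("Find the best romantic movies and a venue near me", api_docs)
print(recommended)
```

A trained encoder would place paraphrases close together even when they share no words, which word counts cannot do. The ranking logic around the encoder stays the same.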

Integrating the retriever with ToolLLaMA creates an automated pipeline for using complex tools without the need for manual API selection.

ToolLLaMA reaches ChatGPT quality for API calls

To evaluate the capabilities of ToolLLaMA, the team also introduces an automated model evaluator called ToolEval. It measures two key indicators: success rate (the ability to successfully complete an instruction) and win rate (how the quality of a solution compares with that of existing methods).
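The arithmetic behind the two indicators can be sketched as follows. The function names and the sample judgments are assumptions made for illustration. The sketch takes the per-instruction judgments as given inputs and does not cover how ToolEval produces them.

```python
from typing import Sequence


def success_rate(completed: Sequence[bool]) -> float:
    """Share of instructions that were completed successfully."""
    return sum(completed) / len(completed) if completed else 0.0


def win_rate(preferred: Sequence[str], candidate: str) -> float:
    """Share of pairwise comparisons in which the candidate's solution was preferred."""
    if not preferred:
        return 0.0
    return sum(1 for winner in preferred if winner == candidate) / len(preferred)


# Hypothetical judgments for a handful of instructions.
completed = [True, True, False, True]
preferred = ["candidate", "reference", "candidate", "candidate", "reference"]

print(success_rate(completed))           # 0.75
print(win_rate(preferred, "candidate"))  # 0.6
```

The two numbers answer different questions: success rate asks whether an instruction was completed at all, and win rate asks which of two solutions is better.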

In the ToolEval comparison, the ToolLLaMA model achieves a success rate comparable to ChatGPT, even though it was trained with significantly fewer examples. ToolLLaMA can also successfully deal with previously unknown APIs by reading their documentation. A recently published study by Google also shows that studying such documentation can be useful.

More information and the code are available on GitHub.
