Hundreds of examples in prompts can significantly boost LLM performance, study finds

A study shows that when large language models (LLMs) see hundreds or thousands of examples right in the prompt, their performance on a variety of tasks improves significantly, according to researchers from Google, DeepMind, and other institutions.

The researchers studied how LLMs' performance improves when they are given many examples to learn from directly in the prompt, rather than just a few. This approach is called Many-Shot In-Context Learning (ICL).

In-context learning (ICL) means that the examples are given directly in the context (prompt), without adjusting the model parameters as in fine-tuning. The latter is much more time consuming and expensive.

Previously, models were usually given only a few examples (one shot, few shot) because they could not process and generate a lot of text at once. With larger "context windows", a kind of short-term memory, it is now possible to provide the model with hundreds or thousands of examples directly in the prompt (many shot).

The researchers tested Many-Shot ICL with Google's Gemini 1.5 Pro language model, which can process up to one million tokens (about 700,000 words) in context. The result: Many-shot prompts significantly outperformed few-shot prompts on tasks such as translating, summarizing, planning, and answering questions.

With about 1000 translation examples, Gemini 1.5 even outperformed Google Translate in Kurdish and Tamil, the target languages with the largest reported gap between LLMs and Google Translate to date.

When summarizing news, it could almost keep up with specialized programs, but occasionally had hallucinations such as incorrect data and times that did not appear in the learning examples. In addition, performance dropped off after more than 50 examples, a phenomenon the researchers can't yet explain.

In tests with the XSUM dataset for news summaries, performance improved up to 50 examples in the prompt. After that, performance declined, without the researchers having an explanation.| Image: Agarwal, Singh et al.

LLMs can generate their own learning examples

For tricky logical tasks, such as mathematical or scientific problems, the researchers had the model create its own solutions and used them as additional learning examples. This approach ("Reinforced ICL") worked more reliably than the human-created solutions.

In one experiment, the model was only given the problems without solutions ("Unsupervised ICL"). For some logical tasks, this still worked better than a few complete examples. However, it usually did not quite match the self-generated solutions with "Reinforced ICL".

Recommendation

AI in practice

OpenAI's new "Orion" model reportedly shows small gains over GPT-4

The researchers also found that the model "unlearned" errors from pre-training through examples and could even recognize abstract mathematical patterns when shown enough examples.

However, the order in which the examples were given to the model made a difference, making the prompt more complex. It's also an open question why performance sometimes declines with even more examples. Future research is needed to clarify this.

In any case, the results show that language models can learn reliably from many examples in the prompt. This could make time-consuming training for specific tasks unnecessary in the future.

It also results in an additional task for prompt writers: They need to find or generate high-quality examples that fit the task.

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Hundreds of examples in prompts can significantly boost LLM performance, study finds

LLMs can generate their own learning examples

OpenAI's new "Orion" model reportedly shows small gains over GPT-4

Deepmind CEO Demis Hassabis says AI agents for complex tasks coming in 1-2 years

AI fact-checking is more accurate and 20 times cheaper than human effort, study finds

DeepMind has found a simple way to make language models reason better

OpenAI launches new ChatGPT agent that automates complex tasks for Pro, Plus, and Team

Kimi-K2 is the next open-weight AI milestone from China after Deepseek

New Energy-Based Transformer architecture aims to bring better "System 2 thinking" to AI models

Hundreds of examples in prompts can significantly boost LLM performance, study finds

LLMs can generate their own learning examples

Share

Bank details