Do large language models really need large context windows?

AI companies like Google, OpenAI, and Anthropic are touting extra-large context windows for their models, allowing them to process a lot of data at once. But are they really the best way forward?

The main development in large language models recently has been massive context windows. Companies say they can be used to process giant documents, like entire books or even series of books, all at once.

While this is true, they leave out an important detail: the processing isn't reliable. The more information you feed into the AI model, the more likely it is to miss essential details, for example when writing a summary.

This doesn't make large context windows useless, but it does make them less useful for many tasks. Also, large context windows mean that the models cost more to run and consume more power.

Making better use of small context windows

Researchers from Renmin University in China and the Beijing Academy of Artificial Intelligence now say in a paper that most long-text tasks can be done with smaller context windows. This is because often only parts of the long text matter for the task.

They developed a method based on GPT-3.5 called LC-Boost. LC-Boost breaks up long texts into shorter parts and lets the language model with a smaller context window choose which parts are needed for the task and how best to use them. This allows the model to process only the relevant parts and filter out unimportant information.
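The chunk-and-filter idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`split_into_chunks`, `keyword_relevance`, `process_long_text`) are hypothetical, chunk size is counted in words rather than tokens, and a simple keyword check stands in for the language model's decision about whether a chunk matters.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def keyword_relevance(chunk: str, task: str) -> bool:
    """Hypothetical stand-in for the model's relevance decision.

    A chunk counts as relevant if it shares a word longer than
    three letters with the task description.
    """
    punctuation = ".,?!\"'"
    task_words = {w.strip(punctuation).lower() for w in task.split()}
    task_words = {w for w in task_words if len(w) > 3}
    chunk_words = {w.strip(punctuation).lower() for w in chunk.split()}
    return bool(task_words & chunk_words)


def process_long_text(
    text: str,
    task: str,
    is_relevant: Callable[[str, str], bool] = keyword_relevance,
    max_words: int = 50,
) -> List[str]:
    """Walk through the text chunk by chunk and keep only the relevant chunks."""
    evidence = []
    for chunk in split_into_chunks(text, max_words):
        # In a real system, a short-context model would make this call.
        if is_relevant(chunk, task):
            evidence.append(chunk)
    return evidence
```

In a real system, the `is_relevant` callable would be replaced by a call to a short-context model, and the collected evidence would be passed to that model once more to produce the final answer.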

Comparison of different large context processing methods from the standard context window (far left) to LC-Boost (far right). | Image: Qian et al.

In tests on twelve datasets covering question-answering, summarization, and code tasks, LC-Boost with a context window of 4,000 tokens performed as well as or better than models with context windows of up to 200,000 tokens. LC-Boost did particularly well on question-answering tasks because it was more accurate at finding the exact information needed for an answer.

In benchmarks, the LC-Boost version based on GPT-3.5 performed better in almost all tasks than models with longer context windows. | Image: Qian et al.

To demonstrate how well LC-Boost works, the researchers used the 122,000-word novel "Harry Potter and the Chamber of Secrets" as an example.

When asked "List all the characters in the book who were petrified," the LC-Boost system found three of the five petrified characters in the story by searching the text step by step and summarizing the results at the end. It's not perfect, but it's better than, say, Claude 3 Haiku, which found only one character.

The researchers' energy consumption analysis also shows that LC-Boost, with its short context window, consumes much less energy than models that process the entire text at once. With the latter, energy consumption explodes as the context lengthens.
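A rough back-of-the-envelope calculation shows why this gap can grow so quickly. The sketch below is not taken from the paper's analysis. It assumes that the dominant cost is standard self-attention, which grows with the square of the number of tokens in the window, and that a chunked approach reads each chunk exactly once. The function names are hypothetical.

```python
def attention_pairs(context_tokens: int) -> int:
    """Token pairs compared by standard self-attention over one window (assumed cost model)."""
    return context_tokens * context_tokens


def chunked_attention_pairs(total_tokens: int, window_tokens: int) -> int:
    """Token pairs compared when the text is read once in separate windows."""
    full_chunks, remainder = divmod(total_tokens, window_tokens)
    return full_chunks * attention_pairs(window_tokens) + attention_pairs(remainder)


# One 200,000-token window versus fifty 4,000-token windows.
full = attention_pairs(200_000)
chunked = chunked_attention_pairs(200_000, 4_000)
ratio = full / chunked
print(f"full: {full:,}  chunked: {chunked:,}  ratio: {ratio:.0f}x")
```

Under these simplified assumptions, the single long window compares 50 times as many token pairs as the chunked pass. Real energy use also depends on hardware, on model size and on how often chunks are revisited, so the figure is only indicative.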

The authors see their approach as an important step toward limiting the huge resource consumption of large language models. They expect AI systems to be ubiquitous in the future, which means that their energy requirements could become a major environmental problem. More efficient methods like LC-Boost may be in demand.

LC-Boost with GPT-4 outperformed standard GPT-4 on most long-context text tasks. | Image: Qian et al.

The study suggests that smarter methods using smaller context windows can match the results of large context windows, and at significantly lower energy consumption. However, some more complex scenarios may require an understanding of the entire context. According to the authors, LC-Boost may be less suitable for such tasks.
