
New RAG system RetroLLM is more efficient and accurate than previous solutions

Image: Midjourney prompted by THE DECODER

Key Points

  • Researchers from Renmin University of China, Tsinghua University, and Huawei Poisson Lab have developed RetroLLM, an AI system that integrates information search and text generation into a single process, offering improved efficiency compared to existing solutions.
  • RetroLLM generates clues from the given question, then employs advanced search techniques such as "Constrained Beam Search" and "Forward-Looking Constrained Decoding" to identify relevant information, which is continuously incorporated during the answer generation process.
  • In evaluations, RetroLLM demonstrated significantly better performance than existing systems, achieving 10 to 15 percent higher accuracy on question-answering tasks, with particularly strong results on more complex "multi-hop" questions that require multiple steps of reasoning.

Researchers have developed a more streamlined approach to help AI systems process information. The new system, called RetroLLM, combines two previously separate steps - searching for information and writing text - into a single process.

A team from Renmin University of China, Tsinghua University, and Huawei's Poisson Lab developed RetroLLM to make AI systems more efficient. Traditional retrieval-augmented generation (RAG) systems work in two separate phases: first retrieving relevant information, then generating text from it. RetroLLM handles both tasks simultaneously, using less computing power while delivering more accurate results.

How RetroLLM works

The system operates in three main steps. First, it creates "clues" - key words or phrases based on the original question. For example, if someone asks about the first physics Nobel Prize winner, the system identifies terms like "Nobel Prize" and "physics."
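To illustrate the clue stage, the toy function below extracts key terms from a question. This is only a stand-in to show the stage's input and output shape: the actual RetroLLM system generates clues with the language model itself, and the stopword list here is an assumption for the example.

```python
# Toy illustration of the clue stage: pull candidate key terms from a
# question. RetroLLM generates clues with the LLM itself; this simple
# keyword filter only demonstrates the input/output of the stage.
STOPWORDS = {"who", "won", "the", "first", "in", "a", "an", "of", "was", "what"}

def generate_clues(question: str) -> list[str]:
    """Return candidate clue terms, preserving question order."""
    words = [w.strip("?.,").lower() for w in question.split()]
    return [w for w in words if w and w not in STOPWORDS]

clues = generate_clues("Who won the first Nobel Prize in physics?")
# e.g. ['nobel', 'prize', 'physics']
```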

Next, RetroLLM processes information using several advanced techniques. It evaluates multiple potential text paths at once (constrained beam search), like exploring different branches of a decision tree while focusing on the most promising ones. The system can also predict which passages will be useful before fully processing them (forward-looking constrained decoding), helping it avoid spending time on irrelevant content.
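A minimal sketch of constrained beam search: at each step, only tokens that continue some sequence actually present in the corpus are allowed, and the top-scoring partial sequences are kept. The explicit prefix table and the toy scoring function are assumptions made for illustration; the paper's system uses language-model probabilities and an FM-index over the corpus instead.

```python
# Constrained beam search sketch: candidate continuations are restricted
# to prefixes of real corpus sequences, and only the highest-scoring
# beams survive each step. The prefix set and scorer are toy stand-ins.
import math

CORPUS = ["nobel prize physics 1901", "nobel prize peace 1901"]
PREFIXES = {tuple(doc.split()[:i]) for doc in CORPUS for i in range(len(doc.split()) + 1)}

def allowed_next(prefix: tuple[str, ...]) -> set[str]:
    """Tokens that extend `prefix` into a longer corpus prefix."""
    return {p[len(prefix)] for p in PREFIXES
            if len(p) == len(prefix) + 1 and p[:len(prefix)] == prefix}

def beam_search(score, beam_width=2, max_len=4):
    beams = [((), 0.0)]  # (token sequence, cumulative log-score)
    for _ in range(max_len):
        candidates = []
        for prefix, logp in beams:
            for tok in allowed_next(prefix):
                candidates.append((prefix + (tok,), logp + math.log(score(prefix, tok))))
        if not candidates:
            break
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams

# Toy scorer that slightly prefers "physics" over "peace".
def toy_score(prefix, tok):
    return 0.6 if tok == "physics" else 0.4 if tok == "peace" else 0.9

best = beam_search(toy_score)[0][0]
# best is the corpus sequence about the physics prize
```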


To handle large amounts of text efficiently, RetroLLM uses a sophisticated indexing system (hierarchical FM index constraints) that works like a detailed roadmap, helping it quickly locate exactly the information it needs at different levels of detail.
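The hierarchical idea can be sketched as two levels of lookup: a corpus-level index narrows generation to promising documents, then a document-level lookup constrains decoding token by token within them. A real FM-index (built on the Burrows-Wheeler transform) answers these "what can follow this prefix?" queries in compressed space; the plain dictionaries below are stand-ins, and the document texts are invented for the example.

```python
# Two-level constraint sketch: corpus-level clue lookup to pick documents,
# then document-level prefix lookup to constrain token-by-token decoding.
# Plain dicts stand in for the FM-index the paper uses.
from collections import defaultdict

docs = {
    "d1": "wilhelm röntgen won the first nobel prize in physics",
    "d2": "the nobel peace prize was first awarded in 1901",
}

# Corpus level: clue term -> documents containing it.
corpus_index = defaultdict(set)
for doc_id, text in docs.items():
    for tok in text.split():
        corpus_index[tok].add(doc_id)

def candidate_docs(clues):
    """Documents matching all clue terms."""
    sets = [corpus_index[c] for c in clues]
    return set.intersection(*sets) if sets else set()

def next_tokens(doc_id, prefix):
    """Tokens that may follow `prefix` inside one document."""
    toks = docs[doc_id].split()
    n = len(prefix)
    return {toks[i + n] for i in range(len(toks) - n)
            if toks[i:i + n] == list(prefix)}

hits = candidate_docs(["nobel", "physics"])  # only d1 matches both clues
```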

Technical diagram: RetroLLM framework with three main components - Clue Stage, Evidence Stage and Generation Stage for AI-powered information extraction.
The RetroLLM framework uses a three-stage process to extract information from large language models efficiently. | Image: Li, Jin et al.

Better results, one trade-off

In testing, RetroLLM showed impressive results, achieving 10-15 percent higher accuracy than existing systems. It particularly excels at handling complex questions that require combining information from multiple sources.

The system adapts its approach based on each question. For simple queries, it might only need a few key facts. For more complex questions, it automatically searches deeper and pulls from additional sources.

While RetroLLM uses less computing power overall, researchers found one limitation: it's slightly slower than simpler systems when processing individual queries. The team believes using a combination of smaller and larger models could help solve this issue in the future.



Source: arXiv