Researchers have developed a more streamlined approach to retrieval-augmented AI systems. The new system, called RetroLLM, merges two previously separate steps - searching for information and writing text - into a single generation process.
A team from Renmin University of China, Tsinghua University, and Huawei's Poisson Lab developed RetroLLM to make AI systems more efficient. Traditional retrieval-augmented generation (RAG) systems work in two separate phases: first finding relevant information, then generating text from it. RetroLLM handles both tasks simultaneously, using less computing power while delivering more accurate results.
How RetroLLM works
The system operates in three main steps. First, it creates "clues" - key words or phrases based on the original question. For example, if someone asks about the first physics Nobel Prize winner, the system identifies terms like "Nobel Prize" and "physics."
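To make the idea concrete, here is a minimal Python sketch of the clue stage. A simple stopword filter stands in for RetroLLM's learned clue generation, which the model performs itself; the function name and word list here are illustrative, not part of the actual system.

```python
# Minimal sketch of the "clue" stage. A stopword heuristic stands in for
# RetroLLM's model-driven clue generation; everything here is illustrative.

STOPWORDS = {"the", "a", "an", "who", "what", "was", "is", "in", "of",
             "first", "to", "win", "won", "did"}

def generate_clues(question: str) -> list[str]:
    """Pick salient terms from the question to seed evidence retrieval."""
    words = [w.strip("?.,").lower() for w in question.split()]
    return [w for w in words if w and w not in STOPWORDS]

print(generate_clues("Who won the first Nobel Prize in physics?"))
# -> ['nobel', 'prize', 'physics']
```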
Next, RetroLLM processes information using several advanced techniques. It evaluates multiple potential text paths at once (constrained beam search), like exploring different branches of a decision tree while focusing on the most promising ones. The system can also predict which passages will be useful before fully processing them (forward-looking constrained decoding), helping it avoid wasting time on irrelevant content.
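The toy Python sketch below combines both ideas on a two-sentence corpus. It assumes a hand-built table of corpus continuations in place of the real index, and a crude clue-overlap score in place of model probabilities; the forward-looking bonus simply counts clue words reachable a few tokens ahead, a simplification of the paper's scoring.

```python
# Toy constrained beam search with a forward-looking bonus. The corpus,
# clues, and scoring are illustrative stand-ins for the real system.
from collections import defaultdict

corpus = [
    "wilhelm rontgen won the first nobel prize in physics",
    "the nobel prize in literature was first awarded in 1901",
]
clues = {"nobel", "prize", "physics"}

# Allowed continuations for every token span seen in the corpus, so the
# search can only ever emit text that actually exists in the evidence.
continuations = defaultdict(set)
lookahead = defaultdict(set)  # tokens appearing shortly after each span
for sentence in corpus:
    toks = sentence.split()
    for i in range(len(toks)):
        for j in range(i + 1, len(toks)):
            span = tuple(toks[i:j])
            continuations[span].add(toks[j])
            lookahead[span].update(toks[j:j + 4])

def score(span):
    """Relevance of the span so far, plus a forward-looking bonus for
    clue words still reachable just ahead of it."""
    now = sum(tok in clues for tok in span)
    ahead = len(clues & lookahead.get(span, set()))
    return now + 0.5 * ahead

def constrained_beam_search(beam_width=2, max_len=6):
    # Seed the beam with every single-token span from the corpus.
    beams = sorted(((score(s), s) for s in continuations if len(s) == 1),
                   reverse=True)[:beam_width]
    for _ in range(max_len - 1):
        expanded = []
        for sc, s in beams:
            nexts = continuations.get(s, ())
            if not nexts:
                expanded.append((sc, s))  # span ended; keep it as finished
            else:
                expanded.extend((score(s + (t,)), s + (t,)) for t in nexts)
        beams = sorted(expanded, reverse=True)[:beam_width]
    return [" ".join(s) for _, s in beams]

print(constrained_beam_search())
# -> ['nobel prize in physics', 'prize in physics']
```

Note how the search never produces text that is absent from the corpus: every emitted span is a verbatim piece of evidence, which is the point of constraining the decoder.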
To handle large amounts of text efficiently, RetroLLM uses a sophisticated indexing system (hierarchical FM-index constraints) that works like a detailed roadmap, helping it quickly locate exactly the information it needs at different levels of detail.
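For readers curious what such an index does, here is a compact character-level FM-index in Python. It is a simplified stand-in: RetroLLM's index is hierarchical (corpus- and document-level) and operates over subword tokens, but the core query is the same - does this prefix still match real corpus text, and what can legally come next?

```python
# Compact character-level FM-index over a toy corpus; a simplified
# stand-in for RetroLLM's hierarchical, token-level index.

def build_fm_index(text):
    text += "\0"  # unique sentinel so all suffixes are distinct
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])  # suffix array
    bwt = "".join(text[i - 1] for i in sa)         # Burrows-Wheeler transform
    alphabet = sorted(set(text))
    # C[c]: count of characters in the text strictly smaller than c.
    C, running = {}, 0
    for c in alphabet:
        C[c] = running
        running += text.count(c)
    # occ[c][i]: occurrences of c within bwt[:i] (rank table).
    occ = {c: [0] * (n + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ, n

def count_matches(pattern, C, occ, n):
    """Backward search: number of occurrences of `pattern` in the text."""
    lo, hi = 0, n
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo

corpus = "the first nobel prize in physics went to wilhelm rontgen"
C, occ, n = build_fm_index(corpus)
print(count_matches("nobel prize", C, occ, n))  # 1

# Constrained decoding hook: a next character is legal only if the
# extended prefix still occurs somewhere in the corpus.
prefix = "nobel p"
legal = sorted(c for c in set(corpus)
               if count_matches(prefix + c, C, occ, n) > 0)
print(legal)  # ['r'] - only "nobel pr..." exists in this corpus
```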
Better results, one trade-off
In testing, RetroLLM showed impressive results, achieving 10-15 percent higher accuracy than existing systems. It particularly excels at handling complex questions that require combining information from multiple sources.
The system adapts its approach based on each question. For simple queries, it might only need a few key facts. For more complex questions, it automatically searches deeper and pulls from additional sources.
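In rough pseudocode terms, that adaptive behavior looks like the sketch below. The retriever and the "enough evidence" check here are hypothetical stand-ins; in RetroLLM itself, the model makes that call during decoding rather than through an external loop.

```python
# Illustrative only: the retriever and sufficiency check are hypothetical
# stand-ins for decisions RetroLLM's model makes internally while decoding.

def gather_evidence(question, retrieve_next_span, have_enough, max_spans=8):
    """Pull evidence spans one at a time, stopping as soon as the
    question looks answerable (or the budget runs out)."""
    spans = []
    for _ in range(max_spans):
        span = retrieve_next_span(question, spans)  # next best evidence span
        if span is None:                            # nothing left to retrieve
            break
        spans.append(span)
        if have_enough(question, spans):            # simple question: stop early
            break
    return spans
```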
While RetroLLM uses less computing power overall, researchers found one limitation: it's slightly slower than simpler systems when processing individual queries. The team believes using a combination of smaller and larger models could help solve this issue in the future.