AI research
Maximilian Schreiner

New RAG system RetroLLM is more efficient and accurate than previous solutions

Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.

Researchers have developed a more streamlined approach to help AI systems process information. The new system, called RetroLLM, combines two previously separate steps - searching for information and writing text - into a single process.


A team from Renmin University of China, Tsinghua University, and Huawei's Poisson Lab developed RetroLLM to make AI systems more efficient. Traditional retrieval-augmented generation (RAG) systems work in two separate phases: first finding relevant information, then generating text from it. RetroLLM handles both tasks in a single process, using less computing power while delivering more accurate results.

How RetroLLM works

The system operates in three main stages: generating clues, finding evidence, and generating the answer. First, it creates "clues" - key words or phrases based on the original question. For example, if someone asks about the first physics Nobel Prize winner, the system identifies terms like "Nobel Prize" and "physics."

Next, RetroLLM processes information using several advanced techniques. It evaluates multiple potential text paths at once (constrained beam search), like exploring different branches of a decision tree while focusing on the most promising ones. The system can also predict which sections will be useful before fully processing them (Forward-Looking Constrained Decoding), helping it avoid time spent on irrelevant content.
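To make the idea more concrete, here is a minimal Python sketch of constrained beam search with an optional look-ahead. It is an illustration under simplifying assumptions, not the researchers' implementation: the three example passages, the word-level prefix trie, the clue-counting score and every function name are invented for this example, and a real system would score candidates with a language model instead.

```python
EOS = "<eos>"  # hypothetical end-of-passage marker

def build_trie(passages):
    """Word-level prefix trie: only word sequences found in the corpus can be generated."""
    trie = {}
    for passage in passages:
        node = trie
        for token in passage.split():
            node = node.setdefault(token, {})
        node[EOS] = {}
    return trie

def walk(trie, prefix):
    """Return the trie node reached by following the prefix."""
    node = trie
    for token in prefix:
        node = node[token]
    return node

def clue_hits(token, clues):
    """Toy stand-in for a language model score: 1 if the token is a clue word."""
    return 1 if token.lower() in clues else 0

def best_future(node, clues):
    """Look-ahead: the most clue hits reachable on any continuation below this node."""
    return max((clue_hits(tok, clues) + best_future(child, clues)
                for tok, child in node.items()), default=0)

def constrained_beam_search(passages, clues, beam_width=2, look_ahead=False):
    trie = build_trie(passages)
    beams = [((), 0)]          # (prefix, score so far)
    finished = []
    while beams:
        candidates = []
        for prefix, score in beams:
            for token, child in walk(trie, prefix).items():
                if token == EOS:
                    finished.append((score, prefix))
                    continue
                new_score = score + clue_hits(token, clues)
                # With look-ahead, rank by what the branch can still reach.
                rank = new_score + (best_future(child, clues) if look_ahead else 0)
                candidates.append((rank, new_score, prefix + (token,)))
        candidates.sort(key=lambda c: (-c[0], c[2]))
        beams = [(prefix, score) for _, score, prefix in candidates[:beam_width]]
    finished.sort(key=lambda f: (-f[0], f[1]))
    return [" ".join(prefix) for _, prefix in finished]

P1 = "Wilhelm Roentgen won the first Nobel Prize in physics"
P2 = "Marie Curie won a Nobel Prize in chemistry"
P3 = "The Eiffel Tower stands in Paris"
PASSAGES = [P1, P2, P3]
CLUES = {"nobel", "prize", "physics"}

plain = constrained_beam_search(PASSAGES, CLUES, beam_width=2)
ahead = constrained_beam_search(PASSAGES, CLUES, beam_width=2, look_ahead=True)
```

In this toy setup, the plain search drops the most relevant passage at the very first word, because none of the opening words matches a clue and the beam only keeps two branches. With the look-ahead enabled, the branch leading to "Nobel Prize in physics" is ranked first and survives.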


To handle large amounts of text efficiently, RetroLLM uses a sophisticated indexing system (hierarchical FM index constraints) that works like a detailed roadmap, helping it quickly locate exactly the information it needs at different levels of detail.
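The following Python sketch shows the basic mechanics of an FM index on a tiny text. It is a simplified, flat (not hierarchical) version meant only as an illustration: the sample text and the function names `build_fm_index`, `count` and `next_chars` are assumptions made for this example, and the naive construction would be far too slow for a real corpus.

```python
def build_fm_index(text):
    """Build a small FM index (Burrows-Wheeler transform plus count tables)."""
    text += "\0"  # sentinel that sorts before every other character
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    # first[c]: number of characters in the text that sort before c
    first, total = {}, 0
    for c in sorted(set(text)):
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] for c in first}
    for ch in bwt:
        for c in occ:
            occ[c].append(occ[c][-1] + (ch == c))
    return first, occ, len(text)

def count(index, pattern):
    """Count occurrences of pattern using backward search."""
    first, occ, n = index
    lo, hi = 0, n
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo

def next_chars(index, pattern):
    """Characters that can follow pattern somewhere in the indexed text."""
    first, _, _ = index
    return [c for c in sorted(first) if c != "\0" and count(index, pattern + c) > 0]

TEXT = "the nobel prize in physics. the nobel prize in chemistry."
index = build_fm_index(TEXT)

print(count(index, "nobel prize"))     # how often the phrase occurs
print(next_chars(index, "prize in "))  # which characters may legally come next
```

The `next_chars` lookup is the part that matters for constrained generation: at every step, the model may only continue with text that actually exists in the indexed corpus.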

Figure: The RetroLLM framework uses a three-stage process (Clue Stage, Evidence Stage, Generation Stage) to extract information efficiently. | Image: Li, Jin et al.

Better results, one trade-off

In testing, RetroLLM achieved 10 to 15 percent higher accuracy than existing systems. It particularly excels at complex questions that require combining information from multiple sources.

The system adapts its approach based on each question. For simple queries, it might only need a few key facts. For more complex questions, it automatically searches deeper and pulls from additional sources.

While RetroLLM uses less computing power overall, researchers found one limitation: it's slightly slower than simpler systems when processing individual queries. The team believes using a combination of smaller and larger models could help solve this issue in the future.

 

Summary
  • Researchers from Renmin University of China, Tsinghua University, and Huawei Poisson Lab have developed RetroLLM, an AI system that integrates information search and text generation into a single process, offering improved efficiency compared to existing solutions.
  • RetroLLM generates clues from the given question, then employs advanced search techniques such as "Constrained Beam Search" and "Forward-Looking Constrained Decoding" to identify relevant information, which is continuously incorporated during the answer generation process.
  • In evaluations, RetroLLM demonstrated significantly better performance than existing systems, achieving 10 to 15 percent higher accuracy on question-answering tasks, with particularly strong results on more complex "multi-hop" questions that require multiple steps of reasoning.
Sources
arXiv