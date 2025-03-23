AI research
Jonathan Kemper

Microsoft Research has developed a new way to feed knowledge into LLMs

Jonathan writes for THE DECODER about how AI tools can make our work and creative lives better.

Microsoft Research has developed a more efficient way to incorporate external knowledge into language models. The new system, called Knowledge Base-Augmented Language Models (KBLaM), takes a plug-and-play approach that doesn't require modifying existing models.


Unlike current approaches such as RAG or in-context learning, KBLaM doesn't use a separate retrieval system. Instead, it turns each piece of knowledge into a pair of vectors (a key and a value) and weaves these pairs directly into the model's attention layers using what Microsoft calls "rectangular attention."

Diagram of the KBLaM architecture: tokenization of question and knowledge base, rectangular attention, language model for generating answers.
KBLaM processes knowledge directly within the model instead of using external retrieval, leading to faster and more efficient responses compared to traditional systems. | Image: Microsoft Research
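
To make the idea of "vector pairs" concrete, here is a minimal sketch of how knowledge triples could be turned into key and value vectors. It is an illustration under stated assumptions, not Microsoft's implementation: `toy_embed` is a hypothetical stand-in for a real sentence encoder, and the (name, property, value) layout of a triple and the way the key text is built from it are assumptions made for this example.

```python
import hashlib
import math
import random

DIM = 8  # toy vector width; real encoders produce far larger vectors


def toy_embed(text, dim=DIM):
    """Hypothetical stand-in for a sentence encoder.

    Returns a deterministic pseudo-random unit vector derived from the text.
    It carries no semantic meaning and only illustrates the data flow.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def encode_triple(name, prop, value):
    """Map one triple to a (key, value) vector pair.

    Assumption: the key describes which fact this is, the value holds the fact.
    """
    key = toy_embed(f"{prop} of {name}")
    val = toy_embed(value)
    return key, val


# Example triples built only from statements in this article.
triples = [
    ("KBLaM", "developer", "Microsoft Research"),
    ("KBLaM", "attention variant", "rectangular attention"),
]

knowledge_pairs = [encode_triple(*triple) for triple in triples]
```

Each triple ends up as one fixed-size pair regardless of how long its text is, which is what lets the knowledge base be stored next to the model rather than pasted into the prompt.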

Current RAG systems face a quadratic scaling problem because of the self-attention mechanism of the underlying language model: every token in the context must interact with every other token. When 1,000 tokens from the knowledge base are inserted into the context, the model must process one million token pairs. With 10,000 tokens, that jumps to 100 million interactions.
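
The arithmetic behind those figures is simply the square of the token count. A small sketch, with `self_attention_pairs` as a hypothetical helper name chosen for this example:

```python
def self_attention_pairs(context_tokens: int) -> int:
    """Token pairs scored by full self-attention: every token with every token."""
    return context_tokens * context_tokens


for n in (1_000, 10_000):
    print(f"{n:>6,} tokens -> {self_attention_pairs(n):>11,} token pairs")
```

Growing the context tenfold multiplies the number of pairs by one hundred, which is why inserting retrieved text into the prompt becomes expensive quickly.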

Line chart for performance comparison: Time to first token and memory usage for KBLaM vs. RAG with increasing number of triples in the knowledge base
Microsoft's data shows KBLaM can process 4,096 knowledge triples faster than RAG can handle just 5 triples. | Image: Microsoft Research

KBLaM sidesteps this issue: the tokens of the user's input can attend to all knowledge tokens, but the knowledge tokens do not attend to each other or to the input. As a result, the computational cost grows only linearly with the size of the knowledge base. According to the researchers, a single GPU can handle more than 10,000 knowledge triples (about 200,000 tokens).
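
The attention pattern described above can be sketched in a few lines of plain Python. This is a toy illustration, not the KBLaM code: the function names, the random toy vectors, and the causal mask over the input tokens (standard in decoder-style models) are assumptions made for this example.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rectangular_attention(queries, input_keys, input_values, kb_keys, kb_values):
    """Each input token attends to every knowledge entry and to the input
    tokens up to and including itself (assumed causal mask).
    Knowledge entries issue no queries, so they never attend to anything."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs, weight_rows = [], []
    for i, query in enumerate(queries):
        keys = kb_keys + input_keys[: i + 1]
        values = kb_values + input_values[: i + 1]
        weights = softmax([dot(query, key) * scale for key in keys])
        width = len(values[0])
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(width)]
        )
        weight_rows.append(weights)
    return outputs, weight_rows


def score_count(n_input, n_kb):
    """Scores computed: an n_input x n_kb block plus the causal input block."""
    return n_input * n_kb + n_input * (n_input + 1) // 2


def full_self_attention_count(n_input, n_kb):
    """Scores computed if the knowledge were pasted into the context instead."""
    return (n_input + n_kb) ** 2


def random_vectors(rng, count, dim):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
dim, n_input, n_kb = 4, 3, 5
queries = random_vectors(rng, n_input, dim)
input_keys = random_vectors(rng, n_input, dim)
input_values = random_vectors(rng, n_input, dim)
kb_keys = random_vectors(rng, n_kb, dim)
kb_values = random_vectors(rng, n_kb, dim)

outputs, weights = rectangular_attention(
    queries, input_keys, input_values, kb_keys, kb_values
)
```

The score matrix has one row per input token and one column per knowledge entry or visible input token, so it is rectangular rather than square. In `score_count`, the knowledge base size appears only in the term `n_input * n_kb`, so doubling the knowledge base adds a fixed amount of work per input token instead of squaring the total.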


Opening up to developers

Early tests are promising. With a knowledge base of about 200 items, KBLaM is better than traditional models at avoiding hallucinations and at declining to answer questions for which it has no information. It's also more transparent than in-context learning because it can link knowledge to specific tokens.

The code and datasets for KBLaM are now available on GitHub. The system works with several popular models, including Meta's Llama 3 and Microsoft's Phi-3, with plans to add support for Hugging Face Transformers. The researchers emphasize that KBLaM isn't ready for widespread use yet. While it handles straightforward question-answer scenarios well, it still needs work on more complex reasoning tasks.

LLMs struggle with an interesting paradox: their context windows keep getting bigger, letting them handle more information at once, but processing all that data reliably remains a challenge. As a result, RAG has become the go-to solution for feeding specific information into models with relative reliability, but KBLaM suggests that there may be a more efficient way forward.

Summary
  • Microsoft Research has developed KBLaM, a new method that directly integrates structured knowledge databases into language models without requiring separate retrieval modules or retraining the model.
  • KBLaM's computational effort grows linearly with the amount of data, in contrast to conventional methods like RAG, which scale quadratically. The system is particularly effective at avoiding hallucinations.
  • The code and data sets have been made open source and support various models such as Llama-3 and Phi-3. However, Microsoft states that further research is needed before the method can be used on a large scale.
Sources
Microsoft Research GitHub