
LongLLaMA pushes the limit of context length in open-source LLMs

THE DECODER
Jul 11, 2023

Researchers have released a preview of LongLLaMA, a large language model capable of handling long contexts of up to 256,000 tokens or more. Built on the open-source OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method, it allows some attention layers to access a memory cache of key-value pairs, extending their effective context length.
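The core idea described above can be illustrated with a minimal NumPy sketch: an attention layer that prepends cached key-value pairs from earlier context to the local keys and values before computing attention. This is only an illustration of the mechanism, not the authors' actual FoT implementation; all function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_memory(q, k, v, mem_k=None, mem_v=None):
    """Scaled dot-product attention where a cache of key-value pairs
    from earlier context (mem_k, mem_v) is prepended to the local
    keys/values, so each query can attend over the extended context."""
    if mem_k is not None:
        k = np.concatenate([mem_k, k], axis=0)
        v = np.concatenate([mem_v, v], axis=0)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

# Toy example: 4 local tokens plus a memory cache of 8 earlier tokens.
rng = np.random.default_rng(0)
d = 16
q = rng.standard_normal((4, d))
k, v = rng.standard_normal((4, d)), rng.standard_normal((4, d))
mem_k, mem_v = rng.standard_normal((8, d)), rng.standard_normal((8, d))

out = attention_with_memory(q, k, v, mem_k, mem_v)
# Each of the 4 queries now attends over 12 keys (8 cached + 4 local),
# while the output shape stays (4, d) — a drop-in change to the layer.
```

Because the cache only adds extra keys and values, layers without a cache behave exactly like standard attention, which is why the model can serve as a drop-in replacement when long context is not needed.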

According to the researchers, the model retains performance on tasks that don't require long contexts and can be used as a drop-in replacement for shorter-context LLaMA implementations. The team has released its smaller 3B-parameter variant under the Apache 2.0 license, with inference code supporting longer contexts available on Hugging Face. More information and examples for LongLLaMA can be found in the project's GitHub repository.
