Inception Labs introduces its Mercury Series of diffusion-based LLMs

Inception Labs has introduced Mercury, a new series of large language models that use diffusion technology rather than traditional autoregressive processing. The company reports these models can process tasks 10 times faster than current approaches, with initial releases focused on coding applications.

Unlike current large language models that generate text sequentially "from left to right" (autoregressive), Mercury's diffusion models use a "coarse-to-fine" approach. The system generates output through several refinement steps, starting from pure noise.

For the same task, Mercury Coder requires significantly fewer passes than an autoregressive model. | Video: Inception Labs
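The difference in pass counts can be illustrated with a toy sketch. This is not Inception Labs' algorithm: the function names are hypothetical, a mask symbol stands in for the "pure noise" starting point, and the "model" simply copies from a known target instead of predicting anything. It only shows why filling several positions per pass needs fewer passes than emitting one token at a time.

```python
import math
import random

MASK = "_"  # stand-in for "noise"; an assumption made for this toy example


def autoregressive_decode(target):
    """Emit one token per pass, strictly left to right."""
    out, passes = [], 0
    for tok in target:
        out.append(tok)  # a real model would predict tok from `out`
        passes += 1
    return out, passes


def coarse_to_fine_decode(target, steps, seed=0):
    """Start fully masked and fill in several positions per pass."""
    rng = random.Random(seed)
    n = len(target)
    out = [MASK] * n
    hidden = list(range(n))
    rng.shuffle(hidden)  # positions are refined in no fixed order
    per_step = math.ceil(n / steps)
    passes = 0
    while hidden:
        batch, hidden = hidden[:per_step], hidden[per_step:]
        for i in batch:
            out[i] = target[i]  # a real model would denoise these jointly
        passes += 1
    return out, passes


if __name__ == "__main__":
    tokens = "def add(a, b): return a + b".split()
    print(autoregressive_decode(tokens)[1], "autoregressive passes")
    print(coarse_to_fine_decode(tokens, steps=3)[1], "coarse-to-fine passes")
```

In this sketch the autoregressive loop always needs as many passes as there are tokens, while the coarse-to-fine loop needs at most `steps` passes regardless of sequence length.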

The non-sequential approach enables different handling of reasoning, response structure, and error correction. While diffusion technology is standard in image and video generation, it remains uncommon in text and audio applications.

Screenshot of a code development environment with a preview of an HTML website and a Minesweeper game board on the right. Mercury Coder builds a Minesweeper clone in less time than it takes to solve the game. | Image: Screenshot by THE DECODER

Mercury Coder is available for testing at chat.inceptionlabs.ai. The system processes prompts while showing an interactive preview of the generated software in a sidebar.

Performance comparisons

In standard code generation tests, Mercury Coder performs similarly to autoregressive models like Gemini 2.0 Flash-Lite and GPT-4o-mini, while achieving higher speeds on standard Nvidia H100 GPUs. The system generates more than 1,000 tokens per second - previously only possible with specialized AI inference chips like those from Groq.

Scatter plot with performance comparison of different AI coding environments by output speed and memory requirements; Mercury Coder achieves top results. The scatter diagram compares the performance of different coding AIs based on their output speed. | Image: Inception Labs
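To put the throughput figure in perspective, here is a minimal back-of-the-envelope calculation. Only the 1,000 tokens per second figure comes from the article. The 100 tokens per second baseline is a hypothetical value implied by the "10 times faster" claim, and the 500-token response length is an arbitrary assumption.

```python
def generation_seconds(num_tokens, tokens_per_second):
    """Time to emit num_tokens at a constant decoding rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


MERCURY_TPS = 1_000     # "more than 1,000 tokens per second" (from the article)
BASELINE_TPS = 100      # hypothetical baseline, one tenth of the rate above
RESPONSE_TOKENS = 500   # hypothetical response length

if __name__ == "__main__":
    fast = generation_seconds(RESPONSE_TOKENS, MERCURY_TPS)
    slow = generation_seconds(RESPONSE_TOKENS, BASELINE_TPS)
    print(f"{fast:.1f} s vs. {slow:.1f} s")
```

Under these assumptions, a response that takes several seconds at the baseline rate arrives in a fraction of a second, which is the gap the Minesweeper demo above is meant to convey.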

Inception Labs is testing the technology for customer support, code generation, and business automation. Some of its customers have begun replacing autoregressive models with Mercury, and a chat model is in closed beta testing.

Former OpenAI researcher Andrej Karpathy discussed Mercury's approach on X. He noted that while images and videos are typically generated with diffusion, text and audio have favored autoregressive processing, and that it has been "a bit of a mystery to me and many others why, for some reason, text prefers Autoregression."

"If you look close enough, a lot of interesting connections emerge between the two as well," Karpathy writes, stating that Mercury may demonstrate "new, unique psychology, or new strengths and weaknesses."

Mercury Coder is available through a Playground. Enterprise customers can request access to Mercury Coder Mini and Mercury Coder Small via API or local infrastructure deployment. Pricing information has not been released.
