Bytedance shows off diffusion code model that's up to 5.4 times faster than previous models

Seed Diffusion Preview is Bytedance's experimental AI model for code generation, designed to generate tokens in parallel instead of one at a time. The company says it can reach speeds of 2,146 tokens per second on Nvidia H20 GPUs.

Seed Diffusion Preview uses a "discrete-state diffusion" approach. While diffusion models are usually built for continuous data like images, Bytedance has adapted the method for discrete data such as text and code.

Instead of generating each token in sequence, the model reconstructs code from a noisy, placeholder-filled state. Multiple sections of code are generated at once, thanks to a transformer architecture that enables parallel prediction, not just the standard step-by-step process.
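To make the idea concrete, here is a minimal toy sketch of parallel unmasking in Python. It is not Bytedance's implementation: `parallel_unmask`, `toy_predict`, and the `tokens_per_step` parameter are hypothetical names, and the "model" is a stand-in that already knows the target code. The sketch only shows how committing several positions per step cuts the number of passes below the number of tokens.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"
Proposal = Tuple[str, float]  # (proposed token, confidence)


def parallel_unmask(
    length: int,
    predict: Callable[[List[str]], List[Proposal]],
    tokens_per_step: int,
) -> Tuple[List[str], int]:
    """Fill a fully masked sequence, committing several positions per step."""
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    seq = [MASK] * length
    steps = 0
    while MASK in seq:
        proposals = predict(seq)  # one proposal for every position
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        # Commit the most confident masked positions in this step.
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:tokens_per_step]:
            seq[i] = proposals[i][0]
        steps += 1
    return seq, steps


# Assumption: a toy "denoiser" that knows the answer and is more
# confident about earlier positions. A real model would be a transformer.
TARGET = "def add ( a , b ) : return a + b".split()


def toy_predict(seq: List[str]) -> List[Proposal]:
    return [(TARGET[i], 1.0 / (1 + i)) for i in range(len(seq))]


decoded, steps = parallel_unmask(len(TARGET), toy_predict, tokens_per_step=4)
print(" ".join(decoded), "| steps:", steps)
```

In this toy run, twelve tokens are filled in three passes, where a token-by-token decoder would need twelve.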

This parallel workflow leads to much faster generation, but according to Bytedance, code quality remains high. In benchmark tests, Seed Diffusion Preview performed competitively with other models, and stood out especially for code editing tasks.

Seed Diffusion reaches an inference throughput of 2,146 tokens per second and scores between 55 and 82 percent on the benchmarks shown, compared with Gemini Diffusion and Mercury Coder. According to Bytedance, Seed Diffusion delivers the fastest inference speeds and can match or outperform autoregressive and diffusion-based competitors. | Image: Bytedance

To address problems in standard masked diffusion models, Bytedance uses a two-stage training process. The first stage relies on mask-based training, replacing parts of the code with special placeholder tokens.

But this can sometimes make the model copy unmasked tokens without really checking them. To fix this, the team added a second phase: edit-based training with insertions and deletions. This forces the model to review and correct all tokens, not just the masked ones.
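The difference between the two kinds of corruption can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual training code: `mask_corrupt`, `edit_corrupt`, the mask ratio, the edit count, and the toy vocabulary are all hypothetical. The point is that masking leaves every visible token correct, while random insertions and deletions can damage any position, so a model trained to repair them cannot simply trust what it sees.

```python
import random
from typing import List

MASK = "<mask>"


def mask_corrupt(tokens: List[str], mask_ratio: float, rng: random.Random) -> List[str]:
    """Stage one: hide a random subset of tokens behind a placeholder."""
    return [MASK if rng.random() < mask_ratio else tok for tok in tokens]


def edit_corrupt(
    tokens: List[str], num_edits: int, vocab: List[str], rng: random.Random
) -> List[str]:
    """Stage two: apply random deletions and insertions anywhere."""
    out = list(tokens)
    for _ in range(num_edits):
        if out and rng.random() < 0.5:
            del out[rng.randrange(len(out))]  # delete a random token
        else:
            # insert a random vocabulary token at a random position
            out.insert(rng.randrange(len(out) + 1), rng.choice(vocab))
    return out


rng = random.Random(0)
code = "x = 1 ; y = x + 2".split()
vocab = ["x", "y", "=", "+", "1", "2", ";"]  # toy vocabulary (assumption)

masked = mask_corrupt(code, mask_ratio=0.5, rng=rng)
edited = edit_corrupt(code, num_edits=3, vocab=vocab, rng=rng)
print(masked)
print(edited)
```

In the masked version, every token that is still visible matches the original, which is exactly the shortcut the second stage is meant to remove.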

The team also optimized the generation order, taking code structure and dependencies into account - for example, making sure variables are declared before they're used. They then trained the model on a large, filtered dataset of high-quality generation sequences created by the pre-trained model itself.

Self-optimizing parallel decoding

While diffusion models should, in theory, enable parallel decoding, actually achieving this is complicated. Each parallel inference step is computationally demanding, and reducing the number of steps can hurt quality.

Bytedance tackled this by training the model to optimize its own generation process using "on-policy learning." The goal is to minimize the number of steps, while a verification model checks output quality.
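One way to picture the trade-off is a selection step like the Python sketch below. It is a loose illustration and not Bytedance's training loop: `Trajectory`, `pick_training_target`, and the `compiles` verifier are hypothetical, and a simple syntax check stands in for the verification model. Among samples drawn from the current model, it keeps only those the verifier accepts and prefers the one that needed the fewest steps.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trajectory:
    output: str  # generated code
    steps: int   # number of denoising steps the sample used


def compiles(source: str) -> bool:
    """Toy verifier (assumption): accept code that parses as Python."""
    try:
        compile(source, "<sample>", "exec")
        return True
    except SyntaxError:
        return False


def pick_training_target(
    samples: List[Trajectory], verify: Callable[[str], bool]
) -> Optional[Trajectory]:
    """Return the verified sample with the fewest steps, or None."""
    valid = [t for t in samples if verify(t.output)]
    return min(valid, key=lambda t: t.steps) if valid else None


samples = [
    Trajectory("def f(: pass", steps=2),   # fast but broken
    Trajectory("def f(): pass", steps=6),  # correct but slow
    Trajectory("def f(): pass", steps=4),  # correct and faster
]
best = pick_training_target(samples, compiles)
print(best)
```

The two-step sample is the fastest, but it fails verification, so the four-step sample wins.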

For practical use, Seed Diffusion Preview processes code in parallel within blocks, but keeps a logical order between the blocks. The team also tweaked its software stack for diffusion processes, using an internal framework built for this kind of workload.
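The block-wise scheme can be sketched as follows in Python. This is a sketch under assumptions, with hypothetical names (`blockwise_decode`, `fill_block`, `block_size`) and a toy filler in place of the model: blocks are produced strictly left to right, and each block is filled in a single call that sees all earlier blocks.

```python
from typing import Callable, List, Tuple


def blockwise_decode(
    length: int,
    block_size: int,
    fill_block: Callable[[List[str], int], List[str]],
) -> Tuple[List[str], int]:
    """Decode block by block; tokens inside a block are produced together."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    seq: List[str] = []
    calls = 0
    for start in range(0, length, block_size):
        size = min(block_size, length - start)
        # The new block is conditioned on everything generated so far.
        seq.extend(fill_block(seq, size))
        calls += 1
    return seq, calls


# Assumption: a toy filler that already knows the answer. In a real system
# this call would run parallel denoising over the block's positions.
TARGET = "def add ( a , b ) : return a + b".split()


def toy_fill_block(prefix: List[str], size: int) -> List[str]:
    return TARGET[len(prefix):len(prefix) + size]


decoded, calls = blockwise_decode(len(TARGET), block_size=5, fill_block=toy_fill_block)
print(" ".join(decoded), "| block calls:", calls)
```

Keeping the order between blocks means later code can depend on earlier code, while the work inside each block still runs in parallel.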

Seed Diffusion Preview is Bytedance's answer to Google's Gemini Diffusion, which was announced in May and also targets code generation. Bytedance says it plans to keep experimenting with scaling and with adapting the approach to more complex reasoning tasks. A demo is available.
