Black Forest Labs has introduced Flux 2, a new family of image generation models that can handle high-resolution output up to four megapixels, process multiple reference images at once, and use a hybrid architecture powered by a vision language model.
The lineup includes options for a wide range of use cases, from API-only access to fully open weights. One of the main upgrades, according to the company, is its new multi-reference system.
Users can feed in up to ten reference images at the same time to keep characters, products, or visual styles consistent across generations. Flux 2 also supports creating and editing images at up to four megapixels.
The model's text rendering has also been reworked. It now aims to generate more reliable typography, infographics, and UI mockups. Black Forest Labs says prompt adherence has improved as well, especially for structured instructions and complex compositions.
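The limits quoted above (up to ten reference images, up to four megapixels) translate into concrete image sizes. The sketch below is only an illustration: it assumes "four megapixels" means 4,000,000 pixels and that image sides are rounded down to a multiple of 16, neither of which is confirmed by the announcement. The function names are hypothetical and are not part of any Black Forest Labs API.

```python
import math

MAX_REFERENCES = 10      # "up to ten reference images", per the announcement
MAX_PIXELS = 4_000_000   # assumption: "four megapixels" read as 4,000,000 pixels


def largest_size(aspect_w: int, aspect_h: int, multiple: int = 16) -> tuple[int, int]:
    """Return the largest (width, height) near the given aspect ratio
    that stays within MAX_PIXELS.

    Both sides are rounded down to a multiple of `multiple`, which is an
    assumed alignment and may shift the aspect ratio slightly.
    """
    # Solve width * height = MAX_PIXELS with width / height = aspect_w / aspect_h.
    height = math.sqrt(MAX_PIXELS * aspect_h / aspect_w)
    width = height * aspect_w / aspect_h
    return (int(width) // multiple * multiple,
            int(height) // multiple * multiple)


def check_request(num_references: int, width: int, height: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if num_references > MAX_REFERENCES:
        problems.append(f"{num_references} references exceeds {MAX_REFERENCES}")
    if width * height > MAX_PIXELS:
        problems.append(f"{width}x{height} exceeds {MAX_PIXELS} pixels")
    return problems


if __name__ == "__main__":
    for ratio in [(1, 1), (16, 9), (3, 4)]:
        print(ratio, largest_size(*ratio))
    print(check_request(11, 2048, 2048))
```

Under these assumptions, a square image tops out at 2000 x 2000 pixels, while 2048 x 2048 (about 4.19 million pixels) would exceed the budget. If the actual limit is defined differently, only the `MAX_PIXELS` constant needs to change.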

Hybrid architecture with Mistral vision language model
Flux 2 combines two core components. A vision-language model, "Mistral-3 24B," interprets both text and image inputs, while a second module ("Rectified Flow Transformer") handles the logical layout and ensures that details like shapes and materials appear correctly.
Flux 2 also uses a variational autoencoder (VAE) to compress images into a compact latent representation and reconstruct them with little loss of quality. These systems work together to let the model create new content or edit existing images. A technical report is also available.
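For readers unfamiliar with the term, rectified flow models generate an image by integrating a learned velocity field that moves a noise sample toward a data sample along a path that is ideally close to a straight line. The toy sketch below is not Flux 2 code. It replaces the trained network with a hand-written velocity that points straight at a fixed target, and it uses plain lists of numbers instead of image latents. Time runs from 0 (noise) to 1 (result) here; some implementations use the opposite convention.

```python
def euler_sample(start, target, steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps.

    `start` plays the role of the noise sample and `target` the role of the
    finished output. In a real model, a trained network would predict the
    velocity; here it is computed directly from the known target.
    """
    x = list(start)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        # Ideal straight-line velocity: remaining distance over remaining time.
        velocity = [(tg - xi) / (1.0 - t) for xi, tg in zip(x, target)]
        # One Euler step along the velocity field.
        x = [xi + dt * vi for xi, vi in zip(x, velocity)]
    return x


if __name__ == "__main__":
    noise = [0.3, -1.2, 0.8]
    image = [1.0, 0.0, -1.0]
    for steps in (1, 4, 50):
        print(steps, euler_sample(noise, image, steps))
```

Because the path in this toy case is exactly straight, even a single Euler step lands on the target. Learned velocity fields are only approximately straight, which is one reason the number of sampling steps can become a speed-versus-quality setting.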

Four models for different users
The Flux 2 family includes four main versions, each tuned for different performance needs and levels of control:
- Flux 2 [pro]: The highest-quality model, intended to match leading closed-source systems. It is available through the BFL Playground, the BFL API, and launch partners.
- Flux 2 [flex]: Designed for developers who want to adjust parameters like step count or guidance scale to trade speed for quality. It is also available through the Playground and API.
- Flux 2 [dev]: A 32-billion-parameter model released with open weights. It unifies text-to-image generation and image editing in a single checkpoint. Weights are on Hugging Face, and reference code is on GitHub. An fp8-optimized build created with NVIDIA and ComfyUI is meant to run efficiently on consumer GPUs such as GeForce RTX cards. API access is available through providers including FAL, Replicate, Runware, Verda, TogetherAI, Cloudflare, and DeepInfra. Commercial use requires a license, which is available through the company's website.
- Flux 2 [klein]: A distilled, not-yet-released model that will be open-sourced under Apache 2.0. It aims to outperform other models of similar size. Interested users can join the beta.
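To put the fp8 build of Flux 2 [dev] in context, a back-of-the-envelope calculation shows how numeric precision affects the storage needed for 32 billion parameters. This is a rough sketch: it covers weights only, counts just the parameters in the 32-billion figure quoted above, and ignores activations and any other components loaded alongside the model. The function name is illustrative.

```python
PARAMS = 32_000_000_000  # "32-billion-parameter model", per the article

# Bytes used to store a single parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_gigabytes(params: int, bytes_per_param: int) -> float:
    """Storage for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name}: ~{weight_gigabytes(PARAMS, size):.0f} GB")
```

By this estimate, storing the weights in fp8 takes roughly half the space of bf16 (about 32 GB versus 64 GB), which helps explain why a reduced-precision build is the one aimed at consumer hardware.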
Flux 2 arrives just one week after Google's Nano Banana Pro, one of the most discussed image models of recent years, making comparisons unavoidable. Even so, Flux 2 handles the following highly constrained test prompt surprisingly well:
A hyper-realistic DSLR photo. A monkey holding a pink banana is sitting on a tiger in the foreground. In the background, a HORSE is RIDING AN ASTRONAUT. The astronaut is underneath like a living "spacesuit horse saddle," and the HORSE is clearly on top, in control, as the rider. Make it 100% unambiguous: the HORSE is the rider and the ASTRONAUT is being ridden, NOT the other way around. High-resolution, sharp focus, realistic lighting.