
Meta has released the first two models in its Llama 4 series, the company's first models built from the ground up on a multimodal architecture.


Both models—Llama 4 Scout and Llama 4 Maverick—use a Mixture-of-Experts (MoE) design, where only a subset of parameters is activated per input, reducing computational overhead.
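To make the routing idea concrete, here is a minimal, generic sketch of a mixture-of-experts layer in PyTorch: a small gating network picks the top-k experts for each token, so only those experts actually run. The dimensions, expert count, and top-k value are illustrative assumptions, not Meta's actual configuration.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Illustrative mixture-of-experts layer: only top_k experts run per token."""

    def __init__(self, d_model=64, d_hidden=256, num_experts=16, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

tokens = torch.randn(8, 64)
print(MoELayer()(tokens).shape)  # torch.Size([8, 64])
```

Because each token only touches a couple of experts, compute per token stays close to that of a much smaller dense model, even though the total parameter count is large.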

According to Meta, these are the first Llama models capable of processing both text and images within a shared architecture. The company says the models were trained on a wide range of images and videos to support broad visual understanding. During pre-training, the system processed up to 48 images simultaneously. In post-training evaluations, it demonstrated strong performance with up to eight images as input.

Overview of the Llama 4 model lineup. A dedicated reasoning model is expected in future releases. | Image: Meta AI

Llama 4 Scout targets single-GPU multimodal tasks

The smaller of the two models, Llama 4 Scout, uses 17 billion active parameters from a total of 109 billion, distributed across 16 experts. It is optimized to run on a single H100 GPU and is designed for tasks such as long-form text processing, visual question answering, code analysis, and multi-image understanding.

Comparison chart: Llama 4 Scout vs. older Llama models and competitors, with a focus on image understanding benchmarks such as MMMU (69.4) and DocVQA (94.4).
Llama 4 Scout performs well on visual benchmarks, scoring 94.4 on DocVQA and 88.8 on ChartQA. It outperforms earlier models like Llama 3.3 70B and competitors such as Gemini 2.0 Flash-Lite in multimodal evaluations. | Image: Meta AI

Scout features a 10-million-token context window, equivalent to roughly five million words or more. While impressive in scale, Meta hasn't addressed how well the model handles complex queries over such long inputs beyond simple word searches. The company's reliance on the dated "Needle in a Haystack" benchmark for context-window testing suggests limitations, especially since more sophisticated long-context benchmarks are available. All current language models still show weaknesses in long-context understanding across both text and images.

In addition, the model was trained with a context length of only 256K tokens during both pre-training and post-training. The advertised 10-million-token window is based on length generalization rather than direct training.

Llama 4 Maverick expands scalability and performance

Llama 4 Maverick also uses 17 billion active parameters, but draws from a total of 400 billion distributed across 128 experts. Like Scout, it uses a mixture-of-experts architecture, which reduces computational overhead by activating only a subset of experts for each input. Despite these efficiency gains, the model still requires a full H100 host for deployment due to its scale. It supports context windows of up to one million tokens.

Meta reports that Llama 4 Maverick outperforms OpenAI’s GPT-4o and Google’s Gemini 2.0 Flash across several benchmark evaluations. The model also achieves results comparable to DeepSeek-V3 in reasoning and code generation tasks, despite using less than half as many active parameters. In its experimental chat configuration, Maverick reaches an Elo score of 1417 on LMArena.

Comparison table: Llama 4 Maverick vs. Gemini Flash, DeepSeek and GPT-4o with benchmarks for image understanding, coding and cost per token.
Llama 4 Maverick performs well on multimodal tasks such as DocVQA (94.4) and ChartQA (90.0). At an estimated cost of $0.19 to $0.49 per million tokens, the model also offers good value compared to the competition, though actual token costs depend on the infrastructure used to serve Llama 4. | Image: Meta AI

Both Scout and Maverick are available as open-weight models via llama.com and Hugging Face. Meta has also integrated them into products including WhatsApp, Messenger, Instagram Direct, and Meta.ai. Additional Llama 4 models are expected to be announced at LlamaCon on April 29.
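For developers who want to try the open weights, loading them from Hugging Face should follow the usual transformers pattern. This is a minimal sketch: the repository ID below is an assumption, and the multimodal variants may require a dedicated processor class rather than a plain text-generation pipeline.

```python
# Minimal sketch: loading an open-weight Llama 4 checkpoint from Hugging Face.
# The repository ID is an assumption and may differ from Meta's actual naming.
import torch
from transformers import pipeline

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # hypothetical repo ID

generator = pipeline(
    "text-generation",
    model=model_id,
    device_map="auto",           # spread the MoE weights across available GPUs
    torch_dtype=torch.bfloat16,  # reduce memory footprint
)

result = generator("Summarize the Llama 4 release in one sentence:", max_new_tokens=64)
print(result[0]["generated_text"])
```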


Llama 4 "Behemoth" serves as teacher model

Scout and Maverick were trained using Llama 4 Behemoth, a larger internal model with 288 billion active parameters out of a total of 2 trillion, distributed across 16 experts. Behemoth acts as a teacher model and, according to Meta, surpasses GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro in mathematical and scientific benchmarks.
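Meta has not detailed the distillation objective beyond describing Behemoth as a teacher, but teacher-student training typically blends the teacher's soft output distribution with the ordinary hard-label loss. The sketch below shows that generic recipe; the temperature, weighting, and loss form are illustrative assumptions, not Meta's actual method.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic knowledge-distillation loss: blend soft teacher targets with hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                     # standard temperature scaling
    hard = F.cross_entropy(student_logits, labels)  # ordinary next-token loss
    return alpha * soft + (1 - alpha) * hard

# Toy usage: a batch of 4 "tokens" over a 10-entry vocabulary
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student, teacher, labels))
```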

However, Meta has not published a comparison with Google’s more recent Gemini 2.5 Pro, which currently leads in reasoning evaluations. Behemoth is still in training and is expected to be released at a later stage. A dedicated Llama model for reasoning has not yet been released, though Meta CEO Mark Zuckerberg said in January that one is on the way.

Comparison table: benchmark results of AI models such as Llama 4 Behemoth, Claude Sonnet, Gemini Pro and GPT-4.5 in various tests.
Behemoth’s reported benchmark scores include 95.0 on MATH-500 and 49.4 on LiveCodeBench. However, Google's Gemini 2.5 Pro is not included in the comparison, despite outperforming Behemoth in several evaluations. This omission may reflect the fact that Gemini 2.5 Pro is a reasoning-focused model, while the first two Llama 4 models are not. | Image: Meta AI

After pre-training, Meta applies multiple post-training stages to improve model performance. These include supervised fine-tuning with curated examples, followed by online reinforcement learning using a new asynchronous system that reportedly increases training efficiency tenfold.

Direct Preference Optimization is then used to refine output quality, with an emphasis on removing trivial examples. According to Meta, more than half of Maverick’s training set and 95 percent of Behemoth’s were filtered out to focus the models on more difficult tasks.
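Meta has not published its filtering criteria, but the underlying idea of pruning easy examples can be sketched as follows: score each prompt with the current model and keep only those it does not already handle reliably. The reward function, generation stub, and threshold here are placeholders, not Meta's pipeline.

```python
from typing import Callable, Iterable

def filter_easy_examples(
    examples: Iterable[dict],
    reward_fn: Callable[[str, str], float],  # placeholder: scores an answer against a reference
    generate_fn: Callable[[str], str],       # placeholder: the current model's answer to a prompt
    max_reward: float = 0.9,                 # illustrative threshold, not Meta's
) -> list[dict]:
    """Keep only examples the current model does not already solve well."""
    kept = []
    for ex in examples:
        answer = generate_fn(ex["prompt"])
        if reward_fn(answer, ex["reference"]) < max_reward:
            kept.append(ex)  # hard example: retain for the next training round
    return kept

# Toy usage with stand-in functions
data = [
    {"prompt": "2+2?", "reference": "4"},
    {"prompt": "Explain the proof of Fermat's last theorem.", "reference": "A long proof"},
]
easy_solver = lambda p: "4" if "2+2" in p else "I don't know"
exact_match = lambda a, ref: 1.0 if a == ref else 0.0
print(filter_easy_examples(data, exact_match, easy_solver))  # keeps only the hard example
```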


EU excluded from using Llama 4 multimodal models

Meta is releasing the Llama 4 models under its standard Llama license, but with a new limitation: companies and individuals based in the EU are excluded from using the multimodal models. This restriction does not apply to end users.

Screenshot: excerpt from the Llama license terms stating that developers based in the EU may not use Llama 4.
Meta’s updated license terms exclude the EU AI community from accessing Llama 4. | Image: Meta AI

According to Meta, the move is due to "regulatory uncertainties" surrounding the EU AI Act. The decision reflects ongoing tensions between Meta and EU policymakers, with Meta pushing either for clearer guidance or less restrictive regulations, depending on interpretation.

Developers elsewhere must display a visible "Built with Llama" label, and derivative models must carry names that begin with "Llama." Platforms with more than 700 million monthly active users must obtain special permission from Meta.

Summary
  • Meta has released two new multimodal models, Llama 4 Scout and Llama 4 Maverick, which can process both text and images within a shared architecture, marking a first for the company.
  • Scout, optimized for single-GPU tasks, features a large 10-million token context window, but Meta hasn't addressed its effectiveness in handling long-context queries beyond simple text matching.
  • Maverick outperforms competitors like OpenAI's GPT-4o and Google's Gemini 2.0 Flash in several benchmarks, despite using fewer active parameters. Companies and individuals based in the EU are excluded from using the multimodal models.