French AI company Mistral introduced Pixtral-12B, its first multimodal model capable of processing both images and text, at a conference in San Francisco. This open-source model builds on Mistral's text model NeMo-12B, launched earlier this year.

With 12 billion parameters and a size of about 24 GB, Pixtral-12B can answer questions about multiple images at once, provided either as URLs or as Base64-encoded data. Its vision encoder supports image resolutions up to 1,024 x 1,024 pixels.
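In practice, the two input modes could look something like the sketch below, which assumes an OpenAI-style chat endpoint; the endpoint URL, model id, and message schema are illustrative assumptions, not Mistral's documented API.

```python
import base64
import requests

# Hypothetical endpoint and model id -- assumptions for illustration only.
API_URL = "https://api.mistral.ai/v1/chat/completions"
MODEL_ID = "pixtral-12b"

# Mode 1: reference an image by URL.
image_by_url = {"type": "image_url", "image_url": "https://example.com/chart.png"}

# Mode 2: embed an image as Base64-encoded data.
with open("chart.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")
image_by_b64 = {"type": "image_url", "image_url": f"data:image/png;base64,{encoded}"}

# A single question can reference multiple images.
payload = {
    "model": MODEL_ID,
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What differs between these two charts?"},
            image_by_url,
            image_by_b64,
        ],
    }],
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
)
print(response.json())
```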

Benchmark performance

Mistral claims Pixtral-12B outperforms other open-source vision models like Phi-3, Qwen2-VL, and LLaVA in some multimodal capabilities. However, larger closed multimodal models such as Claude 3.5 Sonnet or GPT-4o still show superior image comprehension.

Image: swyx/X

In certain tests, Pixtral-12B shows significant improvements over comparable models. On text benchmarks such as MMLU and the HumanEval coding benchmark, it trails slightly behind Anthropic's smallest model, Claude 3 Haiku. Smaller vision-language models like Qwen2-VL 7B and Phi-3.5 Vision also surpass Pixtral-12B in some vision benchmarks.

Image: swyx/X

Mistral reports that Pixtral-12B can perform optical character recognition (OCR), analyze charts and diagrams, and process screenshots. The model also shows promise with satellite imagery. While it has not yet been tested on video frames, its 128,000-token context window suggests potential in this area.
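A rough back-of-envelope calculation illustrates that potential. Assuming a 1,024 x 1,024 image is split into 16 x 16-pixel patches at one token per patch (an assumption for illustration; Mistral has not published a per-image token count), roughly 31 full-resolution frames would fit into the context window:

```python
# Back-of-envelope: how many full-resolution images fit in 128k tokens?
# Patch size and tokens-per-image are assumptions for illustration.
context_window = 128_000                       # tokens
patch_edge = 16                                # assumed patch size in pixels
tokens_per_image = (1024 // patch_edge) ** 2   # 64 * 64 = 4,096 tokens
max_frames = context_window // tokens_per_image
print(max_frames)                              # 31 -- before reserving tokens for text
```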

Image: swyx/X

Availability and licensing

Mistral has released Pixtral-12B under an Apache 2.0 license for free use. The model is currently available on GitHub and Hugging Face. Sophia Yang, Head of Developer Relations at Mistral, announced that Pixtral-12B will soon be available for testing on Mistral's platforms Le Chat and La Plateforme.
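For local experiments, fetching the weights from Hugging Face could look like the sketch below; the repository id is an assumption for illustration and may differ from the one Mistral actually publishes.

```python
from huggingface_hub import snapshot_download

# Assumed repository id -- check Mistral's Hugging Face page for the real one.
local_dir = snapshot_download(repo_id="mistralai/Pixtral-12B-2409")
print(f"Model files (about 24 GB) downloaded to: {local_dir}")
```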

The release follows Mistral's recent $600 million funding round, which valued the company at $6 billion. Microsoft holds a minority stake in Mistral, and the company also has a partnership with Amazon Web Services (AWS).

As with previous releases, Mistral quickly made Pixtral-12B publicly available. However, the company has not provided information about the training data, and there are no independent performance evaluations beyond its own benchmarks.

Summary
  • French AI startup Mistral has unveiled its first multimodal model, Pixtral-12B, which can process both images and text. With 12 billion parameters, it is based on Mistral's NeMo-12B text model.
  • In benchmarks, Pixtral-12B partially outperforms other open-source vision models such as Phi 3, Qwen2 VL, and LLaVA, but lags behind closed, larger models such as Claude 3.5 Sonnet or GPT-4o. Among other things, it is capable of OCR, diagram analysis and screenshot processing.
  • Mistral has released Pixtral-12B under an Apache 2.0 license and plans to make it available for testing soon on its own platforms Le Chat and La Plateforme. Details on the training data are not known, and its real-world performance has yet to be proven on tasks beyond benchmarks.