French AI company Mistral releases its first multimodal model Pixtral-12B
Key Points
- French AI startup Mistral has unveiled its first multimodal model, Pixtral-12B, which can process both images and text. It has 12 billion parameters and builds on Mistral's NeMo 12B text model.
- In benchmarks, Pixtral-12B outperforms other open-source vision models such as Phi 3, Qwen2 VL, and LLaVA on some tasks, but lags behind larger closed models such as Claude 3.5 Sonnet and GPT-4o. Its capabilities include OCR, diagram analysis, and screenshot processing.
- Mistral has released Pixtral-12B under an Apache 2.0 license and plans to test it soon on its own platforms Le Chat and La Plateforme. Details on the training data have not been disclosed, and the model's practical performance remains to be proven on real-world tasks beyond benchmarks.
Source: swyx | TechCrunch


