
Mistral AI this week introduced three new LLMs that outperform similarly sized models on general-purpose tasks and in specialized areas such as mathematical reasoning and code generation.


Mathstral: 7 billion parameters for math tasks

Mathstral, a 7-billion-parameter model, was developed by the French startup in collaboration with Project Numina, a non-profit organization focused on advancing human and artificial intelligence in mathematics. It outperforms similarly sized models on mathematical benchmarks such as MATH (56.6%) and general benchmarks such as MMLU (63.47%).


The Mistral team pointed to Mathstral as an example of their design philosophy of balancing performance and speed in purpose-built models.

Codestral Mamba: New architecture with larger context window

Following the May 2024 release of Codestral, a 22-billion-parameter code model, Mistral has now introduced Codestral Mamba. The new model uses the Mamba2 architecture and offers fast code generation with a context window of up to 256,000 tokens.


This makes it ideal as a local code assistant, as you can feed your own codebase and documentation from programming frameworks into a single prompt. Codestral Mamba processes sequences in linear time, enabling quicker responses and theoretically infinite output lengths.
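
The large context window lends itself to simple retrieval-free workflows: concatenate the relevant files into one prompt and ask a question. The sketch below is a hypothetical illustration of that idea, not Mistral's official tooling; the helper name, file filter, and character budget are assumptions.

```python
# Minimal sketch: packing a local codebase into one long prompt for a
# code model with a very large context window. Paths, the file filter,
# and the prompt layout are illustrative assumptions.
from pathlib import Path

def build_codebase_prompt(repo_dir: str, question: str, max_chars: int = 800_000) -> str:
    """Concatenate source files into a single prompt, capped by a rough character budget."""
    parts = []
    total = 0
    for path in sorted(Path(repo_dir).rglob("*.py")):
        text = path.read_text(errors="ignore")
        snippet = f"\n### File: {path}\n{text}\n"
        if total + len(snippet) > max_chars:
            break  # stay within the rough budget for the context window
        parts.append(snippet)
        total += len(snippet)
    return "".join(parts) + f"\n### Question\n{question}\n"

prompt = build_codebase_prompt("./my_project", "Where is the retry logic implemented?")
```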

Codestral Mamba often outperformed similarly sized models in benchmarks, though the larger Transformer-based Codestral still leads in most areas.


Mistral hasn't yet released technical documentation that provides more insight into the training data and model architectures. However, the weights are available on Hugging Face (Mathstral, Codestral Mamba).
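
For readers who want to try the released weights, a minimal loading sketch with the Hugging Face transformers library might look like the following. The repository ID is an assumption based on Mistral's usual naming; check the actual model cards before running it.

```python
# Minimal sketch: pulling published weights from Hugging Face with transformers.
# The repository ID below is an assumed name, not confirmed by the article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mathstral-7B-v0.1"  # assumed repo name, verify on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "What is the derivative of x**3 + 2*x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```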

Mistral NeMo: A replacement for the Mistral-7B?

Developed with NVIDIA, Mistral NeMo features 12 billion parameters and a context window of up to 128,000 tokens. It excels in logic, world knowledge, and coding capabilities, making it suitable for global, multilingual applications.

Mistral NeMo is based on a standard architecture, allowing for easy integration into existing systems. Compared to other open-source models like Gemma-2-9B and LLaMA-3-8B, the NeMo base model shows similar or better results in benchmarks while supporting a context window 16 times larger.


The model was trained with a new tokenizer called Tekken, optimized for over 100 languages, enabling greater compression of natural text and source code than the previously used SentencePiece tokenizer.

Compared to the LLaMA 3 tokenizer, Tekken offers more efficient compression for 85 percent of languages. Mistral NeMo is particularly powerful in languages like English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic and Hindi.
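
A rough way to sanity-check such compression claims is to tokenize the same text with two tokenizers and compare token counts, since fewer tokens for the same input means better compression. The repository IDs in this sketch are assumptions; the actual model cards on Hugging Face are the source of truth.

```python
# Rough sketch: compare how many tokens each tokenizer needs for the same text.
# Repository IDs are assumptions, not confirmed by the article.
from transformers import AutoTokenizer

samples = {
    "english": "Mistral NeMo supports long documents and source code.",
    "german": "Große Sprachmodelle komprimieren Text unterschiedlich gut.",
    "code": "def add(a: int, b: int) -> int:\n    return a + b",
}

tekken = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Base-2407")      # Tekken (assumed repo)
sentencepiece = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.3")      # older SentencePiece tokenizer (assumed repo)

for name, text in samples.items():
    print(name, len(tekken(text)["input_ids"]), "vs", len(sentencepiece(text)["input_ids"]))
```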

Mistral has released the pre-trained base and instruction-optimized checkpoints under the Apache 2.0 license to encourage adoption by researchers and companies. NeMo seems to follow in the footsteps of Mistral-7B, which received an update in May.

Mistral remains Europe's top LLM startup

Earlier this year, Mistral launched Mistral Large to compete with OpenAI's GPT-4. The company secured a multi-year partnership with Microsoft in February and raised $600 million in June.


Overall, Mistral AI is positioning itself as a leading European AI company, developing capable and specialized AI models that also take into account aspects such as transparency and data protection in line with European standards.

Europe's LLM sector also includes Aleph Alpha, which hasn't yet caught up with Mistral at the model level; DeepL, which wants to expand its AI language solutions for enterprises with $300 million in backing; and Silo AI, which was recently acquired by AMD.

Summary
  • French AI start-up Mistral AI has released two specialized language models and one general language model: Mathstral with 7 billion parameters for mathematical reasoning, Codestral Mamba with the new Mamba2 architecture, and Mistral NeMo with 12 billion parameters.
  • Mathstral achieves top performance on mathematical benchmarks such as MATH (56.6%) and more general benchmarks such as MMLU (63.47%), outperforming models of similar size. Codestral Mamba enables the integration of large code bases and documentation with context windows of up to 256,000 tokens.
  • With partnerships such as Microsoft and a recent $600 million funding round, Mistral AI is positioning itself as one of Europe's leading AI companies with a focus on high-performance, domain-specific and general purpose LLMs.