The Mistral AI team releases Mistral 7B, a 7.3 billion parameter language model that outperforms larger Llama models on benchmarks. The model can be used without restrictions under the Apache 2.0 license.

Mistral 7B outperforms the larger Llama 2 13B on all benchmarks measured and Llama 1 34B on many benchmarks, the Mistral team claims. In addition, Mistral 7B approaches the programming performance of CodeLlama 7B and still performs well in English language tasks.

Mistral 7B can be downloaded for free and run in three ways: anywhere using the reference implementation, on any cloud (AWS/GCP/Azure) using the vLLM inference server and SkyPilot, or via HuggingFace. According to Mistral AI, the model can be easily adapted to new tasks such as chat or instruction following through fine-tuning.

Mistral AI compares Mistral 7B to the Llama 2 7B and 13B models in multiple domains, including reasoning, world knowledge, reading comprehension, math and code.

Image: Mistral

According to Mistral AI, Mistral 7B is on par with a theoretical Llama 2 model that is more than three times larger, while saving memory and increasing throughput. Mistral attributes the fact that it trails Llama 1 34B on knowledge questions to its lower parameter count.

Transformer architecture optimizations

Mistral achieves greater efficiency through Grouped Query Attention (GQA), in which groups of query heads share a single set of key and value heads. This shrinks the key-value cache and speeds up inference in Transformer models while largely maintaining model performance.
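How the grouping in GQA works is easiest to see with a small example. The following is a minimal sketch, not Mistral's implementation: it assumes the common formulation in which several query heads share one key/value head, and the head counts and head dimension are illustrative values that do not come from the article. The function names are hypothetical.

```python
def kv_head_for_query_head(query_head: int, n_query_heads: int, n_kv_heads: int) -> int:
    """Return the index of the shared key/value head that a query head reads from."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be a multiple of n_kv_heads")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


def kv_cache_values_per_token(n_kv_heads: int, head_dim: int) -> int:
    """Numbers cached per token and layer: one key and one value vector per KV head."""
    return 2 * n_kv_heads * head_dim


# Illustrative sizes (assumptions, not taken from the article).
N_QUERY_HEADS = 8
N_KV_HEADS = 2
HEAD_DIM = 64

# Which KV head each of the 8 query heads uses.
mapping = [
    kv_head_for_query_head(h, N_QUERY_HEADS, N_KV_HEADS)
    for h in range(N_QUERY_HEADS)
]

# Cache size with one KV head per query head versus with grouped KV heads.
mha_cache = kv_cache_values_per_token(N_QUERY_HEADS, HEAD_DIM)
gqa_cache = kv_cache_values_per_token(N_KV_HEADS, HEAD_DIM)

print("query head -> KV head:", mapping)
print("cache reduction factor:", mha_cache // gqa_cache)
```

In this toy setup, four query heads share each key/value head, so the cache that has to be kept per token is a quarter of what it would be with one key/value head per query head.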

With Sliding Window Attention (SWA), each token attends only to a fixed-size window of preceding tokens rather than to the entire sequence. The goal is to achieve a balance between computational cost and model quality. According to Mistral, this doubles the speed for sequence lengths of 16k with a window of 4k.
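The effect of a sliding window can be illustrated by counting how many token pairs attention has to score. This is a rough sketch under stated assumptions: it assumes a causal window (each token sees itself and the tokens directly before it), it counts score pairs rather than measuring speed, and the function names are hypothetical. It does not reproduce Mistral's benchmark.

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j.

    Token i sees itself and at most window - 1 tokens directly before it.
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def attended_pairs(seq_len: int, window: int) -> int:
    """Count the (query, key) pairs scored under a causal window."""
    return sum(min(i + 1, window) for i in range(seq_len))


# Small example: 6 tokens, window of 3.
small_mask = sliding_window_mask(6, 3)
for row in small_mask:
    print("".join("x" if allowed else "." for allowed in row))

# Sizes mentioned in the article: 16k sequence, 4k window.
SEQ_LEN = 16 * 1024
WINDOW = 4 * 1024

full_causal = attended_pairs(SEQ_LEN, SEQ_LEN)  # window covers the whole sequence
windowed = attended_pairs(SEQ_LEN, WINDOW)
ratio = full_causal / windowed

print(f"full causal pairs: {full_causal}")
print(f"windowed pairs:    {windowed}")
print(f"ratio:             {ratio:.2f}")
```

Under these assumptions, the windowed variant scores fewer than half as many pairs as full causal attention at these sizes. That points in the same direction as the speedup Mistral reports, although real-world speed also depends on implementation details this count ignores.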

Sliding Window Attention | Image: Mistral AI

To demonstrate its versatility, Mistral AI fine-tuned Mistral 7B on instruction datasets available on HuggingFace, resulting in the Mistral 7B Instruct model. According to Mistral AI, it outperforms all 7B models on MT-Bench and competes with 13B chat models.

More Mistral AI models to follow

French startup Mistral AI made waves in June when it announced the largest European seed round at $105 million - without having a product. The team consists of former Meta and Google DeepMind employees. One of its high-profile investors is former Google CEO Eric Schmidt.

The company's business model is to distribute powerful open-source models and charge for specific features. According to a leaked pitch letter, its most capable models could be paid offerings.

The letter also reveals that Mistral plans to release a "family of text generation models" by the end of 2023 that will "significantly outperform" ChatGPT with GPT-3.5 and Google Bard. Part of this family of models will be open-source. So Mistral 7B should be just the beginning.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Mistral AI, a French AI startup, releases Mistral 7B, a high-performance 7.3 billion parameter language model that outperforms well-known models such as Llama in benchmarks and is freely available under the Apache 2.0 license.
  • Through optimizations such as Grouped Query Attention (GQA) and Sliding Window Attention (SWA), Mistral 7B increases computational and memory efficiency compared to larger models, and can be easily tuned for tasks such as chat.
  • The Mistral AI team, which includes former Meta and Google DeepMind employees, plans to release a "family of text generation models" more powerful than GPT-3.5 and Google Bard by the end of 2023. So Mistral 7B might just be the beginning.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.