
Update from January 9, 2024:

Mistral AI has published the Mixtral 8x7B paper, which describes the architecture of the model in detail. It also contains extensive benchmarks comparing it to LLaMA 2 70B and GPT-3.5.

In the widely cited MMLU language understanding benchmark, Mixtral is ahead of both models. Larger models such as Gemini Ultra or GPT-4 achieve between 85 and 90 percent, depending on the prompting method.

Image: Mistral AI

On the LMSys leaderboard, where humans rate the models' answers, Mixtral 8x7B is just ahead of Claude 2.1 and GPT-3.5, as well as Google's Gemini Pro. GPT-4 is clearly in the lead.


This confirms the pattern of the past few months: many organizations now seem able to build models at or slightly above the level of GPT-3.5. But GPT-4 remains unmatched.

Original article from December 11, 2023:

Mixtral 8x7B is currently the best open source LLM, says Mistral

Mistral AI has released its new Mixtral 8x7B language model, detailing its performance in a new blog post. It is claimed to be the best open language model currently available.

At the end of last week, Mistral released a new language model via a torrent link. Today, the company has released more details about the Mixtral 8x7B model, as well as announcing an API service and new funding.

According to the company, Mixtral is a sparse Mixture-of-Experts (SMoE) model with open weights, licensed under Apache 2.0. A similar architecture is rumored to be used by OpenAI for GPT-4. For each token, a router selects two of the eight expert sub-networks, so only a fraction of the total parameters is active per inference step, reducing cost and latency. Specifically, Mixtral has 45 billion parameters but uses only 12 billion parameters per token for inference. It is the largest model to date from the start-up, which released the relatively powerful Mistral 7B in September.
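
To make the routing idea concrete, here is a minimal, hypothetical PyTorch sketch of a top-2 sparse Mixture-of-Experts layer. It illustrates the general technique only; the class name, layer sizes, and other details are assumptions for illustration, not Mistral's implementation.

# Minimal sketch of top-2 sparse Mixture-of-Experts routing (illustrative only;
# names and sizes are assumptions, not Mistral's actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=4096, d_ff=14336, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
             for _ in range(n_experts)]
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                                    # (tokens, n_experts)
        weights, indices = torch.topk(logits, self.top_k, dim=-1)  # keep the two best experts
        weights = F.softmax(weights, dim=-1)                       # normalize their scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                       # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out  # only two of the eight expert networks ran for each token

Because only two experts run per token, compute per token scales with a fraction of the total expert parameters, which is the cost and latency advantage Mistral describes.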


Mixtral 8x7B outperforms Meta's LLaMA 2 70B

According to Mistral, Mixtral outperforms Llama 2 70B in most benchmarks and offers six times faster inference. It is also said to be more truthful and less biased than the Meta model. Mistral calls it the "strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs." In standard benchmarks, it also matches or outperforms OpenAI's GPT-3.5.

Mixtral handles context lengths of up to 32,000 tokens, supports English, French, Italian, German, and Spanish, and can write code.

Mistral releases Instruct version of Mixtral

In addition to the base Mixtral 8x7B model, Mistral is also launching Mixtral 8x7B Instruct. The model has been optimized to follow instructions precisely through supervised fine-tuning and Direct Preference Optimization (DPO). It achieves a score of 8.30 on MT-Bench, making it the best open-source model, with performance comparable to GPT-3.5.
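
For context on that second training step, the sketch below shows the generic Direct Preference Optimization objective from the original DPO paper: the policy model is nudged to assign a higher relative log-probability to the human-preferred answer than a frozen reference model does. This is a general illustration of the technique, not Mistral's training code; all names are placeholders.

# Generic DPO objective (illustration of the technique, not Mistral's code).
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Each argument is the summed log-probability of a full response,
    # shape (batch,), under the trainable policy or the frozen reference model.
    chosen_margin = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_margin = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the policy to prefer the chosen answer over the rejected one.
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()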

Mixtral is available in beta on Mistral's platform. The smaller Mistral 7B and a more powerful prototype model that Mistral says outperforms GPT-3.5 are also available there.
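
For readers who want to try the hosted model, a request along the following lines should work against Mistral's chat completions API; the endpoint path, the "mistral-small" alias that served Mixtral at launch, and the payload shape are assumptions based on Mistral's public documentation and may change, so check the current docs.

# Hypothetical example of querying Mixtral via Mistral's hosted API (beta).
# Endpoint, model alias, and response shape are assumptions from the public docs.
import os
import requests

response = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-small",  # alias that served Mixtral 8x7B in the beta
        "messages": [{"role": "user",
                      "content": "Explain sparse Mixture-of-Experts in two sentences."}],
    },
    timeout=30,
)
print(response.json()["choices"][0]["message"]["content"])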

Summary
  • Mistral AI released its new language model, Mixtral 8x7B, which it says is the best open language model currently available, outperforming Meta's LLaMA 2 70B in benchmarks and roughly matching GPT-3.5.
  • Mixtral 8x7B is a sparse Mixture-of-Experts (SMoE) model with 45 billion parameters but uses only 12 billion parameters per token for inference to reduce cost and latency.
  • The model supports English, French, Italian, German, and Spanish, and Mistral has also released an Instruct variant that is optimized for following instructions accurately.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.