
Microsoft has released three new open-source AI models in its Phi 3.5 series: mini-instruct, MoE-instruct, and vision-instruct. The models offer strong reasoning for their size and support multiple languages, but share the typical LLM limitations in factual knowledge and safety.


Designed for commercial and scientific use, the Phi series generally aims to create highly efficient AI models using high-quality training data, although Microsoft hasn't yet shared details about the training process for Phi-3.5.

For the vision model, the company says it used "newly created synthetic, 'textbook-like' data for the purpose of teaching math, coding, common-sense reasoning, general knowledge of the world," in addition to other high-quality and filtered data.

Microsoft says these new models are ideal for applications with limited resources, time-sensitive scenarios, and tasks requiring strong logical reasoning within an LLM's capabilities.


The Phi-3.5-mini-instruct model, with 3.8 billion parameters, is optimized for low-resource environments. Despite its small size, it performs well in benchmarks, especially for multilingual tasks.

The Phi-3.5-MoE-instruct model has 16 experts with 3.8 billion parameters each, for a total of 60.8 billion. However, only 6.6 billion parameters are active when two experts are used per token, which is enough to match larger models in language comprehension and math, and to outperform some in reasoning tasks.

Image: Microsoft

In benchmarks, it often comes close to GPT-4o mini, but keep in mind that these are just benchmarks; Phi models have reportedly fallen short of their benchmark results in real-world use.
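To make the mixture-of-experts mechanism more concrete, here is a minimal sketch of top-2 expert routing in PyTorch. The layer sizes are toy values, not Phi-3.5-MoE's real dimensions; the point is that a small router selects two experts per token, so only those experts' weights participate in the forward pass.

```python
# Minimal sketch of top-2 mixture-of-experts routing (toy dimensions, not Phi-3.5-MoE's).
# Only the two selected experts run per token, which is why just a fraction of the
# total parameters is active at inference time.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                              # (tokens, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)  # pick 2 experts per token
        weights = F.softmax(top_vals, dim=-1)                # normalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e                    # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
print(Top2MoELayer()(tokens).shape)  # torch.Size([8, 64])
```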

The Phi-3.5-vision-instruct model, a multimodal system with 4.2 billion parameters, can process text and images. It's suitable for tasks such as image understanding, OCR, and diagram understanding. It outperforms similarly sized models in benchmarks, and competes with larger models in multi-image processing and video summarization.

Image: Microsoft
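As a rough illustration, the vision model can be run with Hugging Face's transformers library. The sketch below follows the general pattern from the Phi-3.5-vision-instruct model card, but details such as the image placeholder token, processor arguments, and the example URL are assumptions and may need adjusting.

```python
# Rough sketch of multimodal inference with Phi-3.5-vision-instruct via transformers.
# The image URL is a placeholder; the "<|image_1|>" token and processor call follow
# the pattern in the model card and may differ in detail.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)  # placeholder URL
messages = [{"role": "user", "content": "<|image_1|>\nDescribe this diagram."}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, [image], return_tensors="pt").to("cuda")
output = model.generate(
    **inputs, max_new_tokens=200, eos_token_id=processor.tokenizer.eos_token_id
)
print(processor.batch_decode(output[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```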

Phi's context window gets an upgrade

All Phi 3.5 models support a context length of up to 128,000 tokens, making them useful for long-document summarization and multilingual context retrieval. This is well ahead of Google's Gemma 2 models, which are limited to 8,192 tokens.


However, like all LLMs, the models are likely to suffer from the "lost in the middle" problem when processing long documents. The same applies to image processing.

The small size of the models limits their factual knowledge, according to Microsoft, potentially leading to higher than average inaccuracies. Microsoft suggests pairing Phi-3.5 with a search method such as RAG to address this weakness.
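The RAG pattern Microsoft points to simply means retrieving relevant passages first and handing them to the model as grounding context. Below is a minimal sketch of that idea; the TF-IDF retriever stands in for a real vector or search backend, and the final model call is left as a placeholder.

```python
# Minimal sketch of retrieval-augmented generation (RAG): retrieve relevant passages,
# then prepend them to the prompt so the model answers from provided context rather
# than its limited parametric knowledge. TF-IDF stands in for a production retriever.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Phi-3.5-mini-instruct has 3.8 billion parameters.",
    "Phi-3.5-MoE-instruct activates 6.6 billion of its parameters with two experts.",
    "All Phi-3.5 models support a 128,000-token context window.",
]

def retrieve(query, docs, k=2):
    vectorizer = TfidfVectorizer().fit(docs + [query])
    doc_vecs = vectorizer.transform(docs)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vecs)[0]
    return [docs[i] for i in scores.argsort()[::-1][:k]]

query = "How many parameters are active in the MoE model?"
context = "\n".join(retrieve(query, documents))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# generate_answer(prompt)  # placeholder: pass the grounded prompt to a Phi-3.5 model
print(prompt)
```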

Like other language models, Phi models can produce biased or offensive output. They reject unwanted content in English, even when prompted in other languages, but are more vulnerable to complex prompt injection techniques in multiple languages.

The Phi 3.5 models are available under the MIT license on Hugging Face and through Microsoft's Azure AI Studio. Flash attention support requires GPU hardware such as the NVIDIA A100, A6000, or H100.
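As a sketch of what this looks like in practice, the snippet below loads Phi-3.5-mini-instruct from Hugging Face with the transformers library and requests flash attention; the dtype, device settings, and the flash-attn dependency are assumptions that depend on your hardware.

```python
# Sketch: load Phi-3.5-mini-instruct from Hugging Face and run a chat-style prompt.
# attn_implementation="flash_attention_2" assumes a supported GPU and the flash-attn
# package; swap in "eager" on other hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",  # assumption: flash-attn is installed
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Summarize the key ideas of mixture-of-experts models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```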

Summary
  • Microsoft has introduced three new open source AI models in its Phi 3.5 series: mini-instruct, MoE-instruct, and vision-instruct. The 3.5 models are designed for commercial and scientific use in multiple languages and have relatively high reasoning capabilities with the typical limitations of LLMs.
  • The smallest model, Phi-3.5-mini-instruct, has 3.8 billion parameters and is optimized for scenarios with limited computing power. The MoE-instruct model has 60.8 billion parameters, of which only 6.6 billion are active. The vision-instruct model can process text and images at GPT-4o level.
  • Due to their small size, the models have weaknesses in terms of factual knowledge and safety. Microsoft recommends combining them with a search system such as RAG to compensate for inaccuracies.