
Just a few months after the last release, Meta has unveiled the next addition to the Llama series. Following the large 405B model of Llama 3.1, version 3.2 introduces two tiny models for smartphones and two larger models capable of understanding images.


Meta's latest language model release initially includes two text models with one and three billion parameters, which can run on smartphones and are designed to summarize text, rewrite content, or invoke specific functions in other apps.
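
A minimal sketch of what running one of these small models locally could look like, using the Hugging Face transformers library (the checkpoint name and settings below are assumptions based on Meta's published model listing; the repository is gated and requires accepting Meta's license):

```python
# Minimal sketch, not an official Meta example: summarizing text locally with the
# 1B instruct model via the Hugging Face transformers library.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # gated checkpoint on Hugging Face
    torch_dtype="auto",                        # pick the smallest dtype the hardware supports
)

messages = [
    {"role": "system", "content": "Summarize the user's text in two sentences."},
    {"role": "user", "content": "Paste the text to be summarized here."},
]

# Recent transformers versions apply the model's chat template automatically
# when a list of chat messages is passed to the text-generation pipeline.
result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])
```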


To achieve this, Meta worked closely with major hardware manufacturers such as Qualcomm, MediaTek, and Arm. According to Meta, local processing primarily offers advantages in terms of speed and data protection.


To optimize the lightweight 1B and 3B models, Meta employed a combination of pruning and knowledge distillation from larger teacher models. Structured pruning, applied in a single pass to the earlier Llama 3.1 8B model, systematically removed parts of the network and adjusted the remaining weights.
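
Meta has not published its exact training recipe, but knowledge distillation at the logit level is commonly implemented along these lines - a rough illustration in PyTorch, not Meta's code:

```python
# Illustrative sketch of logit-level knowledge distillation: a small "student"
# model is trained to match the token distribution of a larger "teacher"
# (here, conceptually, Llama 3.1 8B), alongside the usual next-token loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy against the training labels.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * soft + (1 - alpha) * hard
```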

Vision models with 11 and 90 billion parameters

In addition to the lightweight models, Meta is releasing its first vision models with 11 and 90 billion parameters. According to Meta's benchmarks, Llama 3.2 11B and 90B can keep pace with leading closed models such as Claude 3 Haiku and GPT-4o mini in image understanding tasks. Open-source competitor Mistral also recently unveiled its first vision model, Pixtral, which has significantly fewer parameters.


To enable image input, Meta has equipped the Llama 3.2 vision models with an adapter-based architecture: additional adapter weights are trained to integrate the pre-trained image encoder into the pre-trained language model.
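
Meta has not shared implementation details beyond this description, but the basic idea can be sketched as a gated cross-attention layer in which the language model's hidden states attend to the image encoder's features - a simplified illustration, not Meta's actual architecture code:

```python
# Simplified sketch of a cross-attention adapter: only these new weights are trained,
# while the pre-trained language model and image encoder stay frozen.
import torch
import torch.nn as nn

class CrossAttentionAdapter(nn.Module):
    def __init__(self, d_model=4096, n_heads=32):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Gate initialized at zero, so the language model's behavior is unchanged
        # at the start of training.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, text_hidden, image_features):
        # text_hidden: (batch, text_len, d_model); image_features: (batch, image_tokens, d_model)
        attended, _ = self.cross_attn(text_hidden, image_features, image_features)
        return text_hidden + torch.tanh(self.gate) * attended
```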

Unlike other open multimodal models, the Llama 3.2 vision models are available in both pre-trained and aligned versions for fine-tuning and local deployment. In terms of performance, they are on par with recently released models such as Mistral's Pixtral or Qwen 2 VL.


Llama Stack API to simplify RAG and more

To simplify development with Llama models, Meta is introducing the first official Llama Stack distributions. These enable the turnkey deployment of applications with Retrieval Augmented Generation (RAG) and tool integration in various environments.
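
The Llama Stack ships its own client tooling, but the RAG pattern it packages can be illustrated with a generic sketch (the example corpus and the scikit-learn retriever below are placeholders, not part of the Llama Stack API):

```python
# Generic RAG sketch: retrieve the most relevant document for a query, then build a
# prompt that conditions the model's answer on it. Not the Llama Stack API.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Llama 3.2 includes 1B and 3B text models intended for on-device use.",
    "The 11B and 90B vision models add image understanding via adapter weights.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query, k=1):
    # Rank documents by cosine similarity to the query and return the top k.
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

query = "Which Llama 3.2 models run on phones?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` would then be passed to a Llama 3.2 model, for example via the
# text-generation pipeline shown earlier.
```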

For the API, Meta is collaborating with AWS, Databricks, Dell, and Together AI, among others. There is also a command line interface (CLI) and code for various programming languages.

Meta lacks the home advantage on mobile

While the release of Llama 3.2 represents another step in Meta's efforts to make open source AI - or its interpretation of it - the standard, it remains to be seen whether the models will gain traction on smartphones: Android with Gemini Nano and iOS with Apple Intelligence already have their own deeply integrated solutions for local AI processing.

With the vision upgrade for the Llama models, Meta has given its AI assistant, Meta AI, an important function that will benefit many users on the company's numerous social media platforms. In the long term, this could impact competitors like ChatGPT, which received similar capabilities around a year ago.


The Llama 3.2 models are available for download at llama.com and Hugging Face, as well as through a broad ecosystem of partner platforms.

Summary
  • Meta has released Llama 3.2, a series of open source AI models for edge devices and vision applications. The 1B and 3B text models are designed to run on smartphones, where they can summarize or paraphrase texts, for example.
  • Meta is also releasing 11B and 90B vision models that can keep up with similarly sized, closed models for image understanding tasks. A new architecture with additional adapter weights enables the input of images.
  • To simplify development with Llama models, Meta is introducing the first official Llama Stack distributions, which provide an API for turnkey applications with Retrieval Augmented Generation and tool connectivity. It remains to be seen whether the models will prevail over system-integrated mobile solutions.