Just a few months after the last release, Meta has unveiled the next addition to the Llama series. Following the large 405B model of Llama 3.1, version 3.2 introduces two tiny models for smartphones and two larger models capable of understanding images.
Meta's latest language model release initially includes two text models with one and three billion parameters, respectively, which can run on smartphones and are designed to summarize text, rewrite content, or invoke specific functions in other apps.
To achieve this, Meta worked closely with major hardware manufacturers such as Qualcomm, MediaTek, and Arm. According to Meta, local processing primarily offers advantages in terms of speed and data protection.
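Before any on-device deployment, the small models can be tried out with standard tooling. Below is a minimal summarization sketch using the Hugging Face transformers text-generation pipeline; it assumes the gated meta-llama/Llama-3.2-1B-Instruct checkpoint published on Hugging Face (access approval required) and a recent transformers version that accepts chat-style message lists.

```python
import torch
from transformers import pipeline

# Assumed model id: the instruction-tuned 1B checkpoint on Hugging Face (gated repository).
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Summarize in one sentence: Llama 3.2 adds two small "
                                "text models (1B and 3B) for on-device use and two vision models."}
]

# The pipeline returns the full conversation; the last entry is the model's reply.
result = generator(messages, max_new_tokens=60)
print(result[0]["generated_text"][-1]["content"])
```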
To optimize the lightweight 1B and 3B models, Meta employed a combination of pruning and knowledge distillation from larger teacher models. Structured pruning, applied in a single pass to the earlier Llama 3.1 8B model, systematically removed parts of the network and adjusted the remaining weights.
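Meta has not published its exact distillation recipe, but the general technique blends the teacher's softened output distribution with the usual next-token loss. The PyTorch sketch below illustrates that idea; the temperature, weighting, and toy shapes are chosen purely for demonstration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Combine a soft-target KL term (teacher guidance) with standard cross-entropy."""
    # Soft targets: match the student's distribution to the teacher's, softened by temperature T.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary next-token cross-entropy against the training labels.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * soft + (1 - alpha) * hard

# Toy shapes: batch of 2 sequences, 8 tokens each, vocabulary of 128.
student_logits = torch.randn(2, 8, 128)
teacher_logits = torch.randn(2, 8, 128)
labels = torch.randint(0, 128, (2, 8))
print(distillation_loss(student_logits, teacher_logits, labels))
```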
Vision models with 11 and 90 billion parameters
In addition to the lightweight models, Meta is releasing its first vision models with 11 and 90 billion parameters. According to Meta's benchmarks, Llama 3.2 11B and 90B can keep pace with leading closed models such as Claude 3 Haiku and GPT-4o mini in image understanding tasks. Open source competitor Mistral also recently unveiled its first vision model, Pixtral, which has a significantly smaller number of parameters.
To enable image input, Meta has equipped the Llama 3.2 vision models with a new adapter-based architecture: additional adapter weights are trained to integrate a pre-trained image encoder into the pre-trained language model.
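Meta describes these adapters as cross-attention layers that feed image-encoder representations into the language model. The rough PyTorch sketch below illustrates one such layer; the dimensions, zero-initialized gate, and placement are assumptions for demonstration, not Meta's actual implementation.

```python
import torch
import torch.nn as nn

class VisionAdapterLayer(nn.Module):
    """Illustrative cross-attention adapter: text hidden states attend to image features."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        # Gate starts at zero so the adapter initially leaves the language model unchanged.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, text_hidden, image_features):
        attn_out, _ = self.cross_attn(query=text_hidden, key=image_features, value=image_features)
        return text_hidden + torch.tanh(self.gate) * self.norm(attn_out)

# Toy run: 16 text tokens attending to 64 image patch embeddings.
layer = VisionAdapterLayer()
text_hidden = torch.randn(1, 16, 512)
image_features = torch.randn(1, 64, 512)
print(layer(text_hidden, image_features).shape)  # torch.Size([1, 16, 512])
```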
Unlike other open multimodal models, the Llama 3.2 vision models are available in both pre-trained and aligned versions for fine-tuning and local deployment. In terms of performance, they are on par with recently released models such as Mistral's Pixtral or Qwen 2 VL.
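For local experimentation, the instruction-tuned 11B vision model can be loaded through Hugging Face transformers. The sketch below follows the release-time model card; the class name, gated model id, and local image path are assumptions about that published setup.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # gated checkpoint on Hugging Face

model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# "photo.jpg" is a placeholder for any local image.
image = Image.open("photo.jpg")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "What is in this picture?"},
]}]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=80)
print(processor.decode(output[0], skip_special_tokens=True))
```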
Llama Stack API to simplify RAG and more
To simplify development with Llama models, Meta is introducing the first official Llama Stack distributions. These enable turnkey deployment of applications with Retrieval Augmented Generation (RAG) and tool integration across various environments.
For the API, Meta is collaborating with AWS, Databricks, Dell, and Together AI, among others. There is also a command-line interface (CLI) and client code in various programming languages.
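The sketch below is not the Llama Stack API itself, just a self-contained illustration of the RAG pattern the distributions are meant to package: retrieve relevant context, then build a prompt for the model. The toy corpus and word-overlap retriever are purely hypothetical.

```python
# Tiny in-memory "document store" standing in for a real vector database.
documents = {
    "llama-3.2-sizes": "Llama 3.2 ships 1B and 3B text models plus 11B and 90B vision models.",
    "llama-stack": "Llama Stack distributions bundle RAG and tool integration for Llama apps.",
}

def retrieve(query: str, docs: dict[str, str], k: int = 1) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(docs.values(), key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Assemble the retrieved context and the question into a single prompt."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"

# The resulting prompt would then be sent to a Llama 3.2 model via whichever runtime you use.
print(build_prompt("Which model sizes are in Llama 3.2?"))
```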
Meta lacks the home advantage on mobile
While the release of Llama 3.2 represents another step in Meta's effort to make open source AI - or its interpretation of it - the standard, it remains to be seen whether the new models will gain traction on smartphones: Android with Gemini Nano and iOS with Apple Intelligence already have their own deeply integrated solutions for local AI processing.
With the vision upgrade for the Llama models, Meta has given its AI assistant, Meta AI, an important function that will benefit many users on the company's numerous social media platforms. In the long term, this could impact competitors like ChatGPT, which received similar capabilities around a year ago.
The Llama 3.2 models are available for download at llama.com and Hugging Face, as well as through a broad ecosystem of partner platforms.