Run LLMs on your M Series with Apple's new MLX machine learning framework


Update

You can now run Mistral's new Mixture-of-Experts model in MLX on Apple silicon.

Apple has released MLX, an efficient machine learning framework tailored for Apple silicon, and MLX Data, a flexible data loading package.

Both have been released by Apple's machine learning research team. MLX's Python API closely follows NumPy, with a few differences. Other key features include:

Composable function transformations: MLX has composable function transformations for automatic differentiation, automatic vectorization, and computation graph optimization.

Lazy computation: Computations in MLX are lazy. Arrays are only materialized when needed.

Multi-device: Operations can run on any of the supported devices (CPU, GPU, …)

The design of MLX is inspired by frameworks such as PyTorch, JAX, and ArrayFire. A notable difference between these frameworks and MLX is the unified memory model, Apple writes. Arrays in MLX live in shared memory, allowing operations on MLX arrays to be performed on any supported device type without performing data copies.

MLX Data (GitHub) is a framework-agnostic and flexible data-loading package.

Run Mistral and Llama on your M2 Ultra

With MLX and MLX Data, users can perform tasks such as training a Transformer language model or fine-tuning with LoRA, text generation with Mistral, image generation with Stable Diffusion, and speech recognition with Whisper. For an example of how to get started with MLX and Mistral, see this tutorial.

The following video shows the performance of a Llama v1 7B model implemented in MLX and running on an M2 Ultra, highlighting the capabilities of MLX on Apple silicon devices.

Video: Awni Hannun via Twitter.com

For details, see the MLX GitHub and Apple's documentation.

So far, Apple has mostly talked publicly about "machine learning" and how it's implementing ML features in its products, such as better word prediction for its iPhone keyboard.

Apple's release of MLX is notable because it could strengthen the open-source AI movement built around models like Meta's Llama, Mistral, and Stable Diffusion.


But Apple is also reportedly working internally on an LLM framework called Ajax and its own chatbot, and is spending millions of dollars a day on AI training to keep up with ChatGPT and generative AI services in general.
