- You can now run Mistral's new Mixture-of-Experts model in MLX on Apple silicon.
Apple's machine learning research team has released MLX, an efficient machine learning framework tailored for Apple silicon, and MLX Data, a flexible data loading package.
MLX's Python API closely follows NumPy, with a few differences:
- Composable function transformations: MLX has composable function transformations for automatic differentiation, automatic vectorization, and computation graph optimization.
- Lazy computation: Computations in MLX are lazy. Arrays are only materialized when needed.
- Multi-device: Operations can run on any of the supported devices (CPU, GPU, …)
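The lazy-computation point in the list above can be illustrated with a toy sketch in plain Python. This is not MLX code and does not use MLX's API: the `LazyArray` class and its methods are hypothetical names invented here, and the only assumption is the general idea that operations are recorded first and computed when a result is requested.

```python
class LazyArray:
    """Toy lazy array (hypothetical, not part of MLX).

    Arithmetic only records what to compute; values are produced
    the first time eval() is called and cached afterwards.
    """

    def __init__(self, compute):
        self._compute = compute  # zero-argument function returning a list of floats
        self._value = None       # filled in once the array is materialized

    @classmethod
    def from_list(cls, values):
        data = [float(v) for v in values]
        return cls(lambda: data)

    @property
    def materialized(self):
        return self._value is not None

    def eval(self):
        # Materialize on demand, evaluating any inputs this node depends on.
        if self._value is None:
            self._value = self._compute()
        return self._value

    def __add__(self, other):
        return LazyArray(lambda: [x + y for x, y in zip(self.eval(), other.eval())])

    def __mul__(self, other):
        return LazyArray(lambda: [x * y for x, y in zip(self.eval(), other.eval())])


a = LazyArray.from_list([1, 2, 3])
b = LazyArray.from_list([4, 5, 6])

c = a * b + a                # only builds a graph of pending operations
before = c.materialized      # nothing has been computed at this point
result = c.eval()            # computation happens here
```

In this sketch, `c` stays unevaluated until `eval()` is called, which is the behavior the article describes as arrays being "only materialized when needed". A real framework can use the recorded graph to skip unused work or optimize it before running.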
The design of MLX is inspired by frameworks such as PyTorch, JAX, and ArrayFire. A notable difference between these frameworks and MLX is the unified memory model, Apple writes. Arrays in MLX live in shared memory, so operations on MLX arrays can run on any supported device type without copying data. MLX Data (GitHub) is a framework-agnostic and flexible data-loading package.
Run Mistral and Llama on your M2 Ultra
With MLX and MLX Data, users can perform tasks such as training a Transformer language model or fine-tuning with LoRA, text generation with Mistral, image generation with Stable Diffusion, and speech recognition with Whisper. For an example of how to get started with MLX and Mistral, see this tutorial.
The following video shows the performance of a Llama v1 7B model implemented in MLX and running on an M2 Ultra, highlighting the capabilities of MLX on Apple silicon devices.
For details, see the MLX GitHub and Apple's documentation.
So far, Apple has mostly talked publicly about "machine learning" and how it's implementing ML features in its products, such as better word prediction for its iPhone keyboard.
Apple's release of MLX is notable because it could strengthen the open-source AI movement built around models like Meta's Llama, Mistral, and Stable Diffusion.
But Apple is also reportedly working internally on an LLM framework called Ajax and its own chatbot, and is spending millions of dollars a day on AI training to keep up with ChatGPT and generative AI services in general.