Stability AI and Arm have optimized the Stable Audio Open model to run on phone processors, enabling offline audio generation directly on mobile devices.
Stable Audio Open, released in summer 2024, generates up to 47 seconds of audio from text prompts. The model specializes in short-form audio such as drum beats, instrumental riffs, ambient sounds and Foley recordings. Unlike the commercial Stable Audio 2 or services such as Suno, it isn't designed to create complete songs.
The initial version of Stable Audio Open took 240 seconds to generate audio on Arm CPUs. Through model distillation and Arm's software stack, generation time dropped to under 8 seconds for an 11-second clip on Armv9 processors, a 30x speed improvement.
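As a rough check of that arithmetic, the Python sketch below recomputes the speedup from the two reported timings. It assumes both timings refer to the same 11-second clip, which the source does not state explicitly, and the function names are illustrative only.

```python
def speedup(baseline_seconds: float, optimized_seconds: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_seconds / optimized_seconds


def real_time_factor(clip_seconds: float, generation_seconds: float) -> float:
    """Return seconds of audio produced per second of compute."""
    return clip_seconds / generation_seconds


# Figures reported in the article; 8 s is an upper bound ("under 8 seconds"),
# so both results below are lower bounds.
BASELINE_SECONDS = 240.0
OPTIMIZED_SECONDS = 8.0
CLIP_SECONDS = 11.0  # assumption: the same clip length applies to both timings

print(f"speedup: at least {speedup(BASELINE_SECONDS, OPTIMIZED_SECONDS):.0f}x")
print(f"real-time factor: at least {real_time_factor(CLIP_SECONDS, OPTIMIZED_SECONDS):.2f}")
```

A real-time factor above 1 means the clip is generated faster than it takes to play back.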
The implementation uses Arm's KleidiAI libraries to run audio generation on device processors without requiring an internet connection. Stability AI's blog post doesn't detail the technical specifics, and no research paper has been published yet. The optimization makes the model accessible to anyone with a compatible Arm-based mobile device.
Stability AI intends to port its image, video and 3D generation models to mobile devices through the Arm partnership. This focus on mobile development differs from the company's previous strategy of frequent Stable Diffusion image model releases. The London-based startup appointed a new CEO in June 2024 amid financial difficulties and staff departures.