Stability AI releases a compact open text-to-audio model that runs on mobile devices

May 18, 2025


Stability AI and Arm have released a compact text-to-audio model that runs on smartphones and can generate stereo audio clips up to 11 seconds long in about 7 seconds.

Called Stable Audio Open Small, the model is based on a technique known as "Adversarial Relativistic-Contrastive" (ARC), developed by researchers at the University of California, Berkeley, and other institutions. On high-end hardware like an Nvidia H100 GPU, it can produce 44 kHz stereo audio in just 75 milliseconds, fast enough for near real-time generation.
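To put the reported speeds in perspective, the short Python sketch below computes a real-time factor: seconds of audio produced per second of compute. The phone figures (11 seconds of audio in about 7 seconds) come from the article. The article does not say how long the clip generated in 75 milliseconds on the H100 was, so the 11-second value used for that case is an assumption. The function name is ours.

```python
def real_time_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Seconds of audio generated per second of compute (above 1.0 is faster than real time)."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


# Figures reported in the article for the smartphone demo.
phone_rtf = real_time_factor(audio_seconds=11, compute_seconds=7)

# ASSUMPTION: the clip length behind the 75 ms H100 figure is not stated;
# 11 seconds is used here purely for illustration.
h100_rtf = real_time_factor(audio_seconds=11, compute_seconds=0.075)

print(f"Phone: {phone_rtf:.2f}x real time")  # Phone: 1.57x real time
print(f"H100 (assumed 11 s clip): {h100_rtf:.1f}x real time")
```

By this measure, the phone generates audio roughly one and a half times faster than it plays back.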

The original version of Stable Audio Open launched last year as a free, open-source model with 1.1 billion parameters. This smaller version uses just 341 million parameters, making it significantly easier to run on consumer hardware. Stability AI and Arm first announced their collaboration in March.

Designed for mobile hardware

To make the model work on smartphones, the team overhauled the architecture. The system now consists of three components: an autoencoder that compresses the audio data, an embedding module that interprets text prompts, and a diffusion model that generates the final audio.
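The data flow between these three components can be sketched in a few lines of Python. This is a toy illustration only, not Stability AI's code: every function, constant, and the "denoising" rule below is a hypothetical stand-in, and the real components are neural networks. The sketch shows just the order of operations: the prompt becomes an embedding, the diffusion stage refines noise in a compressed latent space under that conditioning, and the autoencoder's decoder expands the latents into a stereo waveform.

```python
import random

SAMPLE_RATE = 44_000  # "44 kHz" as quoted in the article
LATENT_DIM = 8        # hypothetical latent width, chosen for illustration
DOWNSAMPLE = 2_000    # hypothetical audio samples represented by one latent frame


def embed_prompt(prompt: str, dim: int = LATENT_DIM) -> list[float]:
    """Stand-in for the text embedding module: prompt -> fixed-size vector."""
    rng = random.Random(prompt)  # string seed, so the same prompt gives the same vector
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def denoise(latents: list[list[float]], cond: list[float], steps: int = 8) -> list[list[float]]:
    """Stand-in for the diffusion model: pull noisy latents toward the conditioning."""
    for step in range(steps):
        blend = 1.0 / (steps - step)  # grows to 1.0 on the final step
        latents = [[x + blend * (c - x) for x, c in zip(frame, cond)] for frame in latents]
    return latents


def decode(latents: list[list[float]]) -> list[list[float]]:
    """Stand-in for the autoencoder decoder: latent frames -> [left, right] samples."""
    left: list[float] = []
    right: list[float] = []
    for frame in latents:
        half = len(frame) // 2
        left.extend([sum(frame[:half]) / half] * DOWNSAMPLE)
        right.extend([sum(frame[half:]) / half] * DOWNSAMPLE)
    return [left, right]


def generate(prompt: str, seconds: int = 11, seed: int = 0) -> list[list[float]]:
    """Run the three stages in order and return a stereo waveform."""
    rng = random.Random(seed)
    n_frames = seconds * SAMPLE_RATE // DOWNSAMPLE
    noise = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(n_frames)]
    cond = embed_prompt(prompt)
    return decode(denoise(noise, cond))


audio = generate("rain on a tin roof")
print(len(audio), "channels,", len(audio[0]) / SAMPLE_RATE, "seconds")
```

Working in the compressed latent space is what keeps the diffusion stage small: in this toy setup it handles 242 latent frames rather than 484,000 samples per channel.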

This redesigned setup doesn't rely on distillation, but still cuts memory usage nearly in half, from 6.5 GB to 3.6 GB. That reduction makes it possible to run the model on mobile devices for the first time. During testing, researchers used the Vivo X200 Pro, an Android phone with 12 GB of RAM and a MediaTek Dimensity 9400 chip, released in late 2024.
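As a quick check on these numbers, the Python sketch below works out the relative savings. All inputs are figures quoted in the article (memory use, parameter counts, and the test phone's RAM); the helper name is ours, and the calculation ignores whatever memory the operating system and other apps occupy.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrank going from `before` to `after`."""
    return (before - after) / before


# Figures quoted in the article.
MEMORY_BEFORE_GB = 6.5
MEMORY_AFTER_GB = 3.6
PARAMS_ORIGINAL = 1.1e9  # Stable Audio Open
PARAMS_SMALL = 341e6     # Stable Audio Open Small
PHONE_RAM_GB = 12        # Vivo X200 Pro

memory_saving = relative_reduction(MEMORY_BEFORE_GB, MEMORY_AFTER_GB)
param_saving = relative_reduction(PARAMS_ORIGINAL, PARAMS_SMALL)
ram_share = MEMORY_AFTER_GB / PHONE_RAM_GB

print(f"Memory reduction:    {memory_saving:.1%}")  # 44.6%
print(f"Parameter reduction: {param_saving:.1%}")   # 69.0%
print(f"Share of phone RAM:  {ram_share:.1%}")      # 30.0%
```

The 3.6 GB footprint takes up about 30 percent of the test phone's 12 GB of RAM, while the earlier 6.5 GB figure would have claimed more than half.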

Best suited for sound effects

Stability AI says the model is especially good at generating sound effects and field recordings. It still struggles with music, particularly with singing voices, and works best with English-language prompts.

The model was trained on roughly 472,000 clips from the Freesound database, using only material licensed under CC0, CC-BY, or CC-Sampling+ terms. To avoid copyright issues, the team filtered the data using a series of automated checks.

The software is available under the Stability AI Community License for open-source use. Commercial applications are subject to separate terms. The code is on GitHub, and model weights can be accessed via Hugging Face.
