Stable Audio 2.5 is the latest audio generation model from Stability AI, this time built for professional sound production. The company says the model is meant to help creative teams generate high-quality, customizable audio at scale.

Stable Audio 2.5 can create more complex musical structures, including multi-part pieces with intros, developments, and outros. According to Stability AI, the model now responds more accurately to mood prompts like "uplifting" and understands genre-specific cues such as "lush synthesizers."

Music tracks up to three minutes long take only seconds to generate: inference runs in under two seconds on Nvidia H100 GPUs.

The model's speed comes from a post-training method called Adversarial Relativistic-Contrastive (ARC), developed by the company's research team. In May, Stability AI also released a compact version for smartphones using the same ARC method. The Stable Audio Open Small model can generate stereo audio up to eleven seconds long in about seven seconds on mobile devices.

Audio inpainting and editing

The main update in Stable Audio 2.5 is audio inpainting. Users can upload their own audio files, choose a starting point, and let the AI generate the rest of the track, extending or completing existing recordings. The new version can also generate music from text prompts.

Uploaded files must be copyright-free. Stability AI says it uses recognition systems to enforce copyright rules. Like earlier versions, Stable Audio 2.5 was trained on a licensed dataset and is considered commercially safe, according to the company.

Stability AI points to a range of possible uses, from commercials and game intros to department store music and custom sounds for credit cards or car stereos. The idea is to give companies a way to keep a consistent audio identity wherever customers interact with the brand. Stability's audio team can also adapt models to fit a company's own sound library, building in distinctive audio cues.

The company is working with Amp, a sound branding agency owned by WPP, to build audio tools for large clients. Stable Audio 2.5 will be available to WPP's global customers through the WPP Open platform.

Stable Audio 2, which Stability AI rolled out in April 2024, already supported three-minute music generation along with audio-to-audio and style transfer features. After making its name in generative AI for images, Stability AI has shifted focus to audio and started expanding its partner network, likely to shore up its finances. In March, WPP Group invested an undisclosed sum in the company. Meta has also started to ramp up its own audio research.

Summary
  • Stability AI has launched Stable Audio 2.5, a new audio generation model designed for professional and enterprise use, offering faster production of customizable, high-quality music tracks up to three minutes long.
  • The model introduces audio inpainting, which lets users upload audio files and have the AI extend or complete the recording, in addition to generating music from text prompts. Uploaded files must be copyright-free, and Stability AI says it uses recognition systems to enforce this.
  • Stability AI is targeting use cases in advertising, retail, and branding, partnering with agencies like WPP's Amp to deliver consistent audio identities for large clients, as the company pivots from image-based AI to audio technology and expands its commercial partnerships.
Jonathan writes for THE DECODER about how AI tools can improve both work and creative projects.