AI music editor developed by Sony and researchers can modify songs with text prompts
Key Points
- Researchers from Queen Mary University of London, Sony AI, and the Music X Lab at MBZUAI have developed an AI system called Instruct-MusicGen that can modify existing music based on text instructions.
- Instruct-MusicGen is based on the open-source AI model MusicGen and has been optimized for precise editing tasks such as adding, removing, or separating music tracks.
- The new model requires only eight percent more parameters than MusicGen and 5,000 additional training steps to achieve good results. The code, model, and weights are freely available.
Researchers at Queen Mary University of London, Sony AI, and MBZUAI's Music X Lab have developed an AI system called Instruct-MusicGen that can modify existing music based on text prompts.
Instruct-MusicGen builds on Meta's open-source AI model MusicGen, which the team has enhanced for text-to-music editing tasks. The researchers modified the original MusicGen architecture by adding text and audio fusion modules, allowing the model to process editing prompts and audio input simultaneously.
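The article does not describe the internals of the fusion modules. As a rough illustration of what "processing editing prompts and audio input simultaneously" can mean, here is a minimal sketch of one generic conditioning technique: concatenate prompt-token embeddings and audio-frame embeddings into a single sequence and let decoder states cross-attend over it. The function names, the plain concatenation, and the projection-free single-head attention are all assumptions for illustration, not the paper's actual modules.

```python
import math
import random
from typing import List

Vector = List[float]

def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_conditions(text_emb: List[Vector], audio_emb: List[Vector]) -> List[Vector]:
    # Hypothetical fusion step: concatenate prompt tokens and audio frames
    # along the sequence axis so both form one conditioning sequence.
    return text_emb + audio_emb

def cross_attend(queries: List[Vector], memory: List[Vector]) -> List[Vector]:
    # Single-head scaled dot-product attention with keys = values = memory
    # (no learned projections), so each output row is a weighted average
    # of the conditioning vectors.
    d = len(memory[0])
    out = []
    for q in queries:
        scores = [sum(qi * mi for qi, mi in zip(q, m)) / math.sqrt(d) for m in memory]
        weights = softmax(scores)
        out.append([sum(w * m[j] for w, m in zip(weights, memory)) for j in range(d)])
    return out

# Usage with random stand-in embeddings (dimension 4).
random.seed(0)
d = 4
text_emb = [[random.gauss(0, 1) for _ in range(d)] for _ in range(3)]   # 3 prompt tokens
audio_emb = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]  # 5 audio frames
decoder_states = [[random.gauss(0, 1) for _ in range(d)] for _ in range(2)]
fused = cross_attend(decoder_states, fuse_conditions(text_emb, audio_emb))
print(len(fused), len(fused[0]))  # 2 4
```

Because every decoder position attends over one combined sequence, it can weigh prompt tokens and audio frames against each other in a single step; a real model would add learned projections, multiple heads, and far larger dimensions.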

The added audio and text fusion modules enable precise editing tasks like adding, removing, or separating music tracks, known as stems. Stems are grouped tracks, often organized by instrument type, that play a key role in music production.
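To make the three edit types concrete, the sketch below treats a song as the sample-wise sum of time-aligned stems and builds (input mix, instruction, target mix) triples for adding, removing, and extracting one stem. This is an assumption about how such editing pairs could be derived from multitrack data, not the team's published pipeline. The function names and the "remove …" and "… only" phrasings are hypothetical, and real stems would be full waveforms at a fixed sample rate rather than two-sample lists.

```python
from typing import Dict, Iterable, List, Tuple

Signal = List[float]  # one mono waveform, one float per sample

def mix(stems: Dict[str, Signal], names: Iterable[str]) -> Signal:
    # Sum the selected stems sample by sample; stems are assumed time-aligned
    # and of equal length.
    names = list(names)
    length = len(next(iter(stems.values())))
    return [sum(stems[n][i] for n in names) for i in range(length)]

def make_edit_examples(stems: Dict[str, Signal], target: str) -> List[Tuple[Signal, str, Signal]]:
    # Build (input mix, instruction, expected output) triples for one stem.
    everything = list(stems)
    others = [n for n in everything if n != target]
    full_mix = mix(stems, everything)
    return [
        (mix(stems, others), f"add {target}", full_mix),     # add a missing stem
        (full_mix, f"remove {target}", mix(stems, others)),  # delete a stem
        (full_mix, f"{target} only", mix(stems, [target])),  # extract (separate) a stem
    ]

stems = {"drums": [1.0, 0.0], "bass": [0.5, 0.5], "piano": [0.0, 1.0]}
for source, instruction, result in make_edit_examples(stems, "bass"):
    print(f"{instruction!r}: {source} -> {result}")
```

Deriving targets by re-mixing stems means the correct output for each instruction is known exactly, which is why separated multitrack material is convenient for teaching a model these edits.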
(Audio examples on the original page: an input track without bass and its edited version for the instruction "add bass"; an input track and its "drums only" version.)
The researchers note that Instruct-MusicGen improves the efficiency of text-to-music processing and expands the use of language models for music in production environments.
The new model needs only 8% more parameters and 5,000 additional training steps, less than 1% of MusicGen's total training time, to achieve good results. The developers provide numerous audio examples, along with the code, model, and weights, on the project page.
Sony should be in the clear on licensing: Meta states that MusicGen was trained only on licensed music, and the research team used Slakh2100, a dataset of synthetically generated music pieces, for its own instruction tuning. This matters because Sony is a key player in a lawsuit alleging copyright infringement against current music generators that can produce completely original compositions from text prompts.