AI music editor developed by Sony and researchers can modify songs with text prompts

Midjourney prompted by THE DECODER

Researchers at Queen Mary University of London, Sony AI, and MBZUAI's Music X Lab have developed an AI system called Instruct-MusicGen that can modify existing music based on text prompts.

Instruct-MusicGen builds on Meta's open-source AI model MusicGen, which the team has enhanced for text-to-music editing tasks. The researchers modified the original MusicGen architecture by adding text and audio fusion modules, allowing the model to process editing prompts and audio input simultaneously.

Instruct-MusicGen takes edit prompts and source music simultaneously as input and applies instructions to the source. | Image: Zhang et al.

The added audio and text fusion modules enable precise editing tasks like adding, removing, or separating music tracks, known as stems. Stems are grouped tracks, often organized by instrument type, that play a key role in music production.

Input audio without bass:

With instruction "add bass":

Input Audio:

Input Audio "drums only":

The researchers note that Instruct-MusicGen improves the efficiency of text-to-music processing and expands the use of language models for music in production environments.

The new model requires only 8% more parameters and 5,000 additional training steps, less than 1% of MusicGen's total training time, to achieve good results. The developers provide numerous examples, code, model, and weights on the project page.

Recommendation

AI in practice

OpenAI launches Sora video generator for ChatGPT subscribers

Sony should be in the clear regarding licensing, as Meta asserts that MusicGen was only trained on licensed music and the research team used a dataset of synthetically generated music pieces, Slakh210, for its own instruction tuning. This is significant because Sony is a key player in a lawsuit claiming license infringement against current music generators capable of producing completely original music compositions based on text prompts.

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

AI music editor developed by Sony and researchers can modify songs with text prompts

OpenAI launches Sora video generator for ChatGPT subscribers

OpenAI is building an AI model that can generate music from text or audio promptsOpenAI is building an AI model that can generate music from text or audio prompts

Meta AI's Yann LeCun says he played only an indirect role in the development of Llama models

OpenAI positions ChatGPT as a search engine for work data with Company Knowledge

ChatGPT's memory could turn personal details into ads OpenAI CEO Altman once called dystopian

The long-predicted deepfake dystopia has arrived with Sora 2

Anthropic claims to lower the entry barrier for advanced AI models with Claude Haiku 4.5

AI music editor developed by Sony and researchers can modify songs with text prompts

Share

Bank details