Researchers at Queen Mary University of London, Sony AI, and MBZUAI's Music X Lab have developed an AI system called Instruct-MusicGen that can modify existing music based on text prompts.

Instruct-MusicGen builds on Meta's open-source AI model MusicGen, which the team has enhanced for text-to-music editing tasks. The researchers modified the original MusicGen architecture by adding text and audio fusion modules, allowing the model to process editing prompts and audio input simultaneously.

Instruct-MusicGen takes edit prompts and source music simultaneously as input and applies instructions to the source. | Image: Zhang et al.
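To give a feel for what "fusing" an audio input into a pretrained decoder can look like, here is a minimal NumPy sketch of gated cross-attention. It is an illustration under stated assumptions, not the researchers' implementation: the function `fuse_audio`, the scalar `gate`, and all shapes and weights are hypothetical. The idea shown is that a gate starting at zero leaves the pretrained model's hidden states untouched, so the audio conditioning can be learned gradually.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_audio(hidden, audio_feats, w_q, w_k, w_v, gate):
    """Hypothetical fusion step: decoder states attend to source-audio features.

    hidden:      (T, d) decoder hidden states
    audio_feats: (S, d) embeddings of the source audio
    gate:        scalar; 0.0 reproduces the unmodified hidden states
    """
    q = hidden @ w_q                                 # (T, d)
    k = audio_feats @ w_k                            # (S, d)
    v = audio_feats @ w_v                            # (S, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (T, S)
    return hidden + gate * (attn @ v)

rng = np.random.default_rng(0)
T, S, d = 6, 10, 8                                   # toy sizes, chosen arbitrarily
hidden = rng.normal(size=(T, d))
audio_feats = rng.normal(size=(S, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

unchanged = fuse_audio(hidden, audio_feats, w_q, w_k, w_v, gate=0.0)
fused = fuse_audio(hidden, audio_feats, w_q, w_k, w_v, gate=0.5)
```

With `gate=0.0` the output equals the input hidden states. A non-zero gate mixes in information from the source audio.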

The added audio and text fusion modules enable precise editing tasks like adding, removing, or separating music tracks, known as stems. Stems are grouped tracks, often organized by instrument type, that play a key role in music production.
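The three editing tasks can be made concrete with a toy signal example. The sketch below treats a mix as the sum of its stems and shows what an ideal "add", "remove", or "separate" edit would produce. It only illustrates the target of each task: the model itself receives a finished mix and a text instruction, not the isolated stems. The stem names, frequencies, and the `mix` helper are made up for illustration.

```python
import numpy as np

SAMPLE_RATE = 16_000
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE  # one second of audio

# Toy stems: simple waveforms standing in for real instrument tracks.
stems = {
    "bass": 0.5 * np.sin(2 * np.pi * 55.0 * t),
    "piano": 0.3 * np.sin(2 * np.pi * 440.0 * t),
    "drums": 0.2 * np.sign(np.sin(2 * np.pi * 2.0 * t)),
}

def mix(names):
    """Sum the named stems into a single waveform."""
    return sum((stems[n] for n in names), np.zeros_like(t))

full_mix = mix(["bass", "piano", "drums"])

# "add bass": the source lacks bass, the ideal result contains it.
source_without_bass = mix(["piano", "drums"])
target_add = source_without_bass + stems["bass"]

# "remove bass": the ideal result is the full mix minus the bass stem.
target_remove = full_mix - stems["bass"]

# "drums only" (separation): the ideal result is the drum stem alone.
target_extract = stems["drums"]
```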

Input audio without bass:

With instruction "add bass":

Input Audio:

Input Audio "drums only":

The researchers note that Instruct-MusicGen improves the efficiency of text-to-music processing and expands the use of language models for music in production environments.

The new model requires only 8% more parameters and 5,000 additional training steps, less than 1% of MusicGen's total training time, to achieve good results. The developers provide numerous examples, code, model, and weights on the project page.
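The figures above lend themselves to a quick back-of-the-envelope check. The short sketch below works out what "8% more parameters" means for a given base model size and what "5,000 steps is less than 1% of training" implies about the length of the original training run. The base parameter count used here is a hypothetical placeholder, not MusicGen's actual size.

```python
EXTRA_PARAM_PERCENT = 8        # from the article: 8% more parameters
FINETUNE_STEPS = 5_000         # from the article: additional training steps
MAX_SHARE_PERCENT = 1          # from the article: less than 1% of base training

def added_parameters(base_params: int) -> int:
    """Number of extra parameters for a base model of the given size."""
    return base_params * EXTRA_PARAM_PERCENT // 100

def min_base_training_steps() -> int:
    """Lower bound on base training steps implied by the <1% claim.

    FINETUNE_STEPS / total < 1%  implies  total > FINETUNE_STEPS * 100.
    """
    return FINETUNE_STEPS * 100 // MAX_SHARE_PERCENT

# Hypothetical base size, for illustration only.
hypothetical_base = 1_500_000_000
extra = added_parameters(hypothetical_base)
lower_bound = min_base_training_steps()

print(f"Extra parameters for a {hypothetical_base:,}-parameter base: {extra:,}")
print(f"Base training must have exceeded {lower_bound:,} steps")
```

In other words, if the reported numbers hold, the original MusicGen training run was more than 500,000 steps long.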

Sony should be in the clear regarding licensing: Meta asserts that MusicGen was trained only on licensed music, and the research team used Slakh2100, a dataset of synthetically generated music pieces, for its own instruction tuning. This is significant because Sony is a key player in a lawsuit alleging copyright infringement by current music generators that can produce completely original compositions from text prompts.

Summary
  • Researchers from Queen Mary University of London, Sony AI, and the Music X Lab at MBZUAI have developed an AI system called Instruct-MusicGen that can modify existing music based on text instructions.
  • Instruct-MusicGen is based on the open-source AI model MusicGen and has been optimized for precise editing tasks such as adding, removing, or separating music tracks.
  • The new model requires only eight percent more parameters and 5,000 additional training steps to achieve good results. The code, model, and weights are freely available.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.