Google MusicLM turns language into music

Google introduces MusicLM, a generative text-to-music model. It can generate multi-minute tracks from text prompts.

While generative AI models for images have already reached the visual quality of human artists, models for audio and music still lag far behind. A "DALL-E for music" is difficult to realize. There are approaches like Meta's AudioGen, Riffusion, or Google's AudioLM, but so far there is no convincing generative music model.

In addition to the complicated copyright situation for music, the temporal dimension is a major challenge: images are static, while music changes over time. Depending on the culture, these changes follow certain rules, but those rules can also be broken.

Google's MusicLM generates several minutes of music that sounds decent

AudioLM is a generative AI model for speech, audio, and music. It uses techniques from large language models: a BERT model specialized for audio (w2v-BERT) constructs semantic tokens from audio waveforms that can capture, for example, the phonetics of speech or local melodies, harmonies, and rhythms. A neural audio codec called SoundStream captures the finer details of audio waveforms in acoustic tokens and is responsible for high-quality audio synthesis.

Now Google is introducing MusicLM, a generative AI system that combines AudioLM with another model. This third component is called MuLan, a joint music-text model that links audio to matching text descriptions. Google has also published MusicCaps, a dataset of 5,500 ten-second music clips with text descriptions written by ten professional musicians.

After training, MusicLM predicts acoustic tokens, given both MuLan audio tokens and w2v-BERT's semantic tokens. SoundStream then converts these acoustic tokens to audio. Using this method, Google can generate several minutes of music.
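The data flow described above can be illustrated with a toy Python sketch. None of this is Google's code or a real API: every function name, vocabulary size, and token count is a hypothetical stand-in, and seeded random numbers replace the trained models. The sketch also assumes that, at generation time, the semantic tokens are themselves predicted from the text-derived conditioning tokens. It shows only the order of the stages: conditioning tokens, then semantic tokens, then acoustic tokens, then a waveform.

```python
import random
from typing import List

# Hypothetical stand-ins only: these functions mimic the *order* of the
# stages, not the behavior of MuLan, w2v-BERT, or SoundStream.

def mulan_tokens(prompt: str, n: int = 4, vocab: int = 1024) -> List[int]:
    """Stand-in for MuLan: turn a text prompt into conditioning tokens."""
    rng = random.Random(prompt)  # string seeds are deterministic in Python 3
    return [rng.randrange(vocab) for _ in range(n)]

def semantic_stage(cond: List[int], n_steps: int, vocab: int = 512) -> List[int]:
    """Stand-in for the coarse stage: one semantic token per time step."""
    rng = random.Random(sum(cond))
    return [rng.randrange(vocab) for _ in range(n_steps)]

def acoustic_stage(cond: List[int], semantic: List[int],
                   per_step: int = 2, vocab: int = 1024) -> List[int]:
    """Stand-in for the fine stage: acoustic tokens conditioned on both inputs."""
    rng = random.Random(sum(cond) + sum(semantic))
    return [rng.randrange(vocab) for _ in range(len(semantic) * per_step)]

def decode(acoustic: List[int], samples_per_token: int = 4,
           vocab: int = 1024) -> List[float]:
    """Stand-in for the SoundStream decoder: tokens to samples in [-1, 1]."""
    return [(t / (vocab - 1)) * 2 - 1
            for t in acoustic
            for _ in range(samples_per_token)]

def generate(prompt: str, n_steps: int = 8) -> List[float]:
    """Chain the stages: text -> conditioning -> semantic -> acoustic -> audio."""
    cond = mulan_tokens(prompt)
    semantic = semantic_stage(cond, n_steps)
    acoustic = acoustic_stage(cond, semantic)
    return decode(acoustic)

waveform = generate("relaxing jazz")
print(len(waveform))  # 8 steps * 2 acoustic tokens * 4 samples = 64
```

In this toy version, a longer track simply means a larger `n_steps`. The real system has to keep the music coherent over that longer span, which is the hard part the article describes.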

MusicLM can be controlled with melodies

The results range from a slow reggae song to an arcade game soundtrack, from relaxing jazz to Gregorian chants. MusicLM can be controlled with a short phrase or with detailed descriptions.

Prompt

The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls.

MusicLM Output

Prompt

We can hear a choir, singing a Gregorian chant, and a drum machine, creating a rhythmic beat. The slow, stately sounds of strings provide a calming backdrop for the fast, complex sounds of futuristic electronic music.

MusicLM Output

MusicLM can also process a combination of melody and text, for example converting the melody of an acoustic guitar piece into a synth lead.

Prompt (Fingerstyle Guitar Melody)

MusicLM Output (electronic synth lead)

MusicLM still has problems with vocals, negations in prompts, and temporal sequences. The team plans to address these issues in the future, and also plans to improve the quality of the generated audio.

More information and examples can be found on the MusicLM project page. According to the paper, there are currently no plans to release the model.
