Where is the "DALL-E for music"?

Enter a line of text and hear a piece of music after a few seconds? There are still some hurdles to clear before that happens, says one analyst.

First came AI-generated text, then images, which have recently become more sophisticated. HD video and 3D generators are also in the works.

That rightly raises the question: Where is a similar service to GPT-3, Midjourney, or DALL-E for the music industry? Cherie Hu of Water and Music, a research and intelligence network for the new music industry, made some arguments in a Twitter thread as to why such a service is a long time coming.

Too little training data, too many copyrights

The first point she raises is the lack of training data. While each of the available text-to-image models has been trained with dozens of terabytes of data, there is not nearly as much public training data for music. To get to that point, Hu says, you'd have to train a model with all published music and also access the private drafts stored in digital audio workstations (DAWs) like GarageBand, Ableton Live, or Logic.

As with image generators, copyright considerations also play a major role: It is true that millions of music tracks can be pirated from music streaming services and then used for training. But that would immediately bring the major labels and their lawyers onto the scene.

"Lawyers in the music industry have more power than in any other creative industry," Hu says. Some artists and coders are already fighting generative AI that could infringe on copyrights.

Lack of expertise outside academic research

While the open-source community is driving breakthroughs in image and text AI, AI for music is still dominated by academia. "There's less data, so the work is just harder and slower. And the nexus of people who know machine learning, music production, signal processing, etc., is tiny."

According to Hu, this also has to do with the fact that music is more difficult to sift through and, above all, to evaluate than visual art. "It literally takes time to listen to and evaluate a one-minute song. In that same time, you can scan hundreds of images."

Hu summarizes that the best AI models for music currently …

require more specialist technical knowledge to run,
take longer to run,
are more expensive to run,
have only OK output,
and are harder to rally public excitement around.

When does generative AI for music have its Midjourney moment?

However, Hu draws a conclusion that should keep the music industry from breathing a sigh of relief: "This is all going to change very soon, given how quickly the creative AI landscape is evolving."

Early examples include startups like Mubert, which recently unveiled a text-to-music model, and Sony's AI division, which is researching neural synthesizers.

The HarmonAI open-source project is also worth mentioning. It describes itself as a community-oriented organization that provides open-source tools for generative audio to make music production more accessible to all.

Its current work, "Dance Diffusion," a generative audio model, is already available for testing through the Dance Diffusion Colab. HarmonAI is supported by London-based startup Stability AI, which also made the open-source Stable Diffusion model possible.
