
Enter a line of text and hear a piece of music after a few seconds? There are still some hurdles to clear before that happens, says one analyst.


First came AI-generated text, then images, which have recently become far more sophisticated. HD video and 3D AI generators are also in the works.

That rightly raises the question: Where is a similar service to GPT-3, Midjourney, or DALL-E for the music industry? Cherie Hu of Water and Music, a research and intelligence network for the new music industry, made some arguments in a Twitter thread as to why such a service is a long time coming.

Too little training data, too many copyrights

The first point she raises is the lack of training data. While each of the available text-to-image models has been trained on dozens of terabytes of data, there is not nearly as much public training data for music. To get to that point, Hu says, you'd have to train a model on all published music and also tap the private drafts stored in digital audio workstations (DAWs) such as GarageBand, Ableton Live, or Logic.


As with image generators, copyright considerations also play a major role: It is true that millions of music tracks can be pirated from music streaming services and then used for training. But that would immediately bring the major labels and their lawyers onto the scene.

"Lawyers in the music industry have more power than in any other creative industry," Hu says. Some artists and coders are already fighting generative AI that could infringe on copyrights.

Lack of expertise outside academic research

While breakthroughs in image and text AI are coming from the open-source community, music AI is still dominated by academia. "There's less data, so the work is just harder and slower. And the nexus of people who know machine learning, music production, signal processing, etc., is tiny," Hu says.

According to Hu, this also has to do with the fact that music is more difficult to sift through and, above all, to evaluate than visual art. "It literally takes time to listen to and evaluate a one-minute song. In that same time, you can scan hundreds of images."

Hu summarizes that the best AI models for music currently …

  • require more specialist technical knowledge to run,
  • take longer to run,
  • are more expensive to run,
  • have only OK output,
  • and are harder to rally public excitement around.

When will generative AI for music have its Midjourney moment?

However, Hu's conclusion should not let the music industry breathe a sigh of relief: "This is all going to change very soon, given how quickly the creative AI landscape is evolving."

Early examples include startups like Mubert, which recently unveiled a text-to-music model, and Sony's AI division, which is researching neural synthesizers.

The Harmonai open-source project is also worth mentioning. It describes itself as a community-oriented organization that provides open-source tools for generative audio to make music production more accessible to all.

Its current work, "Dance Diffusion," a generative audio model, is already available for testing through the Dance Diffusion Colab. Harmonai is supported by London-based startup Stability AI, which also backed the open-source Stable Diffusion model.

  • AI can produce high-quality text and images, and videos and 3D models are also in the works. But what about music?
  • The founder of an analytics network for the music industry cites possible reasons why generative AI is not yet an option for music.
  • Above all, publicly available data is scarce, and both its labeling and the copyright situation are complex.
Jonathan works as a technology journalist who focuses primarily on how easily AI can already be used today and how it can support daily life.