Sony shows neural synthesizer GANstrument

AI researchers from Sony show GANstrument, a neural synthesizer that transforms arbitrary input sounds into instrument sounds.

Generative AI systems such as DALL-E 2, Midjourney, or Stable Diffusion are currently shaking up the visual arts. These text-to-image systems produce impressive results from even simple text prompts.

Comparably powerful systems do not yet exist for music. But here, too, recent projects such as the generative text-to-music model from the US start-up Mubert show where things could be headed.

Apart from end-to-end music synthesis, the research field has a second focus: the synthesis of individual notes, which are then played back from a symbolic format such as MIDI (Musical Instrument Digital Interface). This allows the notes (via MIDI) and the timbre to be controlled independently, which makes the approach compatible with production workflows in the music industry.

In a new paper, AI researchers at Sony are now demonstrating GANstrument, a neural synthesizer for instrument sounds.

GANstrument: Sony shows GAN-based neural synthesizer

Currently, realistic instrument sounds are synthesized with samplers that play back recorded sounds. Although any sound material can be used, it is difficult to synthesize a completely new timbre or to combine multiple sounds in an intelligent way, according to Sony. Generative AI models for audio synthesis, however, have shown that AI can create and mix a variety of timbres.

The research team therefore aims to develop a neural synthesizer that combines the flexibility of classic samplers with the generative power of neural networks. With such a tool, users would be able to freely control the timbre based on existing sound material.

For its neural synthesizer, Sony uses a GAN (Generative Adversarial Network) that is trained on mel spectrograms computed from waveforms. Instead of the class conditioning usually used in GAN training, the team relies on so-called instance conditioning.
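To give a feel for what the "mel" in mel spectrogram means, here is a minimal sketch of the mel scale using the widely used HTK-style formula. It is not taken from Sony's code; the function names and the band count in the usage note are hypothetical and only for illustration.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mel (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands: int, f_min: float, f_max: float) -> list:
    """Center frequencies (Hz) of n_bands bands spaced evenly on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_bands)]
```

Calling `mel_band_centers(8, 0.0, 8000.0)` yields bands that sit close together at low frequencies and spread out toward high frequencies, which roughly mirrors how human hearing resolves pitch.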

Class conditioning sorts the data into different distributions with no overlap, whereas instance conditioning sorts the data into many overlapping local distributions.
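The difference can be illustrated with a toy sketch. It assumes that the local distributions are built from each sample's nearest neighbors in a feature space, which is how instance conditioning is commonly described; the article itself does not spell this out. The feature values, labels, and function names are made up for illustration.

```python
import math

# Toy 2-D "timbre features" with class labels (all values are invented).
samples = [
    ((0.0, 0.0), "flute"),
    ((0.2, 0.1), "flute"),
    ((0.9, 0.2), "flute"),
    ((1.0, 0.0), "brass"),
    ((1.2, 0.1), "brass"),
    ((2.0, 0.0), "brass"),
]


def class_groups(data):
    """Class conditioning: one disjoint set of sample indices per label."""
    groups = {}
    for idx, (_, label) in enumerate(data):
        groups.setdefault(label, set()).add(idx)
    return groups


def instance_neighborhoods(data, k):
    """Instance conditioning: each sample gets its k nearest samples (itself included)."""
    hoods = []
    for feat, _ in data:
        ranked = sorted(range(len(data)), key=lambda j: math.dist(feat, data[j][0]))
        hoods.append(set(ranked[:k]))
    return hoods
```

With `k=3`, the neighborhood of the third flute sample contains two brass samples, and neighborhoods of adjacent samples share members. The class groups, by contrast, never overlap.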

GANstrument can turn a rooster into a cello piece

Together with other improvements, such as a pitch-invariant feature extractor, instance conditioning lets GANstrument produce better and more diverse synthesized sounds and generalize to different sound inputs, the team writes. After training, GANstrument can, for example, transform flute sounds into brass sounds or organ sounds into guitar sounds.

[Audio examples: Flute, Brass, Interpolation (Input 1 to 2)]

The AI system can also smoothly blend different instruments, for example merging two input instruments into a single track.

[Audio examples: Melody (Mallet to Reed), Input 1, Input 2, Interpolation (Input 1 to 2)]
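A common way to realize blends like these in generative models is to interpolate linearly between the feature vectors of two inputs and feed the result to the generator. Whether GANstrument does exactly this is not stated in the article, so the following is only a sketch under that assumption, with a hypothetical function name and toy vectors.

```python
def interpolate(feat_a, feat_b, alpha):
    """Linear blend of two equal-length feature vectors.

    alpha = 0.0 returns feat_a, alpha = 1.0 returns feat_b.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(feat_a, feat_b)]
```

Sweeping `alpha` from 0.0 to 1.0 over the course of a melody would move the timbre gradually from the first input to the second.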

The system also works with input sounds it has never heard before. It can transform them into known instrument sounds or change the pitch of the input. GANstrument can therefore also convert the crow of a rooster or a cat's meow into sounds of different pitches.

[Audio examples: Rooster Chicken, Pitch 48, Pitch 55, Pitch 60]
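The pitch values 48, 55, and 60 in these examples read like MIDI note numbers; that interpretation is an assumption, as the article does not say so explicitly. Under that assumption, and with the standard equal-temperament tuning of A4 = 440 Hz, the corresponding frequencies can be computed as follows (the function name is hypothetical).

```python
def midi_to_hz(note: int, a4_hz: float = 440.0) -> float:
    """Frequency of a MIDI note number in 12-tone equal temperament.

    MIDI note 69 is A4; each semitone multiplies the frequency by 2 ** (1 / 12).
    """
    return a4_hz * 2.0 ** ((note - 69) / 12.0)
```

With these settings, notes 48, 55, and 60 come out at roughly 130.8 Hz (C3), 196.0 Hz (G3), and 261.6 Hz (C4, middle C).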

According to Sony, GANstrument generates a sound in 1.62 seconds on an Intel Core i7-7800X CPU.

Our novel neural synthesizer, GANStrument, generates pitched instrument sounds reflecting one-shot input timbre within an interactive time. It incorporates two key features: 1) instance conditioning, resulting in better generation quality and generalization ability to various inputs and 2) pitch-invariant feature extraction based on adversarial training, resulting in significantly improved pitch accuracy and timbre consistency.

Sony

The authors believe that GANstrument can produce novel instrument sounds and make desired timbres freely explorable by using a variety of sound materials. Further examples can be found on the GANstrument project page.
