Adobe Research and University of Michigan researchers have created an AI system that generates Foley sounds—the custom sound effects added to films and videos during post-production.

The system, called MultiFoley, lets users create sounds through text prompts, reference audio, or video examples. In demonstrations, the system transformed a cat's meow into a lion's roar and made typewriter sounds play like piano notes, all while maintaining precise synchronization with the video.
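
To make those input modes concrete, here is a minimal sketch of what a request to a system like MultiFoley could look like. The class and field names are illustrative assumptions for this article, not Adobe's actual API.

```python
# Hedged sketch: a request object for a multimodal Foley generator.
# All names here are hypothetical, not Adobe's real interface.
from dataclasses import dataclass
from typing import Optional

@dataclass
class FoleyRequest:
    video_path: str                        # silent clip that needs a sound effect
    text_prompt: Optional[str] = None      # e.g. "a lion roaring"
    reference_audio: Optional[str] = None  # audio whose character should be imitated
    reference_video: Optional[str] = None  # example clip whose sound should be transferred

# The cat-to-lion demo from the article, expressed as such a request:
request = FoleyRequest(
    video_path="cat_meowing.mp4",
    text_prompt="a lion roaring",
)
```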

Video: Adobe

The system stands out for its ability to generate full-bandwidth, high-quality audio at a 48 kHz sampling rate. The researchers achieved this by training the AI on both internet videos and professional sound effect libraries.

MultiFoley is the first system to combine multiple input methods (text, audio, and video references) in a single model. It keeps video and generated audio tightly synchronized through a specialized mechanism that extracts visual features at 8 frames per second and upsamples them to the 40 Hz frame rate of the internal audio representation.
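
As a rough illustration of that alignment step, the sketch below repeats each 8 fps visual feature five times so the sequence lines up with a 40 Hz audio timeline. The shapes, names, and the plain repetition strategy are assumptions for clarity; the paper's actual conditioning mechanism is more involved.

```python
# Minimal sketch (not the authors' code): aligning 8 fps visual features
# with a 40 Hz audio-feature timeline by repeating each frame's vector.
import numpy as np

VIDEO_FPS = 8       # visual features extracted at 8 frames per second
AUDIO_RATE_HZ = 40  # frame rate of the audio representation

def upsample_visual_features(feats: np.ndarray) -> np.ndarray:
    """Repeat each per-frame feature so the sequence matches the 40 Hz timeline.

    feats: array of shape (num_video_frames, feature_dim) at 8 fps.
    Returns an array of shape (num_video_frames * 5, feature_dim) at 40 Hz.
    """
    factor = AUDIO_RATE_HZ // VIDEO_FPS  # 40 / 8 = 5 audio steps per video frame
    return np.repeat(feats, factor, axis=0)

# Example: 10 seconds of video -> 80 visual frames -> 400 audio-rate steps
visual = np.random.randn(80, 512).astype(np.float32)
aligned = upsample_visual_features(visual)
print(aligned.shape)  # (400, 512)
```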

Figure: Spectrograms for a singing bird and a typewriter, each with three generated variations. MultiFoley can generate sound effects, from birdsong to typewriter clicks, and sync them with video footage by following simple text commands. | Image: Chen et al.

The system achieves an average synchronization offset of 0.8 seconds, significantly better than previous systems, which typically lagged by more than a second.
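
One simple way to read that number: measure the average gap between when a sound starts in the generated audio and when the corresponding event happens on screen. The sketch below shows such a calculation with made-up onset times; it is an assumption about the metric, not the paper's evaluation code.

```python
# Illustrative sketch of a synchronization-offset measurement:
# compare onset times in the generated audio against reference
# onset times taken from the video (e.g. the moment of visual impact).
import numpy as np

def onset_offset_seconds(generated_onsets: np.ndarray,
                         reference_onsets: np.ndarray) -> float:
    """Mean absolute gap (in seconds) between matched onset pairs."""
    n = min(len(generated_onsets), len(reference_onsets))
    return float(np.mean(np.abs(generated_onsets[:n] - reference_onsets[:n])))

# Example: three visual impacts vs. the onsets detected in generated audio
reference = np.array([1.00, 2.50, 4.00])
generated = np.array([1.70, 3.20, 4.90])
print(onset_offset_seconds(generated, reference))  # ~0.77 s average offset
```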

Testing shows major improvements in sound quality and timing

In tests against existing systems, MultiFoley delivered superior audio-video synchronization and a better match between generated sounds and text descriptions. In a user study, 85.8 percent of participants rated MultiFoley's semantic consistency higher than that of the next-best system, and 94.5 percent preferred its synchronization.

The radar chart compares eight audio generation methods across six metrics (FAD@AUD, FAD@VGG, AV-Sync, CLAP, ImageBind, KLD); MultiFoley (blue) comes out ahead on most of them. | Image: Chen et al.

The researchers note some current limitations. The system's training data was relatively small, limiting its range of sound effects. It also struggles with generating multiple simultaneous sounds.

The team plans to release the source code and models soon. While Adobe hasn't announced plans to add MultiFoley to its products, the technology would fit naturally alongside the AI capabilities already present in Adobe's Premiere Pro video editing software. The system could benefit individual creators as well as production companies looking to streamline their sound design process.

Summary
  • Researchers at the University of Michigan and Adobe Research have developed MultiFoley, an AI system that generates movie sounds based on text prompts, reference audio, or video samples.
  • MultiFoley can generate high-quality, full-bandwidth audio and precisely synchronize it with video, with an average offset of 0.8 seconds.
  • In tests and user studies, MultiFoley surpassed existing systems in audio-video synchronization and semantic agreement, showing great potential for use in film production, game development, and other creative fields.