Resemble Enhance is an open-source AI model that can significantly improve the quality of audio recordings.

The startup Resemble AI offers several AI tools for voice cloning, blending, and localization, as well as text-to-speech, speech-to-speech, and voice dubbing capabilities for various applications.

Now, the company has released Resemble Enhance, an AI model that converts noisy audio into clear speech. Unlike the company's other models, Resemble Enhance is open source.

Resemble Enhance for podcasts and historical recordings

Resemble sees applications for the technology in areas such as podcasting, the general entertainment industry, and the restoration of historical audio documents. The company shows what this sounds like with an example of an old lecture.

Video: Resemble AI

The model consists of two main components: a denoiser and an enhancer. The denoiser uses a UNet model to separate speech from background noise, which improves intelligibility. The enhancer uses a latent conditional flow matching (CFM) model to correct audio distortions and extend the audio bandwidth.
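To give a rough idea of what "conditional flow matching" means, here is a minimal, generic sketch. It is not Resemble's code. It assumes the common straight-line path from the flow matching literature, uses NumPy arrays as stand-ins for audio latents, and replaces the trained network with a hypothetical `oracle_velocity` function so the example runs on its own. All function names are illustrative. In a real enhancer, the conditioning input would be features of the degraded recording, not the clean target.

```python
import numpy as np


def cfm_training_pair(x0, x1, t):
    """Build one conditional flow matching training example.

    x0: a noise sample, x1: a data sample (standing in for a clean-speech
    latent), t: a scalar time in [0, 1]. Assumes the straight-line path
    x_t = (1 - t) * x0 + t * x1, whose target velocity is x1 - x0.
    A network v(x_t, t, cond) would be trained to regress this target.
    """
    xt = (1.0 - t) * x0 + t * x1
    target_velocity = x1 - x0
    return xt, target_velocity


def euler_sample(velocity_fn, x0, cond, steps=32):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t=0 to t=1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t, cond)
    return x


def oracle_velocity(x, t, cond):
    """Hypothetical stand-in for a trained network.

    For illustration only, `cond` is the target latent itself, so the exact
    velocity along the straight path is (cond - x) / (1 - t). Euler sampling
    never evaluates t = 1, so the division is safe.
    """
    return (cond - x) / (1.0 - t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16)  # pretend "clean speech" latent
    noise = rng.standard_normal(16)  # starting point of the flow
    restored = euler_sample(oracle_velocity, noise, clean)
    print(np.max(np.abs(restored - clean)))
```

Training fits a network to the velocity targets from `cfm_training_pair`. At inference time, a solver such as `euler_sample` follows the learned velocity field from noise to an output, which in an enhancer is guided by the degraded input.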

The development team plans to continue improving Resemble Enhance, including optimizing processing times and extending control over individual speech elements to further improve audio quality. In the long run, the model should also be able to improve audio recordings that are more than 75 years old.

Resemble offers a demo of Resemble Enhance on Hugging Face, and the code is available on GitHub.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Resemble AI has released an open-source AI model called Resemble Enhance that improves the quality of audio recordings by converting noisy audio into clear speech.
  • The model consists of two main components: a denoiser, which separates speech from background noise, and an enhancer, which corrects audio distortion and expands audio bandwidth.
  • Resemble sees applications for this technology in podcasting, the entertainment industry, and the restoration of historical audio documents, and plans to further improve audio quality and processing times.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.