
Microsoft and Project Gutenberg have used AI technologies to create more than 5,000 free audiobooks with high-quality synthetic voices.

For the project, the researchers combined advances in machine learning, automatic text selection (deciding which passages are read aloud and which are not), and natural-sounding speech synthesis.

First, they developed an algorithm that understands the structure of an HTML-based e-book and distinguishes between the main text and unimportant elements such as footnotes, page numbers, or tables.
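To make the idea of separating readable text from non-readable elements concrete, here is a minimal rule-based sketch using Python's standard `html.parser`. It is not the team's algorithm, which the article does not describe in detail. The class names `footnote` and `pagenum` and the choice of skipped tags are assumptions made for illustration, and the sketch assumes properly nested markup.

```python
from html.parser import HTMLParser

# Assumed markers for content that should not be read aloud (illustrative only).
SKIP_TAGS = {"table", "sup"}
SKIP_CLASSES = {"footnote", "pagenum"}
# Void elements have no closing tag, so they must not affect the nesting depth.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class MainTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        # Once inside a skipped element, every nested tag deepens the skip.
        if self.skip_depth or tag in SKIP_TAGS or classes & SKIP_CLASSES:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html):
    parser = MainTextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

Fed a chapter fragment, `extract_main_text` returns only the running prose and drops table cells, page markers, and footnote bodies. Real e-books vary widely in their markup, which is presumably why the researchers needed something more robust than fixed rules like these.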

This parsing step is followed by the actual conversion of text into speech (text-to-speech, TTS). The project used WaveNet, Tacotron, and FastSpeech in particular, all of which can produce natural, human-like speech output.


In addition, the team developed a system that distinguishes between narration and dialogue, and within dialogue even between individual characters and their emotions, and adapts the generated voice accordingly.
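The simplest version of the narrator/dialogue split can be sketched with a quotation-mark heuristic. This is only an illustration under stated assumptions: it handles straight double quotes only, does not identify speakers or emotions, and the function name `split_narration_and_dialogue` is hypothetical. The article does not say how the team's system actually works.

```python
import re

# Matches text enclosed in straight double quotes (curly quotes are not handled).
QUOTE = re.compile(r'"([^"]*)"')


def split_narration_and_dialogue(paragraph):
    """Return (role, text) pairs, where role is "narrator" or "dialogue"."""
    segments = []
    pos = 0
    for match in QUOTE.finditer(paragraph):
        # Everything between the previous quote and this one is narration.
        narration = paragraph[pos:match.start()].strip()
        if narration:
            segments.append(("narrator", narration))
        segments.append(("dialogue", match.group(1).strip()))
        pos = match.end()
    # Any text after the last quote is narration as well.
    tail = paragraph[pos:].strip()
    if tail:
        segments.append(("narrator", tail))
    return segments
```

Each labeled segment could then be passed to a different voice. Assigning dialogue to specific characters and emotions, as the team describes, would require a further classification step that this sketch omits.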

The entire process chain runs on the machine learning framework SynapseML, which is designed to break down the various tasks and process them in parallel.
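The benefit of breaking the work into independent tasks can be illustrated with Python's standard `concurrent.futures` module. This sketch does not use SynapseML or its API. The `synthesize` function is a hypothetical placeholder for a real TTS call, and a thread pool stands in for a distributed cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize(chapter_text):
    # Hypothetical stand-in for a TTS request; returns fake "audio" bytes.
    return chapter_text.encode("utf-8")


def narrate_book(chapters, max_workers=4):
    # Chapters are independent, so they can be processed concurrently.
    # pool.map returns results in the same order as the input chapters.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(synthesize, chapters))
```

Because each chapter is processed on its own, the same pattern extends from chapters within one book to thousands of books across many machines.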

"We believe that this work has the potential to greatly improve the accessibility and availability of audiobooks," the team writes. Hear for yourself how "How to Tell a Story, and Other Essays" by Mark Twain sounds.

Have your voice narrate an audiobook

For the conference presentation, the team also developed a zero-shot text-to-speech approach that can capture the character of a user's own voice from a few recorded sentences and transfer it to the narration of the audiobook.

This allows users to select a book from the digital library and have it read to them in their own voice, or in another voice of their choice, provided they have audio recordings of it. It is not yet clear whether this service will be available beyond the conference, but it seems unlikely given the potential costs.


In total, the project has produced more than 35,000 hours of audio covering classic literature, plays, biographies, and more, read "in a clear and consistent voice."

This dataset alone could be useful for further AI projects. The research team intends to make all audio data available as open source without restrictions.

The audiobooks are available on Spotify, Apple Podcasts, or Google Podcasts. More information about the project is available on the official website.

Project Gutenberg is a free online digital library created by volunteers. More than 70,000 e-books are available to read and download for free on the project's website.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Microsoft and Project Gutenberg have produced more than 5,000 audiobooks using AI to create natural-sounding synthetic voices.
  • They developed systems that distinguish between the main text and unimportant elements, as well as between the narrator, the dialogue, and the individual characters and their emotions in the audiobook. A special feature lets users have an audiobook read in their own voice, based on a few recorded sentences.
  • The more than 35,000 hours of audio data collected so far will be made available as open source and could be used for other AI projects.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.