Microsoft and Project Gutenberg release over 5,000 free audiobooks

Microsoft and Project Gutenberg have used AI technologies to create more than 5,000 free audiobooks with high-quality synthetic voices.

For the project, the researchers combined advances in machine learning, automatic text selection (deciding which parts of a book are read aloud and which are not), and natural-sounding speech synthesis.

First, they developed an algorithm that understands the structure of an HTML-based e-book and distinguishes between the main text and unimportant elements such as footnotes, page numbers, or tables.
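To illustrate the idea of separating main text from unimportant elements, here is a minimal sketch using Python's standard `html.parser`. It is not the team's algorithm, which the article does not describe in detail. The tag and class names in `SKIP_TAGS` and `SKIP_CLASSES` (such as `footnote` and `pagenum`) are assumptions made for this example, and real e-books will use many other conventions.

```python
from html.parser import HTMLParser

# Assumed markers for content that should not be read aloud.
# These names are illustrative only; actual e-book markup varies.
SKIP_TAGS = {"table", "sup", "script", "style"}
SKIP_CLASSES = {"footnote", "pagenum"}
# Void elements never get a closing tag, so they are not tracked.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class MainTextExtractor(HTMLParser):
    """Collects text that lies outside any skipped element."""

    def __init__(self):
        super().__init__()
        self.stack = []       # (tag, opens_skipped_region) per open element
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []      # text fragments to keep

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        skipped = tag in SKIP_TAGS or bool(classes & SKIP_CLASSES)
        self.stack.append((tag, skipped))
        if skipped:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        # Ignore void tags and stray closing tags with no open match.
        if tag in VOID_TAGS or all(t != tag for t, _ in self.stack):
            return
        # Pop up to and including the matching open element.
        while self.stack:
            open_tag, skipped = self.stack.pop()
            if skipped:
                self.skip_depth -= 1
            if open_tag == tag:
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return the readable main text of an HTML fragment."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

A rule-based filter like this only works when the markup is consistent. Handling many differently formatted books is presumably why the team developed a dedicated algorithm for this step.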

This parsing step is followed by the actual conversion of text into speech (text-to-speech, TTS). The project used WaveNet, Tacotron, and FastSpeech in particular, which can produce natural, human-like speech output.

In addition, the team developed a system that distinguishes between narrator and dialogue, and within dialogue even between individual characters and their emotions, and adapts the generated voice accordingly.
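The simplest part of this task, telling narration apart from quoted speech, can be sketched with a quotation-mark heuristic. This is only an illustration and not the team's system. Attributing lines to individual characters and detecting emotions requires far more than the regular expression below. The function name and the `narrator`/`dialogue` labels are made up for this example.

```python
import re
from typing import List, Tuple

# Matches text in straight double quotes or in curly double quotes.
QUOTE_RE = re.compile(r'"([^"]*)"|\u201c([^\u201d]*)\u201d')


def split_narration_and_dialogue(paragraph: str) -> List[Tuple[str, str]]:
    """Split a paragraph into ("narrator" | "dialogue", text) segments."""
    segments = []
    pos = 0
    for match in QUOTE_RE.finditer(paragraph):
        # Everything between the previous quote and this one is narration.
        before = paragraph[pos:match.start()].strip()
        if before:
            segments.append(("narrator", before))
        spoken = (match.group(1) or match.group(2) or "").strip()
        if spoken:
            segments.append(("dialogue", spoken))
        pos = match.end()
    # Any text after the last quote is narration as well.
    tail = paragraph[pos:].strip()
    if tail:
        segments.append(("narrator", tail))
    return segments
```

Each segment could then be passed to a TTS voice chosen by its label, for example one voice for the narrator and another for dialogue.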

The entire process chain runs on the machine learning framework SynapseML, which is designed to break down the various tasks and process them in parallel.

"We believe that this work has the potential to greatly improve the accessibility and availability of audiobooks," the team writes. Hear for yourself how "How to Tell a Story, and Other Essays" by Mark Twain sounds.

Have your voice narrate an audiobook

For the conference presentation, the team also developed a zero-shot text-to-speech approach that can capture the character of a user's own voice from a few recorded sentences and transfer it to the narration of the audiobook.

This allows users to select a book from the digital library and have it read to them in their own voice, or in another voice of their choice if they have audio recordings of it. It's not yet clear whether this service will be available beyond the conference, but it seems unlikely given the potential costs.

In total, the project has collected more than 35,000 hours of audio data on classical literature, plays, biographies, and more, read "in a clear and consistent voice."

This dataset alone could be useful for further AI projects. The research team intends to make all audio data available as open source without restrictions.

The audiobooks are available on Spotify, Apple Podcasts, or Google Podcasts. More information about the project is available on the official website.

Project Gutenberg is a free digital library accessible via the Internet. It is created by volunteers. More than 70,000 e-books are available to read and download for free on the project's website.
