AI voice synthesis startup ElevenLabs has launched a tool to detect and prevent generative audio fraud. Users can upload audio samples, and the AI speech classifier will determine whether the content was generated by its platform, with a claimed accuracy rate of 99% for unmodified input and 90% for modified input.
Concurrent with the launch, ElevenLabs raised $19 million in a Series A funding round co-led by Nat Friedman, Daniel Gross, and Andreessen Horowitz. The company plans to use the investment to build a voice AI research center and launch additional products targeting market verticals such as publishing, gaming, and entertainment.
Large Language Models (LLMs) are transforming software development, but their newness and complexity can be daunting for developers. In a comprehensive blog post, Matt Bornstein and Rajko Radovanovic provide a reference architecture for the emerging LLM application stack that captures the most common tools and design patterns used in the field. The reference architecture showcases in-context learning, a design pattern that allows developers to work with out-of-the-box LLMs and control their behavior with smart prompts and private contextual data.
"Pre-trained AI models represent the most significant architectural change in software since the internet."
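The in-context learning pattern described above can be sketched in a few lines: rather than fine-tuning a model, the application retrieves relevant private data and injects it into the prompt. This is an illustrative sketch, not code from the blog post; the function names are invented, and the toy keyword-overlap retrieval stands in for the embedding-based vector search a real stack would use.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (stand-in for vector search)."""
    q_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Assemble a prompt that grounds an off-the-shelf LLM in retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Private contextual data the base model has never seen.
docs = [
    "Acme's refund window is 30 days from purchase.",
    "Acme ships to the US and Canada only.",
    "Support hours are 9am-5pm ET on weekdays.",
]
prompt = build_prompt("What is the refund window?", docs)
print(prompt)
```

The assembled prompt would then be sent to any off-the-shelf LLM API; the model's behavior is steered entirely by the prompt and the retrieved data, with no changes to the model's weights.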
Meta's Voicebox is like Stable Diffusion for voices: the generative AI model synthesizes speech from text and can be used for various speech tasks. Voicebox generates realistic and expressive voices and allows attributes such as tone, style, or accent to be transferred from audio samples.
According to Meta, Voicebox outperforms existing speech synthesis models such as Microsoft's VALL-E in terms of speech quality and naturalness. "As the first versatile, efficient model that successfully performs task generalization, we believe Voicebox could usher in a new era of generative AI for speech," Meta said. Due to the risk of misuse, the team has also developed a system for detecting synthesized speech and has no plans to release Voicebox for the time being.