Amazon's effort to transform Alexa from a basic voice assistant into an AI-powered personal concierge isn't going as smoothly as planned.
The company is finding it challenging to make AI models reliable enough for everyday consumer use, the Financial Times reports.
For the past two years, Amazon has been working to completely rebuild Alexa's core technology. The company wants to evolve Alexa beyond simple tasks like playing music and setting alarms into something more sophisticated - an AI assistant that can recommend restaurants and adjust home lighting based on your sleep patterns.
While Amazon released an early preview of an AI-powered Alexa in fall 2023, the full launch keeps getting delayed. According to the Financial Times, the main issue is that generative AI systems aren't consistent enough for daily use.
Rohit Prasad, who leads Amazon's artificial general intelligence team, points to three main technical challenges: eliminating AI hallucinations, improving response times, and ensuring reliability. "Hallucinations have to be close to zero," Prasad says.
These hallucinations - instances where AI systems confidently invent false information - aren't bugs in the usual sense, but a byproduct of how the technology works. Unlike traditional software, which follows strict rules, generative AI systems work with probabilities, which makes their output inherently less predictable. It's unclear whether hallucinations can ever be eliminated entirely, or only mitigated to the point where they no longer cause harm.
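To make that contrast concrete, here is a minimal, purely illustrative sketch - not Amazon's code, and with made-up handlers and probabilities - of the difference between a deterministic rules-based path and a generative model that samples from a probability distribution:

```python
import random

# Deterministic, rules-based handling: the same input always maps to the same action.
RULES = {
    "set an alarm for 7am": "alarm.set('07:00')",
    "play jazz": "music.play(genre='jazz')",
}

def handle_with_rules(utterance: str) -> str:
    # Either an exact, predictable match or an explicit failure - nothing in between.
    return RULES.get(utterance, "Sorry, I can't do that.")

# Probabilistic handling: a generative model assigns probabilities to many possible
# responses and samples one, so identical inputs can yield different outputs.
def handle_with_llm(utterance: str) -> str:
    # Toy stand-in for a model's output distribution over candidate responses.
    candidates = {
        "Booking a table at Luigi's for 7pm.": 0.90,
        "Booking a table at Luigi's for 7am.": 0.07,          # plausible but wrong
        "Luigi's is closed Mondays, try Nonna's instead.": 0.03,  # possibly fabricated
    }
    responses, weights = zip(*candidates.items())
    return random.choices(responses, weights=weights, k=1)[0]

if __name__ == "__main__":
    print(handle_with_rules("play jazz"))                      # always the same result
    print(handle_with_llm("book me a table at Luigi's"))       # varies run to run
```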
Balancing old and new technology
Getting the old and new systems to work together creates another hurdle. The team needs to find a way to combine Alexa's traditional, predictable rules-based code with newer AI models that are more capable but harder to control.
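A common pattern for this kind of hybrid - sketched below as a general approach, not Amazon's actual architecture, with hypothetical handler names - is to route a request through the predictable rules-based path first and fall back to a generative model only when no rule matches:

```python
from typing import Callable, Optional

# Hypothetical hybrid router: deterministic skill handlers take priority, and a
# generative model is used only as a fallback. All names here are illustrative.
RULE_HANDLERS: dict[str, Callable[[], str]] = {
    "lights off": lambda: "smart_home.set_lights(on=False)",
    "set a timer for 10 minutes": lambda: "timers.create(minutes=10)",
}

def call_llm(utterance: str) -> str:
    # Placeholder for a call to a generative model (e.g. Nova or Claude via an API).
    return f"<generated plan for: {utterance!r}>"

def route(utterance: str) -> str:
    handler: Optional[Callable[[], str]] = RULE_HANDLERS.get(utterance.lower())
    if handler is not None:
        return handler()        # fast, predictable, easy to test
    return call_llm(utterance)  # flexible, but needs guardrails and verification

print(route("lights off"))
print(route("dim the lights whenever I fall asleep"))
```

The appeal of this split is that the well-understood, high-volume requests stay on the deterministic path, while the harder-to-control model only handles what the rules can't - though every fallback still has to meet the same latency and reliability bar.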
According to Prasad, making this work is especially tricky because Alexa has to interact with hundreds of different third-party services that handle billions of user requests each week.
Users have high expectations, too. They expect Alexa to respond quickly and accurately - requirements that don't always align with how current GenAI systems work.
According to one Amazon employee, there's still a lot of work to be done. The team needs to add safety features, like parental controls, and make sure the system works properly with smart home devices. The need to get everything nearly perfect every time is why Amazon, Apple and Google are all taking their time rolling out new AI features, the person says.
A former senior member of the Alexa team says rushing things could backfire. If the LLM starts making up answers, that's a serious problem at Amazon's scale: with millions of users, even a small percentage of wrong answers could mean thousands of mistakes every day, potentially damaging Amazon's reputation.
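A rough back-of-the-envelope calculation shows why - the figures below are purely illustrative assumptions, not reported numbers:

```python
# Illustrative math only: even a tiny hallucination rate becomes a large absolute
# number at high request volume. Both inputs are assumptions, not Amazon data.
daily_requests = 10_000_000   # hypothetical slice of Alexa's daily traffic
error_rate = 0.001            # hypothetical 0.1% rate of wrong or invented answers

errors_per_day = daily_requests * error_rate
print(f"{errors_per_day:,.0f} bad answers per day")  # 10,000
```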
To build its next-generation assistant, Amazon is using both its own Nova models, which are optimized for price-performance, and Claude, an AI model from Anthropic - a startup Amazon has invested $8 billion in over the past year and a half. That massive investment creates another challenge for the Alexa team: figuring out how to eventually turn these expensive AI tools into something that actually makes money.