
Why Alexa still can't carry on fluent dialogues despite significant AI progress

Jonathan Kemper


Setting an alarm, asking about the weather: Alexa handles simple commands without problems. Beyond that, things get tricky. Why is that?

Compared to large language models (LLMs) like GPT-3, voice assistants like Alexa and Google Assistant are pretty tight-lipped. Real conversations do not take place; the systems reliably understand only simple commands and turn them into actions.

The latest research already enables chatbots with more eloquence, so why not advanced voice assistants? AI researcher Gary Marcus, author of The Road to AI We Can Trust newsletter, explores the question in this month's issue.

Marcus rules out the fairly obvious explanations from the outset. Could it be that no one at Amazon has been following the latest research? That is unlikely. After all, LLMs have long been used for the company's powerful product recommendation engine.

An unwillingness to pay licensing costs is also an unlikely explanation, as the company could easily provide the infrastructure itself through Amazon Web Services. Moreover, Amazon has sufficient experience in scaling such systems.

Alexa powered by a large language model could lead to Amazon losing control

Marcus gives five reasons why Alexa is not able (or rather, not allowed) to hold conversations, even though this would be technically possible. They ultimately boil down to one central point: LLMs are not yet reliable enough for broad and automated commercial use.

Amazon would rather sell its customers a product that reliably performs a limited range of tasks. Language models, on the other hand, are unpredictable and difficult to control, Marcus writes.

Moreover, while GPT-3 can generate a string of connected words, it can't yet reliably link them to actions. Startups such as Adept and research projects such as Google's SayCan are working on this.

Amazon lays off employees working on AI and conversation

At the moment, it doesn't look like Alexa will be advancing by leaps and bounds in the foreseeable future. A few days ago, Amazon announced that it was laying off thousands of employees amid the crisis in big tech stocks.

Employees working on AI systems, natural language processing (NLP), and conversational skills were particularly affected. This could be an indication that Amazon is scaling back its Alexa efforts, or at least not pushing them at the moment. According to a media report, Alexa hardware development alone caused Amazon a loss of ten billion US dollars this year. No other area at Amazon loses as much.

Perhaps Google, which has been researching NLP extensively in recent years and is betting on Google Assistant as its next interface, will do better. By early 2023, Google Assistant should be able to handle natural pauses in speech and other stumbling blocks in understanding human voice commands.

In addition, Google is currently rolling out LaMDA, its advanced conversational AI, in a test environment. LaMDA could be the basis for a next-gen Assistant and a new form of Internet search - provided Google gets a handle on what Marcus calls unruly LLMs.

The fact that Google is rolling out LaMDA only step by step, and has been testing it intensively in-house for months, is directly related to the concerns Marcus raises: safety and reliability.

The issues at stake include bias, racism, and behavior that is difficult to predict, such as the appearance of sentience that ex-Google employee Blake Lemoine fell for. Google's sister company DeepMind recently unveiled a dialog AI optimized for safety.