From Amazon’s various incarnations of the Echo to the upcoming Apple HomePod, smart speakers have captured the imagination of consumers across the globe in 2017. Amid the hype, Rob Horgan takes a step back and asks just how smart smart speakers actually are.
They said it was a gimmick and played it down as a flash in the pan, but here we are in 2017 and one of the year’s most hotly anticipated products is the HomePod – Apple’s answer to the Amazon Echo and Google Home. It may have taken a few years for consumers to buy into the idea of talking to their speakers, but now that they have, the sky is the limit. Amazon’s Alexa led the way, Google’s Assistant followed suit and now Apple is ready to throw Siri into the mix. Samsung’s Bixby and Microsoft’s Cortana are not far behind, and a host of manufacturers have announced plans to incorporate existing smart assistants into their products.
Just as people point back to the Mosaic web browser in 1993 and the Apple iPhone in 2007 as cornerstones of computing history, the Amazon Echo’s arrival in 2014 will be heralded as the dawn of a new age of technology. To give an idea of current and near-term market size, a forecast from Gartner suggests that virtual personal assistant (VPA)-enabled wireless speakers will generate around $3.52 billion in global revenue by 2021. The message is loud and clear: the smart speaker takeover is in full swing. But how smart are the devices transforming living rooms across the globe?
Voice recognition, natural language understanding (NLU) and response capabilities are improving rapidly. Voice recognition is now north of 95 per cent accurate, according to Voice Bot, a website dedicated to tracking the growth of everything ‘smart’. The ability to interpret user intent is also getting much better, and there are now over 20,000 third-party skills on Alexa – two years ago there were fewer than 20.
Voice assistants recognise human speech and intent better, and can do more in response, than they could just a year ago. Amazon’s Alexa leads the way, while Google Assistant and Microsoft’s Cortana are very sophisticated in speech recognition and NLU. Siri, among others, is playing catch-up, but Apple isn’t too fussed about building the smartest smart assistant. Instead, CEO Tim Cook wants to ‘revolutionise home listening’, positioning the upcoming HomePod first and foremost as a high-end audio device, and only secondly as a smart speaker.
Consumers can already play music, make phone calls, add information to calendars and receive daily news briefings from their smart speakers. So what can the likes of Amazon and Google do to make them smarter? As Rod Slater, head of Smart Tech at Exertis, explains, conversation skills are ‘hard to implement’ and still have a way to go. “Context tasks become a lot more complex to explain,” he says. “It’s the difference between ‘switch the lounge lights on’ and ‘switch the lights on’. Associating devices with rooms where the request originated seems an obvious thing to do, but early smart speakers still need very specific phrasing to get the desired result. To create anything beyond the basic interaction model needs work.”
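To make Slater’s point concrete, here is a minimal sketch of the kind of logic he describes: resolving an under-specified lighting command using the room the request came from. The device registry, room names and function are invented for illustration; real assistants handle this through their own device-group features rather than anything this simple.

```python
# Hypothetical sketch: if the user names a room ('switch the lounge
# lights on'), target that room; otherwise fall back to the room of
# the speaker that heard the request. All names here are invented.

DEVICE_REGISTRY = {
    "lounge": ["lounge_ceiling_light"],
    "kitchen": ["kitchen_spotlights", "kitchen_counter_light"],
}

def resolve_lights_command(command: str, origin_room: str) -> list:
    """Return the light devices a command should target."""
    words = command.lower().split()
    # Check whether the utterance names a room explicitly
    named = [room for room in DEVICE_REGISTRY if room in words]
    target_room = named[0] if named else origin_room
    return DEVICE_REGISTRY.get(target_room, [])

# Heard by the lounge speaker, no room named -> lounge lights
print(resolve_lights_command("switch the lights on", "lounge"))
# An explicitly named room overrides the origin
print(resolve_lights_command("switch the kitchen lights on", "lounge"))
```

The point of the sketch is how little extra information is needed (just which speaker heard the request) to turn a rigid phrase into a natural one – which is exactly the gap Slater says early smart speakers have yet to close.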
Singing very much from the same hymn sheet, Bret Kinsella from Voice Bot adds: “Where they still come up short today is maintaining context both from the interaction and for the individual. And, they don’t have agency to do things on your behalf. I want AI engineers to focus on context and agency so that the voice assistants become more effective and can even anticipate which activities should be executed and do them proactively.
“I’d also like to see more AI-driven response capabilities. All of the voice assistant applications today use AI for speech recognition and NLU, but then return structured information stored in databases. These are programmatic responses similar to what we have in mobile apps today. We should start seeing more dynamic responses where AI also assembles the best answer based on collecting information and executing tasks from interacting with multiple web services that each have deep domain expertise.”
Meanwhile Dave Sobel, senior director of Community at SolarWinds (and a keen Alexa user), would like to see smart speakers lose the ‘stilted command language’ currently needed to operate them. “The biggest problem with smart assistants is that they still require a specific series of commands to instruct them, and they don’t always understand context,” he says. “You can’t stack commands, where the next command builds on what you did previously, and they need their commands in the specific phrasing they require. Removing the stilted command language and having commands be more natural and context sensitive is a big step.”
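The ‘stacking’ Sobel describes can be sketched as a small piece of dialogue state: if a follow-up command names no target (“turn it up”), fall back to whatever the previous command acted on. The class, intent names and devices below are invented for illustration, not how any shipping assistant is implemented.

```python
# Illustrative sketch of stacked, context-sensitive commands: the next
# command builds on the previous one. All names here are hypothetical.

class DialogueContext:
    def __init__(self):
        self.last_device = None  # what the previous command targeted

    def handle(self, intent: str, device: str = None) -> str:
        # A follow-up like 'turn it up' carries no device of its own,
        # so reuse the target of the previous command.
        target = device or self.last_device
        if target is None:
            return "Which device do you mean?"
        self.last_device = target
        return f"{intent} -> {target}"

ctx = DialogueContext()
print(ctx.handle("play_music", device="lounge_speaker"))  # explicit target
print(ctx.handle("volume_up"))  # 'turn it up' reuses the lounge speaker
```

Carrying even this one slot of state between turns is what lets a user drop the rigid phrasing Sobel complains about; the hard part, as the interviewees note, is doing it reliably across many devices, rooms and users.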
One way in which smart speaker manufacturers are attempting to get smarter is by joining forces and sharing skills. At IFA, Microsoft announced a partnership between its smart assistant Cortana and Alexa, whereby users can take advantage of Alexa’s features on Cortana-powered devices and vice versa. On paper, at least, the partnership makes sense and fills in some gaps for both companies. Microsoft, for example, has no shopping component but brings a deep familiarity with user scheduling thanks to its Office 365 suite of products. But the issue remains that collaboration is just another way of adding skills to devices rather than making them smarter. The onus now should be not on adding complexity but on making these devices simpler, and therefore smarter.