It's good to talk

PCR chats to Neil Grant, from speech recognition software firm Nuance, about flagship program Dragon NaturallySpeaking and the future of voice technology.

Speech recognition technology has made incredible leaps over the years, but it’s yet to see widespread understanding or demand from consumers, despite having a number of valuable uses.

Nuance is arguably the market leader in the field, working with companies worldwide to offer speech solutions in the healthcare and legal industries, as well as to mobile phone and car manufacturers. It offers programs such as Dragon NaturallySpeaking (DNS) for PC and Dragon Dictate for Mac that make it easy to get words onto the screen without ever putting finger to keyboard, as well as mobile apps.

We spoke to Neil Grant, sales director for the UK, South Africa and Benelux, to learn a little more about the software, and find out where speech recognition is going next...

Nuance products are used around the world, and Dragon NaturallySpeaking is on version 11.5 now, yet a lot of people aren’t familiar with the product. Why do you think that is?
Our products have been incredibly successful in specialised areas such as the legal and healthcare markets. And over the last three to five years we’ve done a lot of business in the general retail space – it’s a large part of our overall business – but we do need more awareness among consumers.

One reason for this is that speech recognition takes a lot of computing power to do well, quickly and in a commercially viable way. It’s only been over the last few years that computers have had the additional power to run the complex processes required – now there’s a better and more user-friendly experience available, with quicker results and less training needed to use the software in the first place.

You must see a huge range of uses for the software.
There’s a huge variety of usage – from people recording family history, to dictating check-out inventory reports for houses, to novelists, through to extreme use cases like Stuart Mangan, who was a rugby player until he was injured and paralysed from the neck down. Using DNS, he could not only use dictation for emails or letters, but with a little integration he was able to channel surf his TV by voice, send texts to his friends and use social media. That struck me as one of the most exciting areas of usage. Sadly, Stuart Mangan died in 2009, but he said using DNS gave him his independence and privacy back – for us, that makes the work worthwhile.

You joined Nuance in 2001. How has the technology changed since then?
At the time it seemed mind-blowing, but to see how much it’s improved since then is phenomenal. First, the speed and accuracy have really improved. And the ease of use, in terms of command and control of the computer itself, has become a lot more intuitive. Instead of having to learn whole lists of commands by heart, you can say what you see or what you want to do. We recognise that there are many ways of saying the same thing. It makes a massive difference to the usability of the product.

The technology is said to be 99 per cent accurate – is it possible to see much improvement after that point?
You can always improve. Speech recognition is a statistical process. The more data you can crunch, the better the recognition is going to be. There are always adaptations, new words coming in – through slang and social media, for example. It needs to be kept on top of.

We measure accuracy in terms of word error rate. At 99 per cent accuracy, a 1,000-word document could still contain ten errors, and that’s something we can improve on. Accuracy is not something we’re ever going to sit back and take for granted.
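The “ten errors in a 1,000-word document” figure follows directly from the metric Grant describes: word error rate is conventionally the word-level edit distance (substitutions, deletions and insertions) divided by the length of the reference text, so 99 per cent accuracy implies a rate of 0.01. As an illustrative sketch of that standard calculation (not Nuance’s own tooling):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One wrong word in five: WER of 0.2, i.e. 80 per cent accuracy.
print(word_error_rate("it is good to talk", "it is good to walk"))
```

By the same arithmetic, ten misrecognised words in a 1,000-word dictation gives a WER of 10/1000 = 0.01 – the one-per-cent residual Grant says the company keeps working to reduce.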

Other improvements might be around natural language and understanding what people want the computer or device to do, without them having to learn exact commands.

We’ve noticed increasing numbers of software firms, including Nuance, releasing apps to run alongside their main programs. How is this working for you?
We have Dragon on the iPhone, FlexT9 on Android and various solutions on BlackBerry – mostly in the US, as it’s dependent on the carrier.

In terms of the iPhone application, it’s an important piece for us. Even though it’s a free app, the job that it does in getting the brand and speech in front of the mass market is very important.

The next version of Windows is all about touch. Do you think we will ever see voice-based operating systems/computers? Is it something we might see Nuance do?
I don’t see Nuance moving into producing our own OS. What we’ll do is work with market leaders such as Windows, Apple, Android, and so on.

The keyboard and mouse were always traditional input methods for getting data into your computer. We’re now moving into this more multi-modal set of possibilities, where we have touch, we have speech, we have gesture and all of these things will come together.

Nuance has never been about only using speech to the detriment of everything else. What we’ve always been about as a company is productivity. It’s all about improving the way people work, making it smarter, easier, faster. Where it’s faster to touch you’ll do that, and where it’s faster to talk, you’ll do that.
