The pace with which voice technology has approached market saturation is unprecedented. As voice moves further into the center of our lives, we’re seeing a lot of exciting innovations in its application—on the technical side, certainly—but also from a user experience perspective.
The deeper we wade into the capabilities of voice, the more potential we see for it to completely change how we think about human-computer interaction, and by extension, the way brands connect and transact with their users.
In short, we’ve got a chance to do this voice revolution right. But for voice to truly reach its potential, we have to look not only at its advantages over past modes of user interaction, but also keep a keen eye on identifying and solving voice’s vulnerabilities and unique challenges.
Let’s look at a couple of the largest challenges brands need to be thinking about as they shift to a voice-focused digital presence:
Challenge 1: Avoiding the chaos of competing voice assistants
The BBC recently announced plans to launch its own voice assistant in the coming months. Rather than making its own hardware, the broadcaster says it will create software that will work with all smart speakers, TVs, and mobile devices.
Ostensibly, the BBC is doing this in order to create a voice control system that is good at understanding all the regional accents spoken around the UK. The “big three” systems don’t necessarily do that well today, and the BBC wants its core consumers to have a better experience.
More likely, the real reason is data, as explained in this story from The Guardian. Voice assistants are another source of detailed information about consumer likes and habits, similar to search engines. This data has a lot of value, and so media companies and other content providers will inevitably tussle with players like Google and Amazon for access to that data.
That’s going to lead to a lot more efforts like the BBC’s to develop alternatives to compete with market leaders like Alexa, Siri, and Google Assistant. That may be understandable from a business perspective, but depending on how it plays out, there’s a real risk of chaos and voice “uncontrol.”
Towers of Babble
Let’s consider what it would be like if the BBC and other major media players each developed a voice system you could install on any smart speaker. (Forget for a moment that Amazon, Apple, and Google have determined what their wake words will be and aren’t going to open their systems to others.) Having even a half dozen different voice systems embedded in one device would mean that the device would respond to a half dozen different wake words.
Today, we can laugh when Alexa wakes up because you’re talking about it, not to it. It won’t seem so funny when those unintentional “wakes” happen all day long, or when two different systems are activated at once. (The BBC is considering “beeb” as its wake word; that may be great branding, but as a single common syllable, it’s going to mean constant accidental waking.)
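To make the collision problem concrete, here’s a toy sketch. The assistant names, wake words, and simple substring matching are all illustrative stand-ins for real acoustic keyword spotting; the point is only that short, common wake words on a shared device trigger together.

```python
# Toy illustration of wake-word collisions when several voice systems
# share one device. Real assistants use acoustic keyword spotting;
# substring matching here just stands in for "the word was heard."

ASSISTANTS = {            # hypothetical wake words, for illustration only
    "Alexa": "alexa",
    "Google": "hey google",
    "BBC": "beeb",
    "NewsCorp": "news",
}

def who_wakes(utterance: str) -> list:
    """Return every assistant whose wake word appears in the utterance."""
    heard = utterance.lower()
    return [name for name, word in ASSISTANTS.items() if word in heard]

# An everyday sentence accidentally wakes two systems at once:
print(who_wakes("I saw that on the beeb news last night"))
# → ['BBC', 'NewsCorp']
```

With half a dozen systems installed, collisions like this would happen constantly, and the device would have to guess which one the user actually meant.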
It’s also likely that these additional voice systems would be designed primarily for consumers to use specific services, like a media company’s streaming radio or news programming. That would mean users choosing different voice systems for different use cases, remembering which ones worked for which services, and backing out of one service to do something in another.
That kind of chaos creates an unacceptable user experience and will have people turning off mics by the millions. Neither content makers nor users get any benefit from that.
Challenge 2: Keeping up with scam artists
Bad actors are a fact of life. Voice and AI hold a lot of promise for making our interactions with technology more natural and efficient. At the same time, criminals will try to exploit those technologies like they have so many others, to scam people and enrich themselves.
It’s already happening.
A recent headline demonstrated just how efficient and effective criminals can be at manipulating voice. An executive in the UK picked up his phone and heard what he believed was the voice of his German boss, telling him to transfer nearly a quarter-million dollars to a supplier in Hungary. Only it wasn’t his boss at all, but an AI program mimicking the voice, and the money wound up in Mexico.
But the introduction of voice assistants adds a more subtle channel for scammers to target. Recently, the Better Business Bureau pointed to examples of users asking Alexa or Google for a customer support number, only to be connected with a fraudulent number. Scammers need only buy spots in the promoted search results at the top of a search engine for reputable brands, pop a fake support number in there, and wait for users to hand over their personal information, credit cards, and more, believing they’ve been connected to the real company they asked for.
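One mitigation is for a voice experience to answer “what’s the support number for X” only from a curated directory of verified numbers, never from open web or ad results. The sketch below is a minimal illustration of that idea; the directory, brand name, and fallback message are all made up for the example.

```python
# Sketch: resolve support numbers from a verified allowlist rather than
# search results, so a scammer's promoted listing never reaches the user.

VERIFIED_SUPPORT = {
    "examplebank": "+1-800-555-0101",  # hypothetical verified entry
}

def support_number(brand: str) -> str:
    """Return a verified number, or a safe fallback instead of guessing."""
    number = VERIFIED_SUPPORT.get(brand.lower().strip())
    if number is None:
        return "No verified number on file; please check the company's official site."
    return number

print(support_number("ExampleBank"))  # → +1-800-555-0101
```

The design choice here is that the assistant refuses to improvise: an honest “I don’t have a verified number” beats confidently reading out a fraudster’s listing.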
Protecting your users from themselves
Security isn’t just about scams; it’s about protecting our users from frustrating and potentially costly user error.
For example, one of the ways voice interaction has made life easier is the ability to use an Alexa-enabled smart speaker to order goods and services. You can recite your grocery list, or the components for an entire home theater, and it will be delivered to your door. It’s so easy that there are more than a few stories about a five-year-old ordering a room full of toys, or even TV newscasters or commercials triggering orders.
This kind of unintended user error might not be the business’s fault, but businesses will still end up bearing the consequences: costly resolution cases and increased user dissatisfaction. It behooves companies moving into voice to take on the responsibility of doing everything in their power to insulate their users from the risks introduced by the technology, even when that means protecting users from themselves.
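One common guardrail is to require an extra confirmation step, such as a spoken PIN, before a voice purchase over some spending threshold goes through. Here’s a minimal sketch of that pattern; the threshold, PIN handling, and `Order` type are illustrative, not any real commerce API.

```python
# Sketch of a purchase guardrail: voice orders above a threshold need a
# spoken confirmation PIN, so a child or a TV ad can't complete them.

from dataclasses import dataclass

CONFIRMATION_THRESHOLD = 25.00  # dollars; tune per product and risk appetite

@dataclass
class Order:
    item: str
    price: float

def place_order(order: Order, spoken_pin, account_pin: str) -> str:
    """Complete the order, or decline it if confirmation is missing."""
    if order.price >= CONFIRMATION_THRESHOLD and spoken_pin != account_pin:
        return "declined: confirmation PIN required for large orders"
    return f"ordered: {order.item}"

# A commercial saying "order a playhouse" no longer succeeds by itself:
print(place_order(Order("toy playhouse", 160.0), None, "4921"))
print(place_order(Order("toy playhouse", 160.0), "4921", "4921"))
```

Small purchases stay frictionless, while the ones that generate refund cases and angry support calls get a deliberate checkpoint.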
Solution: Give Your Apps a Voice™
The chaos introduced by competing top-level voice assistants and under-considered security measures doesn’t only pose a threat to users, but also to brands that employ the technology poorly. Rather than creating more and more top-level voice systems, companies should enable multi-modal voice experiences at the app level.
First, although there are millions of smart speakers and fixed devices in the marketplace already, the main way we’ll be using voice in the future is through our smart mobile devices. They are rapidly becoming the “control center” of our lives and the way in which we’ll consume most of the media and content out there (you’re probably reading this on a mobile device now). While the top-level assistants work on these devices, the true user value comes from the apps on those devices; that’s where voice should be.
Second, embedding voice in apps accomplishes what the BBC and others really want, which is to give their users a tailored experience. Opening your BBC app and saying, “play the news on my bedroom TV,” or asking to hear all the science reports or the latest music, takes advantage of the content that’s available. More than merely enabling the actions users want to be able to perform with voice (because they’re easier than typing or clicking), this also gives the app developer the data they want and can’t get today.
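As a rough illustration of what app-level voice handling looks like, here’s a minimal intent matcher: the app defines a few patterns for its own content and devices, so a command like “play the news on my bedroom TV” resolves inside the app. The patterns, intent names, and slot names are all assumptions for the sketch; production systems use trained NLU models rather than regular expressions.

```python
# Minimal sketch of app-level intent matching: a handful of regex
# patterns stand in for a real natural-language-understanding model.

import re

INTENT_PATTERNS = [
    ("play_content",
     re.compile(r"play (?:the )?(?P<content>\w+) on my (?P<device>[\w ]+)")),
    ("list_reports",
     re.compile(r"(?:hear|list) (?:all )?the (?P<topic>\w+) reports")),
]

def match_intent(utterance: str):
    """Return (intent_name, slots) for the first matching pattern."""
    text = utterance.lower().strip()
    for name, pattern in INTENT_PATTERNS:
        m = pattern.search(text)
        if m:
            return name, m.groupdict()
    return None, {}

print(match_intent("Play the news on my bedroom TV"))
# → ('play_content', {'content': 'news', 'device': 'bedroom tv'})
```

Because the app owns the intent catalog, it also owns the interaction data those intents generate, which is exactly what general-purpose assistants withhold today.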
As far as security is concerned, in-app voice experiences give your brand (and your users) additional control over authentication, dramatically reducing the channels of access by bad actors. Of course, there’s no substitute for an informed user base well-trained to recognize a scam, but there are ways to help our users help themselves.
On Star Trek: The Next Generation, Captain Picard could use his voice to order a cup of tea. He could also use it to command the ship to self-destruct. Thankfully, that second command required both voiceprint identification and a verbal authorization code. It’s the same in real life as in science fiction: if what you want to do has serious consequences, you need a second layer of authentication.
For example, when I call Schwab’s automated system, it uses my voice as my account password. But to trick the system into doing something with my money, a thief would have to do more than manipulate some recordings of my voice from social media. They’d also have to call from a phone number linked to my account and provide other information that only I should know. That adds a little friction for me, but it makes impersonation considerably more difficult.
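That layered check can be sketched as a simple all-factors-must-pass rule, in the spirit of the example above: a voiceprint alone is not enough, and the caller must also be on a registered number and pass a knowledge check. The field names and exact-match logic are illustrative stand-ins for real biometric and telephony systems.

```python
# Sketch of layered authentication for a sensitive voice action:
# voice factor + device factor + knowledge factor must all pass.

from dataclasses import dataclass

@dataclass
class Account:
    voiceprint_id: str        # stands in for a real biometric match score
    registered_numbers: set
    security_answer: str

def authorize_transfer(account: Account, voiceprint: str,
                       caller_number: str, answer: str) -> bool:
    """Approve only if every independent factor checks out."""
    return all([
        voiceprint == account.voiceprint_id,          # factor 1: voice
        caller_number in account.registered_numbers,  # factor 2: device
        answer == account.security_answer,            # factor 3: knowledge
    ])

acct = Account("vp-123", {"+1-555-0100"}, "maple street")
# A cloned voice calling from an unregistered number still fails:
print(authorize_transfer(acct, "vp-123", "+1-555-9999", "oak avenue"))
# → False
```

The point of independent factors is that a thief who compromises one channel, such as cloning a voice from social media clips, still fails the other two.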
Voice is rapidly moving into every area of our interactions with technology, and the opportunities for brands willing to integrate it into their digital strategy are massive. The best digital products have always been built with great user experiences in mind, and with any new technology, delivering those successful interactions means anticipating both the opportunities and the challenges it poses.