In 2021, most people reading this have likely used a voice assistant at least once, as have their children or grandchildren and parents or grandparents. It’s not terribly surprising that voice adoption is growing faster than any other technology in modern history, even faster than the Internet and the smartphone.
But as many technology leaders rightly point out, we are still in the stage of “Voice 1.0.” The technology is ubiquitous in the sense that most consumers have used it at least once, but its impact on users’ habits and on business operations remains limited.
We often hear from our clients, “We know we need to do something with voice, but we’re not sure where to start,” or “Voice is on our roadmap, but we’re not sure how to deliver an experience that our customers will actually use.”
Although the adoption of voice technology has risen faster than other technologies, organizations currently wrestle with how to incorporate voice into meaningful customer or user experiences.
Going beyond “Voice 1.0” and moving to “Voice 2.0” will transform how consumers and businesses use the technology. To get there, we can’t think of voice as a single interaction; we have to treat it as one part of the broader user experience.
How People Currently Use Voice
Data from Google indicates that 50% of the time users start to do a task with voice, they turn to a screen such as a phone, tablet, or computer immediately after to complete the task. So about half the time, people are using screens + voice rather than voice alone.
This is supported by usage statistics from one of the most well-known branded voice assistants: Bank of America’s Erica. Unsurprisingly, Erica’s user base and engagement have grown dramatically during the pandemic, with interactions jumping by 198% YoY from Q1 2020 to Q1 2021. However, only 13% of interactions with Erica are done through voice, indicating that most users rely on other input methods in the app, such as tap and text, rather than on voice alone.
So, what does this data on user behavior tell us? When people engage with voice, they don’t use voice alone; they use multiple types of interfaces to complete a task. It’s the same concept that we’ve already applied to desktop computers: using a mouse, screen, and keyboard together to make using a computer as efficient as possible.
We call interfaces that combine these modes of interaction to support users “multimodal interfaces.”
Why is Multimodal the Preferred Use of Voice?
Multimodal interfaces not only support the user preferences and behavior that we’re already seeing in data from Google and Bank of America, but also improve efficiency and privacy, which in turn can improve adoption of and engagement with voice assistants.
From an efficiency perspective, humans speak at 130 words per minute, but only type at 40 words per minute; that’s why voice assistants took off in the first place. However, humans listen at only 130 wpm but read at 250 wpm. Therefore, the most efficient use of voice technology pairs spoken input with integrated graphic/text output: a multimodal interface.
For example, if a user asks a voice assistant to find past transactions in their bank account, it would be a faster experience for the assistant to display a screen in the mobile app that shows the transactions, rather than read a list aloud to the user.
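To make the arithmetic behind this comparison concrete, here is a minimal Python sketch that estimates how long one request-and-response exchange takes for each combination of input and output mode. The four rates are the figures quoted above; the request and response lengths (13 and 130 words) and all function and variable names are hypothetical, chosen only for illustration.

```python
# Rates quoted in the article, in words per minute.
SPEAK_WPM = 130
TYPE_WPM = 40
LISTEN_WPM = 130
READ_WPM = 250

# Hypothetical message sizes, assumed for illustration only.
REQUEST_WORDS = 13     # a short spoken or typed request
RESPONSE_WORDS = 130   # e.g. a list of past transactions


def seconds(words: int, wpm: int) -> float:
    """Seconds needed to produce or consume `words` words at `wpm`."""
    return words / wpm * 60


def interaction_seconds(input_wpm: int, output_wpm: int) -> float:
    """Total time for the user to give the request and take in the response."""
    return seconds(REQUEST_WORDS, input_wpm) + seconds(RESPONSE_WORDS, output_wpm)


# Every pairing of input mode and output mode.
combos = {
    "voice in, voice out": (SPEAK_WPM, LISTEN_WPM),
    "voice in, screen out": (SPEAK_WPM, READ_WPM),
    "typed in, voice out": (TYPE_WPM, LISTEN_WPM),
    "typed in, screen out": (TYPE_WPM, READ_WPM),
}

results = {name: interaction_seconds(i, o) for name, (i, o) in combos.items()}
fastest = min(results, key=results.get)

for name, total in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: {total:.1f} s")
print(f"Fastest: {fastest}")
```

Under these assumed message sizes, spoken input with on-screen output comes out fastest, because it pairs the faster input rate (speaking rather than typing) with the faster output rate (reading rather than listening). The exact totals depend entirely on the assumed word counts.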
Multimodal interfaces also support increased privacy. According to Paysafe, only 37% of consumers feel that their financial data is secure when they pay by voice alone — supporting the transaction with a visual interface can help indicate security and privacy to the user. Research from WillowTree’s innovation team supports this claim — in user testing for a voice-enabled pizza ordering app, users indicated that they felt more certain that their order was correct and that they were paying the right amount when they saw a visual transcription of their spoken utterance on the ordering screen.
What is Voice Uniquely Good at?
Starting with user needs and behaviors rather than the technology itself allows us to ask the question: What is voice uniquely good at? In other words, what are the key use cases in which it is definitively easier and more efficient to complete a task via voice rather than screens?
There are a few examples that WillowTree teams have identified from user input as high-potential use cases for voice:
|Use case|Description|Example|
|---|---|---|
|Retrieval|Looking through a large dataset to find a piece of information|“Show me all charges from Planet Fitness from the last 6 months.”|
|Composition|Typing, writing emails, and composing lists|“Send a message to my financial advisor that says …”|
|Configuration|Turning a multi-screen, multi-step process into a single command|“Buy 15 shares of Apple Inc.”|
Many financial services organizations have already built virtual assistants to execute some of the most common use cases of banking mobile apps, like checking account balances and paying bills. But for most users, logging into their bank account and looking at their account balance is a fairly simple task that takes only a few seconds and one or two screens.
By starting with the user need and identifying some of the most arduous, time-consuming tasks that a user might need to get done with their financial services provider, banks are able to better serve their clients and avoid the common pitfall of building technology for technology’s sake.
As we progress closer and closer to Voice 2.0, feel free to reach out to the WillowTree team — we’d love to hear from you if you’re thinking about integrating voice into your digital ecosystem.