
Google Highlights Multimodal as a Key Trend in Voice for 2020

Walking into the Convention Center in downtown Chattanooga, TN for the 2020 Project Voice conference felt notably similar to attending some of the first iOS conferences in 2008.

The community was fairly small, enthusiasts were everywhere, and attendees shared a common goal: they wanted this new technology to provide the best possible experience for users.

However, many people outside the space, both with the iPhone in 2008 and with voice in 2020, were not fully convinced that this would be a dominant next-generation interface.

Apple was in fact well behind the competition in mobile at the time: Windows Mobile, BlackBerry, Palm OS, and Symbian were established systems in 2007. The first iPhone had significantly fewer features than other smartphones of the time; it did not support 3G, third-party apps, editing Office documents, or other capabilities that users had relied on with the BlackBerry for years.

The inaugural iPhone’s core emphasis was on the interface and user experience design. Powerful hardware that enabled touch, pinch-to-zoom, and inertial scrolling, coupled with software that reimagined navigation and powered mobile browsing, supported a much more natural human interaction with technology, positioning the iPhone and iOS to completely change the game for mobile.

The first iterations of the voice-enabled smart speaker followed a similar path: this technology is much more about simplicity and enabling people to use technology in the way that is most natural for them than it is about having an expansive feature set. People speak at 130 words per minute but type at only 40 words per minute. They want to use voice technology for the same reason that they want a quick face-to-face meeting instead of a long string of emails: it's faster and easier to communicate with voice.

Typing vs speaking

But just as users and developers began to demand more functionality from the iPhone beyond the first iOS release (which only included Apple’s own limited suite of mobile apps), there is a clear need today for more functionality from voice technology.

It became clear at both CES and Project Voice that platforms like Amazon, Samsung, and especially Google have made it a strategic imperative to expand the function of voice technology in 2020 beyond its most common use cases: checking the weather, playing music, and setting a timer or reminder.

Leaders from Google highlighted multimodal voice experiences and integrated technology interfaces as some of their top priorities for 2020. They shared that about half of the time users start with voice, they turn to a screen such as a phone, tablet, or computer immediately after. Interacting with both voice and screens not only enables more complex interactions that require visuals, but it’s also significantly faster. Just as humans speak much faster than they type, they also read faster than they listen, at 250 words per minute and 130 words per minute, respectively.

listening vs reading
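
To make the speed comparison concrete, here is a minimal sketch that uses only the four rates quoted above (40, 130, 130, and 250 words per minute). The 20-word request and 100-word response are hypothetical sizes chosen purely for illustration, and the function names are our own; real interactions also involve pauses, recognition time, and scanning rather than reading every word.

```python
# Rates quoted in the article, in words per minute.
WPM = {"type": 40, "speak": 130, "listen": 130, "read": 250}


def seconds_for(words: int, mode: str) -> float:
    """Seconds needed to get `words` words through a single mode."""
    return words * 60 / WPM[mode]


def round_trip(request_words: int, response_words: int,
               input_mode: str, output_mode: str) -> float:
    """Seconds for one exchange: the user's request plus the app's response."""
    return (seconds_for(request_words, input_mode)
            + seconds_for(response_words, output_mode))


if __name__ == "__main__":
    # Hypothetical exchange sizes (an assumption, not data from the article).
    request_words, response_words = 20, 100

    flows = {
        "type in, read out (screen only)": ("type", "read"),
        "speak in, listen out (voice only)": ("speak", "listen"),
        "speak in, read out (multimodal)": ("speak", "read"),
    }
    for label, (input_mode, output_mode) in flows.items():
        total = round_trip(request_words, response_words, input_mode, output_mode)
        print(f"{label}: {total:.1f} s")
```

Under these assumptions, speaking the request and reading the response is the fastest of the three flows, which is the intuition behind pairing voice input with a screen.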

Integrating voice into web and mobile apps also dramatically expands voice functionality and introduces additional channels for brands to interact with their customers and deliver value. We believe that the key trend in voice, and indeed in all of digital, will not be voice apps, but voicifying your existing mobile apps to make them much faster and more user-friendly.

As presenters at Project Voice delved deeper into multimodal, there was less focus on expensive smart speakers with screens, and more people held up their smartphones to ask, “Why not enable multimodal on the one screen you have in your pocket all the time?”

We took it upon ourselves to begin answering that question, as illustrated in this prototype for a pizza restaurant’s mobile app.

Embedded content:

Enabling multimodal voice experiences, integrating voice into existing technology, and Giving Your Apps a Voice™ will truly create the best experience for users, allowing them to accomplish more complex tasks faster and pushing voice forward as the next-generation interface.
