Every day, people ask me questions like, “Where is mobile going?” “Aren’t apps a fad?” “Will everything go to the browser?” “Aren’t chatbots the future?”
Humans interact with technology to do something better or faster. It’s that simple. The big breakthroughs of the past two centuries (telephones, cars, airplanes, radios, TVs, computers, the internet, mobile devices, etc.) show this to be true.
When we try to divine the future of mobile, we need to start with speed: specifically, how the UI can shorten the “critical path to information,” where “information” encompasses communication, entertainment, commerce, and anything else we do on our devices.
Over time, UX will be pushed relentlessly to maximize the speed at which information is transmitted and received. For example, in English:
- Humans type at ~40 words per minute.
- Humans speak at ~130 words per minute.
- Humans read at ~250 words per minute.
The conclusion is clear: the optimal way to interact with a device is to speak to transmit information and to read to receive it. That’s why Apple, Google, and Microsoft are spending billions on better speech recognition and AI.
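To make the speed argument concrete, here is a quick back-of-the-envelope calculation using the rates listed above. It is only a rough sketch: the 100-word message is an arbitrary example, and the assumption that listening to a spoken reply runs at about speaking speed is ours, not a measured figure.

```python
# Approximate English rates quoted above, in words per minute.
TYPING_WPM = 40
SPEAKING_WPM = 130
READING_WPM = 250


def seconds_for(words: int, wpm: float) -> float:
    """Time in seconds to handle `words` words at a rate of `wpm`."""
    return words / wpm * 60


def compare(words: int = 100) -> dict:
    """Seconds needed to send or receive a message of `words` words, by mode."""
    return {
        "type it": seconds_for(words, TYPING_WPM),
        "say it": seconds_for(words, SPEAKING_WPM),
        # Assumption: hearing a spoken reply proceeds at roughly speaking rate.
        "hear it": seconds_for(words, SPEAKING_WPM),
        "read it": seconds_for(words, READING_WPM),
    }


if __name__ == "__main__":
    for mode, secs in compare(100).items():
        print(f"{mode:8s} {secs:5.1f} s")
```

For a 100-word message, that works out to 150 seconds to type, about 46 seconds to say or hear, and 24 seconds to read, which is why speaking out and reading back in is the fastest combination.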
However, early chatbot experiences can be dismal, largely because those bots stand alone, disconnected from the apps and visual interfaces people already use.
At WillowTree, we firmly believe the future lies in a multimodal interface. A multimodal interface allows a user to speak through a variety of entry points, such as within an app, within an assistant like Siri, or within a chatbot. The user then receives information visually, either through an app or through dedicated messaging or bot applications. This is also why Apple and Google are adding UI capabilities to their messaging apps.
Here’s an example. A user thinking about seeing a movie might start by asking, “What movies are playing nearby?” This would generate a list of movies at nearby theaters, which the user could scan visually. The list could be displayed within the Regal Mobile App, within a messaging app, or within an assistant; in other words, the data is disaggregated from the underlying application.
As the user selects a movie (checking reviews, locations, what their friends have seen and liked, and whether anyone wants to join), they will constantly bounce between speaking and reading, that is, between voice- and screen-based interfaces.
What does this all mean for app development? It means we’re just getting started. Historically, apps (and mobile web experiences) have been self-contained: straightforward in theory, but difficult to get right in practice. We anticipate a world where multimodal UIs become the norm, making implementation much more complex but also allowing for vastly improved user experiences.
UX and development teams will need to account for multiple entry points to a brand’s digital experience: Siri, Google Assistant, other chatbots, Messenger, Snapchat, the app itself, a mobile website, and more. They will also need to recognize that users will bounce among all these interfaces while interacting with the brand. That requires each experience to be fully disaggregated and smart enough to pick up where the user’s last touchpoint left off.
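One way to picture “smart, based on the user’s last touchpoint” in engineering terms is a shared context that every entry point reads from and writes to. This is a hypothetical sketch, not WillowTree’s design or any platform’s API: `Touchpoint`, `SessionContext`, and `ContextStore` are invented names, and the in-memory dictionary stands in for whatever shared backend a real system would use.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Touchpoint:
    channel: str  # illustrative labels, e.g. "assistant", "messaging", "app"
    mode: str     # "spoken", "read", "typed", or "touch"
    intent: str   # what the user was doing, e.g. "find_theaters"


@dataclass
class SessionContext:
    """Hypothetical per-user state shared by every entry point."""
    user_id: str
    history: list = field(default_factory=list)
    state: dict = field(default_factory=dict)

    def record(self, touchpoint: Touchpoint, **updates: Any) -> None:
        # Each channel logs where the user was and merges in what it learned.
        self.history.append(touchpoint)
        self.state.update(updates)

    def last_touchpoint(self) -> Optional[Touchpoint]:
        return self.history[-1] if self.history else None


class ContextStore:
    """In-memory stand-in for a shared backend (an assumption of this sketch)."""

    def __init__(self) -> None:
        self._sessions: dict = {}

    def get(self, user_id: str) -> SessionContext:
        if user_id not in self._sessions:
            self._sessions[user_id] = SessionContext(user_id)
        return self._sessions[user_id]


if __name__ == "__main__":
    store = ContextStore()
    # A spoken request through an assistant.
    store.get("user-1").record(
        Touchpoint("assistant", "spoken", "find_theaters"), near="current location"
    )
    # A choice made while reading results in a messaging app.
    store.get("user-1").record(
        Touchpoint("messaging", "read", "choose_showtime"), tickets=4
    )
    # The app opens later and resumes from the last touchpoint instead of starting over.
    ctx = store.get("user-1")
    print(ctx.last_touchpoint().channel, ctx.state)
```

The design choice here is that no single channel owns the conversation; each one only appends to and reads from the same per-user context, which is one way the experiences could stay disaggregated while still feeling continuous.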
How will that look in real life?
- I want to go to a movie and ask my phone via Google Assistant or Siri to look up nearby Regal Cinemas theaters. (Spoken)
- The Regal app launches and gives me options, displaying them within the Siri or chatbot interface. (Read)
- I ask for reviews of two movies (Spoken) and see them (Read).
- I ask to send the showtimes and reviews to two friends via Snapchat and my wife via text. (Spoken)
- After I get their answers back, I go into Siri and order four tickets. (Typed and read)
- The app launches and I confirm the purchase with Touch ID. (Touch)
- I tell the app to send tickets to my friends and wife. (Spoken)
- As I enter the theater, the app pops up and I scan the ticket.
As consumers, our lives are about to get a whole lot easier with multimodal UIs. As digital professionals, our next job is to begin exploring the design and technical challenges of bringing these experiences to life. As Steve Jobs said, “This is what customers pay us for—to sweat all these details so it’s easy and pleasant for them to use our computers. We’re supposed to be really good at this. That doesn’t mean we don’t listen to customers, but it’s hard for them to tell you what they want when they’ve never seen anything remotely like it.”