Why voice should be the cornerstone of any digital banking strategy

Established financial institutions have struggled to keep pace technologically with fintech startups. Voice offers an opportunity for big banks to shift the balance.

Traditional financial institutions have historically been slow to adopt new technology, leaving them vulnerable to a new crop of digital-first competitors. According to a 2016 McKinsey report, “The rise of digital innovators in financial services presents a significant threat to the traditional business models of retail banks.” All these years later, it’s unthinkable for anyone in the finance industry to be oblivious to this reality, and yet for many traditional banks, the struggle to keep up with more nimble digital-first players continues.

“We think of it as we are living through an extinction phase… It is not an incremental thing, it’s an epochal shift.” - Stephen Bird, CEO of Global Consumer Banking, Citi

What edge do fintech disruptors have?

At their core, these digital companies offer nothing that the big traditional banks don’t already offer in some form. As with most industry disruptors in the digital age, the key differentiators come not from the services rendered but from how those services are rendered. Uber didn’t reinvent the taxi; it just improved the experience of taking one. The same goes for the financial services industry. To survive and thrive in the digital age, banks must focus on the user experience.

How is user experience changing in banking?

For most people, banking is a means to an end—it’s something we do to facilitate the other parts of our lives, but it’s not an activity we necessarily relish. Those kinds of tasks—the purely functional kind—are perfect for offloading to AI.

“Do people wake up in the morning thinking ‘I can’t wait to start banking today?’” - Ken Dodelin, VP of Conversational AI at Capital One

How’s this for a use case: You wake with a start in the middle of the night remembering you haven’t paid your credit card bill, roll over, and ask Alexa to do it for you. End of transaction, back to sleep!

Sounds amazing, but there are some potential pitfalls to consider.

Problem 1: Trust

You’ve probably heard of the Uncanny Valley before, but here’s a quick refresher: researchers have found that as robots progress in their resemblance to humans, trust and familiarity increase… to a point. And once that point is reached, typically the point at which something is so close to a healthy, human person and yet still very clearly not one, the trust index plummets.

This is a significant drawback to the current trend of making voice assistants sound more and more human. Mobile banking is a given at this point: most users have no security qualms about manually moving money around via their bank’s app. But introduce an intermediary AI with a nearly human-sounding voice (as most platform assistants on the market today have), and we start to enter the Valley. We know Siri isn’t a person, just an extension of the same device we use for banking all the time, but we still feel uncomfortable giving it access to our bank accounts.

Problem 2: Privacy

Another issue banks face in the age of voice assistants is the sheer volume of sensitive data involved in banking. For many banking interactions, it would be unthinkable for a voice assistant to read responses out loud. Most people don’t want the intimate details of their finances broadcast to any bystanders who might be listening.

Even for interactions occurring largely in private, users are increasingly cautious about what data they might be handing over to the big tech companies behind many platform voice assistants. Many users worry that if they ask Alexa to complete a banking task, they may have just given Amazon access to all of their bank data. That’s a tough perception to fight.

Problem 3: Speed

At present, the majority of voice interactions occur in a voice-in, voice-out format: the user speaks a request, and the device speaks a response back. This is far from the most efficient form of human-computer interaction and is particularly tedious with longer responses (say, a list of recent charges made to your credit card). For a number of tasks, voice responses simply can’t offer any efficiency gains over traditional app interactions.

Solution: in-app assistants with visual responses

The best solution to get around all of these voice roadblocks is to bypass Siri and its kind altogether by having a custom multimodal AI at the core of your in-app experience.

Your own voice assistant doesn’t need a personality or name attached to it; you have your own brand to lean on here, one that your users should already trust if they’re keeping their money with you. Your users also won’t have to sweat whether or not Amazon is taking notes on their every transaction because they’ll be interfacing directly with you.

An in-app AI gives you more control over the experience and functionality of voice interactions than those offered by platform assistants, allowing customers to complete a more tailored range of tasks, and faster.

Specifically, you’ll be able to serve up branded visual responses to most spoken user requests. This multimodal speech-visual workflow isn’t only preferable for security reasons, but for speed as well; the average human can read 250 words per minute, nearly twice as fast as we listen.

We’re seeing more and more voice-first experiences integrated into apps across industries, and expect that little microphone icon to move more and more central to app experiences in the coming years. A financial institution looking to lead the pack should start integrating voice into the core experience of their banking apps.

The key components of a successful in-app voice interaction

For financial institutions looking to stay at the head of the pack in a digital-first world, the value of an in-app voice assistant is clear. So where should companies invest?

There are three key components of a successful voice interaction:

1. Recognition

Can your device accurately transcribe raw spoken input from the user? Good news! This layer is handled almost entirely by operating systems, so there’s not much for brands to do here except cheer on the technology as it increases in accuracy. And the future is looking bright; Google has reported 95% word accuracy for its machine learning speech recognition, which is roughly the threshold for human accuracy.

2. Meaning


Once the device has correctly interpreted the actual words of the user’s request, we need to parse intent from those words. What does the user actually want to do? For general purpose platform assistants, this is a daunting task—Siri has to consider the full breadth of human needs when figuring out what a user might be asking.

Thankfully, an in-app assistant only needs to be on the lookout for requests germane to that app’s functionality. Nobody asks their banking app for the weather; there’s a discrete, predictable, and likely single-digit number of tasks to optimize for.

The best way to train an in-app assistant is to analyze your existing customer support data to learn how users phrase these requests. Large financial institutions have a huge leg up here, as they have years and years of recorded customer support calls to draw from. The more data, the more accurate the meaning layer becomes!
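
To make the idea concrete, here is a minimal sketch of a meaning layer for a small, fixed set of banking intents. It is not how any particular product works: the intent names, the example phrasings, and the `classify` helper are all hypothetical, and a production assistant would use a trained language-understanding model fed by real support transcripts rather than simple word overlap.

```python
import re
from collections import Counter

# Hypothetical example phrasings per intent. In practice these would be
# mined from real customer support data, not written by hand.
TRAINING_PHRASES = {
    "check_balance": [
        "what is my balance",
        "how much money do i have",
        "show my checking balance",
    ],
    "pay_credit_card": [
        "pay my credit card bill",
        "make a payment on my card",
        "pay off my card",
    ],
    "recent_transactions": [
        "show my recent charges",
        "what did i spend this week",
        "list recent transactions",
    ],
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_model(phrases_by_intent):
    """Count how often each word appears in each intent's examples."""
    return {
        intent: Counter(tok for phrase in phrases for tok in tokenize(phrase))
        for intent, phrases in phrases_by_intent.items()
    }


def classify(utterance, model, min_score=0.5):
    """Return the best-matching intent, or None if nothing matches well.

    The score is the fraction of the utterance's words that appear in an
    intent's example phrasings. Out-of-scope requests score low and are
    rejected instead of being forced into a banking intent.
    """
    tokens = tokenize(utterance)
    if not tokens:
        return None
    best_intent, best_score = None, 0.0
    for intent, counts in model.items():
        score = sum(1 for tok in tokens if tok in counts) / len(tokens)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent if best_score >= min_score else None


model = build_model(TRAINING_PHRASES)
print(classify("pay my credit card", model))
print(classify("what's the weather today", model))
```

Because the assistant only has to tell a handful of intents apart, even this crude scoring separates them, and adding more example phrasings per intent widens the range of wordings it can recognize.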

3. UX Response

Once the intent is understood, it’s time to deliver a response experience. The importance of investing in this area can’t be overstated; a successful voice experience isn’t merely about returning the correct information but about returning it in a consumable, efficient format.

Whether you’re redirecting users to the appropriate existing area in the app or presenting simple visual summaries of balances and other account information, it’s time to treat voice responses as their own UX flow. What are the specific needs of someone choosing to engage with your app via voice, and how do they change your presentation of information? As mentioned above, privacy concerns should play a key role in deciding what kind of response is most appropriate for users.
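
One way to picture a privacy-aware response flow is as a small decision made per request. The sketch below is illustrative only: the set of sensitive intents, the `headphones_connected` signal, and the `Response` type are assumptions introduced for this example, not part of any real SDK.

```python
from dataclasses import dataclass

# Hypothetical set of intents whose answers should not be read aloud
# when bystanders might be listening.
SENSITIVE_INTENTS = {"check_balance", "recent_transactions"}


@dataclass
class Response:
    spoken: str  # what the assistant says out loud
    visual: str  # what the app shows on screen


def build_response(intent, details, headphones_connected=False):
    """Choose how much of the answer to speak versus show.

    Sensitive details are shown on screen only, with a neutral spoken
    acknowledgement, unless audio is private (assumed here to mean that
    headphones are connected).
    """
    if intent in SENSITIVE_INTENTS and not headphones_connected:
        return Response(spoken="Here's what I found.", visual=details)
    return Response(spoken=details, visual=details)


reply = build_response("check_balance", "Checking: $1,204.18")
print(reply.spoken)
print(reply.visual)
```

The visual channel always carries the full answer, so the only decision is whether the spoken channel repeats it.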

The trends are clear: financial institutions that don’t prioritize recontextualizing their offerings for the next generation of digital users will be left behind. Voice assistant adoption is growing at a faster clip than any major technological breakthrough in modern history (yes, including the smartphone!), so for banks building a strategy for the future, a voice roadmap should be front-and-center.
