At the start of the summer, like millions of other folks, I watched the Google Duplex demo, an eerily “real”-sounding exchange between a voicebot and an actual human. Afterward, three words sprang to mind: Voice is dead.
That’s the end, I thought. Prepare for hordes of new humanoid sales bots flooding the phone lines, pitching political candidates and deals on cruise vacations. Say goodbye to those canned “your call is important to us” greetings and hello to an explosion of uptalking, filler-word-fueled vocal replicants designed to elicit empathy and trick you into courtesy toward indifferent machines. Pretty soon, people will need a shibboleth to identify fellow humans, because there will be a bot that can mimic my voice well enough to call Mom on Mother’s Day and conduct a conversation with all the right “uh-huhs” and “mmm-hmms” to embezzle effortless good-son points.
Yep. Voice is dead. Except it isn’t.
Ever since Apple’s Siri wowed the world with its debut on the iPhone 4S, there’s been an evolutionary surge in “smart” voice technology — and an equivalent surge in people embracing it. In just a few short years, voice interaction with “things” has become pretty commonplace. It turns out that we actually like talking to bots. Witness the tens of millions in smart speaker sales and how familiar 1-in-5 Americans are with uttering “Alexa, what time is it?” or “Hey Google, tell me a joke.”
There’s good reason. Speech-recognition and natural-language processing (NLP) systems, combined with enormous troves of user data, have actually made complicated technology more amenable to human use. It is far easier to just ask for something than to read through pages of text, master some new piece of software, and search or scroll or type or tap. In a sense, sophisticated voice technology forces our tools to give us what we want, the way we want to get it. Most of the time, voice is just more convenient.
Because voice is more convenient, we’re already starting to see AI-assisted human-like voice technology hit the enterprise. But that doesn’t mean there is no danger.
To fear or not to fear, the bot
All of my initial fears about voicebots may yet come to pass. Even though the Google Duplex bot presented at the Google I/O event in May is designed to operate only for limited uses like making restaurant reservations, and even though Google says it is not testing the technology with any enterprise clients right now, and even though the company promises the bot identifies itself as AI during calls — this kind of human-sounding technology is already wending its way into business at large and into all of our lives. Proliferation is inevitable as organizations look to automate and cut costs. Resistance is futile, sayeth The Borg.
Microsoft’s social chatbot Xiaoice has already made a million phone calls with users in China, can predict what people will say next during a conversation, and will even interrupt someone mid-sentence just like humans often do. And Amazon introduced Lex to bring Alexa-like capabilities to enterprise call centers.
This is all well and good if utility remains the focus of voicebot deployment, and everyone stays alert to the potential for abuse. People will develop a tolerance for voicebots in the enterprise just as we’ve developed a fondness for them in our homes, particularly if they’re effective, available, and responsive. No one likes calling a bank and waiting five minutes to speak to an agent, only to learn the agent is ill-equipped to deal with the inquiry. If voicebots provide value, then we will absolutely swarm to them. It goes back to convenience.
But whose convenience? It has to be equally convenient for the business and for the consumer. And there is a host of underlying social, moral, and legal implications to consider as this technology matures in support of that balance. How do you ensure your voicebot behaves ethically? How do you prevent things like inflection and sentiment analysis from being deployed to manipulate people during bot conversations? What do bots do with the information I provide them? Are they remembering my credit card number? Dining preferences? When and where I like to get my hair cut? Where does such information go? How is it stored? Who and what else can access it?
The weight of user expectations
Dealing with the data required to power these human-like voicebots is an enormous responsibility. Organizations must know where their data is coming from and how it is fused. Is it mixed with third-party data? Where is that data coming from, and can I use that to build my AI models, and should I? If the social media data usage scandals of recent months have taught us anything, it is that companies have not always considered the negative implications of popular new technologies and how they might be misused.
Businesses deploying voicebots also had better be prepared for the hefty weight of user expectation. As humans, we are biologically wired to recognize voices and instinctively recall what we can do with the associated persona. If your company’s voicebot sounds like Alexa, I’m going to expect it to “act” like Alexa. Variations in deployment and experience from one business to the next will quickly lead to consumer annoyance. Why can I talk to this brand and do these five things, but only do half as much when I talk to that sound-alike brand?
Even though voicebots are soon going to be everywhere, we are not going to be comfortable talking to them all the time. They will have utility, and they will be more convenient for quick queries and simple tasks. But being able to talk with a real human about messy human matters has a value that a bot can never be trained to deliver completely and infallibly. Companies will still need human representatives, perhaps now more than ever. The complexity of human discourse, the authentic feeling involved, and our unique ability to share the experience of it: this is a type of voice interaction that cannot be completely and satisfactorily synthesized and automated.
For all the amazingly “real” verbal tics and speedy information processing AI can now produce, voicebots are still an awkward young technology, making human-to-human contact all the more important when we want it. Sometimes, you call your mother just because you want to talk to her. No bot can take her place.
Christopher Connolly is the vice president of product marketing for Genesys. He is based in the company’s regional offices in the Raleigh-Durham area and can be reached on Twitter at @ConnotronNY.