If you've been eagerly anticipating the day when you can speak into your replicator, like Jean-Luc Picard, and request "Tea, Earl Grey, hot," don't start stocking up on crumpets yet.
Despite years of hype, speech-recognition technology--which enables devices to interpret spoken language and perform related tasks--hasn't fundamentally changed our daily lives. In fact, it's not even on the verge of letting us seamlessly interface with our PCs, much less any other household devices.
"It's a solution looking for a problem," says Carl Howe, principal analyst with Forrester Research. "While everybody thinks it would be better to be able to talk to your computer, that's not necessarily true."
A daunting combination of obstacles--insufficient processing power, cultural issues, and the inherent complexities of language--continues to hold up development of easy-to-use, client-based speech applications. Howe says the cultural issues may represent the biggest hurdle. Imagine trying to work in an office filled with people talking aloud to their PCs, and you start to get the picture.
Can We Talk?
None of this is to say there isn't software out there that lets users get a modicum of results from talking to their PCs. Just ask Marty Markoe, a longtime speech-recognition consultant, who uses a combination of IBM's ViaVoice, Lernout & Hauspie Speech Products N.V.'s (L&H) Voice Xpress Professional, and Dragon Systems' NaturallySpeaking (also owned by L&H) to perform a variety of PC-based tasks, most notably dictation.
But even Markoe, an advanced user of speech-recognition technology, points out that despite the quantum leaps in computing power in recent years, a lot more speed and memory are needed to process all the vagaries of speech. Only then, he says, will speech-recognition software developers be able to offer something truly useful to mainstream users.
"They hit a roadblock, and there's not much left you can do with speech recognition in terms of continuous speech for dictation to a PC," Markoe says. As a result, for now, use of PC-based speech recognition is characterized by limited recognition and intensive "training" of a PC to recognize the user's speech tendencies.
Complicating matters, Howe says, has been L&H's well-documented fall from grace. Wracked by a financial scandal that led to the incarceration of its founders on charges of fraud and stock manipulation, the once revered language-technology firm is in bankruptcy proceedings and is attempting to sell its core business piece by piece. That's why Howe sees IBM's ViaVoice as the only credible PC-based voice-recognition product on the market.
L&H's struggles also help explain how IBM has managed to build a worldwide base of 12 million users of its PC-based speech products, including ViaVoice. But Nigel Beck, IBM's director of voice systems, acknowledges that PC apps aren't where current and near-term demand is centered. "We expect that server-based voice will be very important and the driver over the next couple of years," Beck says.
Speech recognition is part of a larger category known as voice recognition, which includes the interpretation of simple spoken commands as well as text-to-speech conversion capabilities. And it is server-based voice-recognition services, delivered via telephones, that have made the most significant inroads.
To date, the market for server-based voice-recognition apps has had two dominant suppliers, SpeechWorks International Inc. and Nuance Communications. Both companies' technologies are being used to provide access to flight arrival and departure information, online brokerage accounts, phone-based Web browsing, and package-tracking services. These apps manifest themselves in a variety of ways, from rudimentary "press or say one" interactive-voice-response systems to United Airlines' eerily personable flight arrival and departure application. (One customer reportedly wrote to United, saying that by the time she'd finished interacting with the soothing male voice, she wanted to date him.)
The development of such applications should accelerate. Recent research from Allied Business Intelligence indicates that the market for voice-recognition applications will balloon from $2.3 billion this year to $50 billion by 2005. Further, Cahners In-Stat Group expects sales of speech engines--the server software on which those applications run--to reach $2.7 billion by 2005. (See Boom Predicted For Speech-Recognition Software Market.)
Speech Meets XML
Much of that anticipated growth will be fueled by widespread adoption of VoiceXML, a standard that's doing for voice-recognition technology what HTML did for Web development. Supported by more than 500 companies, including AT&T, IBM, Lucent, and Motorola, VoiceXML essentially makes it possible to mix and match voice products without doing any programming. That means companies buying those products no longer have to commit to a single vendor. It also is making it easier for developers to create new voice-recognition products.
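To give a sense of what that looks like in practice, here is a minimal, hypothetical VoiceXML dialog: a single form that prompts a caller, constrains the recognizer to a short list of words via a grammar, and reads back the result. The city names and prompts are invented for illustration, and the exact grammar syntax varies by platform; this is a sketch of the markup style, not code for any particular vendor's system.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <!-- A "field" collects one piece of spoken input from the caller -->
    <field name="city">
      <prompt>Which city are you flying to?</prompt>
      <!-- The grammar tells the recognizer which words to listen for -->
      <grammar>Boston | Chicago | Denver</grammar>
      <!-- Runs once the recognizer has filled in the field -->
      <filled>
        <prompt>Looking up flights to <value expr="city"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Because the markup describes the dialog rather than any one vendor's engine, the same document can, in principle, run on any VoiceXML-compliant platform--which is the mix-and-match promise driving the standard's adoption.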
Over the next several years, experts expect VoiceXML to produce a flood of voice-recognition applications and more partnerships between voice-recognition vendors and makers of interactive-voice-response systems. It could even lead to the development of open-source applications.
Yet, despite the momentum that VoiceXML is building, voice-recognition technology has more than its share of skeptics in the business world.
Take iNetNow, for instance. One of the growing number of voice portals that provide remote Web browsing to mobile phone users and those without computer access, iNetNow has been something of a maverick. While its competitors have made the voice portal arena one of the hottest areas for voice-recognition technology, iNetNow has built itself around a call center staffed with professional Web surfers, and it has no intention of shifting to an automated offering.
Michael Evans, iNetNow's VP of marketing and business development, says the decision to remain a human-powered voice portal is simple: Company execs don't believe voice recognition will enable iNetNow to provide the most convenient service possible.
"If we could do that more efficiently through voice recognition, we certainly would," says Evans. "The key point is that there isn't enough problem-solving ability in a voice-recognition application."
Ironically, a new revenue source has emerged from iNetNow's human-powered service: failover service for companies that provide voice-recognition services. In other words, if a company's voice-recognition application fails, iNetNow will jump in and provide a human substitute. The company is doing this for a single, unidentified client, but Evans expects failover service to become a significant portion of iNetNow's business over the next 18 months.
In Your Interface
Some companies, meanwhile, have gotten exactly the results they were looking for when they chose to implement voice-recognition applications. United, for instance, has been elated with the impact of its SpeechWorks-powered flight arrival and departure application, which handles about 70,000 calls a day.
Bill McIntyre, United's manager of new technologies and distribution planning, says that with the previous touch-tone system, 30% of callers bailed out before getting the information they were looking for, opting to speak with a live representative instead. Since it deployed its voice-recognition app in 1999, says McIntyre, that figure is down to less than 10%. "It's gone further than our expectations," he says. "We never dreamed that customers would call us up and tell us how much they love it."
Stuart Patterson, CEO of SpeechWorks, says airlines have been among the earliest adopters of voice-recognition technology, along with online investment brokerages, telcos, and logistics companies. Bringing up the rear, Patterson says, have been the banking, insurance, entertainment, and health-care industries, as well as government. He expects telematics to be the next big growth area, as a growing number of states require voice-activated controls in vehicles to improve highway safety.
For the time being, Patterson says, SpeechWorks is still aggressively educating the market about the potential benefits of voice recognition. He says that since businesses still aren't uniformly sold on the efficiencies that voice recognition can bring, it's unrealistic to expect client-based applications to take off. "As with so many technologies, business will bring it to the consumer," says Patterson.
But Meta Group's Perkins says it really comes down to interfaces. Over the next few years, he says, the accuracy and scalability of voice-recognition technology will improve, as will the ability to recognize natural language, or the way people speak in normal conversation. However, all the improvements in the world won't matter, he says, if a truly easy-to-use interface isn't developed.
That means figuring out how to design a voice-driven interface with the Web for consumer applications, and, on the business side, making voice-recognition applications that can talk to, say, an SAP financial database. "The first one that gets the right combo wins the game," Perkins says.
Until that time, he says, any talk of seamless speech interaction between humans and technological devices is just that--talk. Says Perkins, "It's not going to be Star Trek for another four or five years, if ever."