For starters, IBM said SpeechConcept will integrate IBM WebSphere Voice Server (WVS) into its telephony products, allowing callers to react or respond to a query or command by speaking naturally, rather than using formatted sentences or menu commands.
In addition, DTMS Solutions, which develops and hosts speech applications for customer service call centers, and IBM will market IBM WebSphere Voice Server speech recognition products to enterprises. The application will tie customer transactions to related systems through a service-oriented architecture (SOA) platform. "Companies are looking for an open-standard interface," said Brian Garr, program director for enterprise speech at IBM. "The first customers using this method will come on line this quarter."
Enterprises will spend $2.6 billion by 2009 on speech-related software, hardware, professional services and related technology, up from $1.2 billion this year, according to Opus Research.
"For the past three years, the makers of core engines for speech recognition have experienced tremendous accuracy in the lab and whether that translates into accurate recognition in a noisy room remains to be seen," said Dan Miller, senior analyst at Opus Research.
It's the routine applications, such as speech-automated attendants, that will build popularity for the technology. For example, every call into Microsoft for the past year has gone through the Microsoft speech server, which lets the caller dial by name and connects with "tremendous accuracy," Miller said.
But not all Microsoft speech systems have met with success. Microsoft Corp.'s demonstration of the speech-recognition tool in Windows Vista malfunctioned in front of an auditorium full of financial analysts.
Microsoft had hoped to impress an influential crowd gathered at the annual Microsoft Financial Analysts Conference last month. Instead, a glitch in the system turned the demonstration into an embarrassment.
The trouble began on the first voice command from the presenter, a member of the team working on the new version of the company's Windows operating system, analysts said. Rather than typing "Dear mom," as Shanen Boettcher had instructed with a voice command, the computer wrote "Dear aunt."
A video of the uncooperative Vista speech-recognition demo was posted on the video-sharing site YouTube.
Larry Osterman, a software design engineer at Microsoft, took responsibility for the bug in a blog post.
Osterman attributed the problem to a feedback loop. "There's a timing issue that is causing a positive feedback loop that resulted from a signal being fed back into an amplifier," Osterman wrote in his blog.
Desktop computer nirvana comes from the "Star Trek" model where interaction with computers relies on voice commands, said Joe Wilcox, senior analyst at Jupiter Research. "We're a long way from there," he said. "Windows Vista will try to close the gap, but today there are lots of limitations."
The problems Microsoft had are "emblematic of a technology whose ways to implement aren't fully understood," Miller said. Speech recognition is further along in development than the botched Microsoft demo would suggest, he added.