At a speech-technology conference in New York on Wednesday, Microsoft introduced a second beta version of its .Net Speech Software development kit, a plug-in to its Visual Studio .Net development-tools suite that aims to get more programmers writing speech apps that use Microsoft tools and servers. The company also released a "technical preview" of its .Net Speech Platform, a set of application-building components. The software is due next year.
Microsoft's efforts could expand markets for wireless phone airtime, speech-recognition software, and PDAs--an area in which Microsoft is picking up market share. Microsoft said Oct. 22 it is buying Vicinity Corp., which makes software that delivers maps and directions to cell phones and PDAs, for $96 million in cash.
"Speech recognition has never really taken off the way people thought it would," says Brian Strachman, an analyst at market-research firm In-Stat. "We need a broad base of applications, and Microsoft has a direct line to all the developers out there."
Microsoft's Office suite includes a dictation-taking capability, and the company has published an API for building speech-capable Windows apps. But the company's latest effort focuses on selling call-center software. Airlines and banks often maintain separate sets of application logic for their Web sites and call centers, which is wasteful, says Kai-Fu Lee, a Microsoft VP who managed the company's Beijing research lab in the late '90s. "The Web is where the real-time, accurate data resides." Microsoft hopes the widespread use of its tools will encourage more companies to write apps that allow access to the same data by phone or over the Web.
"The call center has always been the first, best market for speech recognition," analyst Strachman says, since software can provide a quick payback in saved labor. Telephone customer-service, or "interactive voice response," apps have been characterized by dedicated hardware and application-specific protocols for retrieving data. VoiceXML, a markup language backed by AT&T, IBM, Lucent Technologies, and Motorola, has been a way to reuse logic across voice applications.
Microsoft's approach, called Speech Application Language Tags, adds support for "multimodal" apps that can deliver text and graphics to computer and cell-phone screens, in addition to voice feedback. Microsoft and its partners in the Salt Forum, which include Cisco Systems and Intel, submitted Salt 1 to the World Wide Web Consortium, an industry-standards body, in August. There aren't any multimodal applications on the market today, but they could debut next year as more developers adopt Salt. The new beta version of the .Net Speech development kit--a collection of pre-built components that automate common programming tasks--includes the ability to build a usable server-side application and complies with the version of Salt submitted to the W3C. The .Net Speech Platform includes a Salt-enabled version of the Internet Explorer Web browser, software that communicates with call-center answering and routing hardware, and a speech-recognition engine.
Microsoft plans to give away the development kit but hasn't decided how it will package its speech platform. Beyond license revenue, there's value in seeding the market with speech software, Lee says. "Speech will be part of the PC experience," Lee says. "But it won't come from the PC, because the keyboard and mouse work so well. We want to decrease the users' fear of the technology."