Speech Recognition's Next Iteration

"Plus delve arrest her!" That's how a call center equipped with speech-recognition technology might interpret a customer's request to "please deliver a red sweater." That's because such systems can understand only precise, clear syntax that bears little resemblance to the way most people actually speak.

IBM Research is trying to change that. Later this year, it plans to launch the Super Human Speech Recognition Project, which aims to solve common speech-recognition problems and deliver systems capable not just of linguistic comprehension but of contextual understanding.

Among the numerous approaches the company is taking is software that uses a language model to predict which words are most likely to follow other words. IBM Research is also using an acoustic model, in which software predicts all the ways a particular word might sound given different pronunciations, cadences, or background interference.
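The two models complement each other. One standard way to combine them, often called the noisy-channel formulation, is to choose the transcription that maximizes the acoustic score plus a weighted language-model score. The sketch below is not IBM's implementation: the four-sentence training corpus, the add-one-smoothed bigram model, and the acoustic scores in `ACOUSTIC` are all hypothetical stand-ins chosen to mirror the sweater example.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real language model is trained on vastly more text.
CORPUS = [
    "please deliver a red sweater",
    "please deliver a blue sweater",
    "please deliver the order",
    "deliver a red sweater please",
]


def train_bigram(sentences):
    """Count word pairs (bigrams) and how often each word appears as a left-hand context."""
    contexts, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]  # sentence start/end markers
        contexts.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    vocab_size = len({w for s in sentences for w in s.split()}) + 1  # +1 for </s>
    return contexts, bigrams, vocab_size


def lm_logprob(sentence, contexts, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability: how likely each word is to follow the previous one."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(words, words[1:])
    )


# Hypothetical acoustic log-likelihoods, log P(audio | words), for two candidate
# transcriptions of the same audio. They sound alike, so the scores are close,
# and the acoustic model on its own slightly prefers the wrong one.
ACOUSTIC = {
    "please deliver a red sweater": -12.0,
    "plus delve arrest her": -11.0,
}


def decode(acoustic_scores, lm, lm_weight=1.0):
    """Pick the candidate maximizing log P(audio | words) + lm_weight * log P(words)."""
    return max(
        acoustic_scores,
        key=lambda words: acoustic_scores[words] + lm_weight * lm_logprob(words, *lm),
    )


lm = train_bigram(CORPUS)
print(max(ACOUSTIC, key=ACOUSTIC.get))  # acoustic model alone: plus delve arrest her
print(decode(ACOUSTIC, lm))             # acoustic + language model: please deliver a red sweater
```

The language model rescues the correct reading because word pairs such as "deliver a" and "red sweater" occur in its training text while "delve arrest" never does. The `lm_weight` parameter (hypothetical here) controls how much the word-sequence prediction is trusted relative to the sound-based one.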

David Nahamoo, a department manager at IBM Research in Yorktown Heights, N.Y., says commercial software applications available today, including IBM's ViaVoice, are starting to incorporate these techniques. As a result, companies in a number of industries are beginning to use voice automation to deal with routine inquiries.

The real challenge, Nahamoo says, is building systems that can understand multifaceted conversations or respond to open-ended questions. To that end, IBM is working on an approach called domain-specific interpretation. A system designed for use in, say, a travel agency would be programmed to play down conversational elements unrelated to travel so that it can generate the best response.
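The article doesn't say how IBM implements domain-specific interpretation. As a rough illustration of the idea, the sketch below down-weights every word outside a travel vocabulary before choosing what the caller wants. The intent inventory in `INTENTS`, the word list in `TRAVEL_LEXICON`, and the `OFF_DOMAIN_WEIGHT` value are hypothetical, and simple keyword weighting is a deliberately simplified stand-in for a real interpretation model.

```python
import re

# Hypothetical intent inventory for a general-purpose interpreter.
INTENTS = {
    "book_flight": {"fly", "flight", "ticket"},
    "book_hotel": {"hotel", "room"},
    "buy_gift": {"present", "gift", "sweater"},
}

# Hypothetical travel-domain lexicon. Words outside it are down-weighted,
# not discarded, so off-topic talk still counts a little.
TRAVEL_LEXICON = {"fly", "flight", "ticket", "hotel", "room", "boston", "friday"}
OFF_DOMAIN_WEIGHT = 0.1  # hypothetical value


def word_weights(utterance, domain_lexicon=None):
    """Weight each word by domain relevance; with no domain, every word counts fully."""
    words = re.findall(r"[a-z]+", utterance.lower())
    if domain_lexicon is None:
        return [(w, 1.0) for w in words]
    return [(w, 1.0 if w in domain_lexicon else OFF_DOMAIN_WEIGHT) for w in words]


def interpret(utterance, domain_lexicon=None):
    """Return the intent whose keywords carry the most total weight in the utterance."""
    weights = word_weights(utterance, domain_lexicon)
    scores = {
        intent: sum(weight for word, weight in weights if word in keywords)
        for intent, keywords in INTENTS.items()
    }
    return max(scores, key=scores.get)


request = "I need to fly to Boston on Friday to give my uncle a present, maybe a sweater"
print(interpret(request))                  # general interpreter: buy_gift
print(interpret(request, TRAVEL_LEXICON))  # travel-domain interpreter: book_flight
```

Without a domain, "present" and "sweater" outvote "fly"; with the travel lexicon, those off-topic words shrink to 0.1 each and the flight request wins.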

But don't expect a talking machine to help you do a better job choosing a Christmas present for Uncle Bob anytime soon. Says Nahamoo, "That's getting into the realm of artificial intelligence, and I don't have a crystal ball for that."