Speech Recognition Finds Its Voice - InformationWeek


Data Management // AI/Machine Learning
Commentary
1/19/2018 07:00 AM
Ali Iranpour, Director of Mobile Strategy, Arm

Speech Recognition Finds Its Voice

Voice recognition technology seems to be taking over the world, but the concept still has a ways to go before it reaches full maturity.

The world changed profoundly when our interaction with the digital universe shifted from keystrokes to voice commands: “Hey, Siri,” “Hello Alexa,” “OK, Google” all quickly unlock doors into information and services in ways we couldn’t have imagined just five years ago.

Consumers have embraced speech-recognition technology:

  • More than 60% of respondents use speech recognition technology when their hands are occupied, according to the 2016 KPCB Internet Trends Report.
  • By 2018, 30% of all interactions with devices will use speech recognition, according to research company Gartner.
  • Amazon’s Alexa-powered Echo products are among its most-popular sellers.

The transformation driver is simple: People can speak up to four times faster than they can type. However, while the technology works because it’s more natural, speech recognition systems are still in their infancy. Significant challenges lie ahead if we’re going to continue to make this user interface the center of our daily digital lives.

Can you hear me now?

Consider the complexity of language. Native English-speaking adults understand an average of 22,000 to 32,000 vocabulary words and learn about one word a day, according to an American-Brazilian research project. Non-native English-speaking adults know an average range of 11,000 to 22,000 English words and learn about 2.5 words a day.

Approximately 170,000 words are used regularly by native speakers and the entire English language contains more than 1 million words, with 8,500 new words added each year. Yet most contemporary embedded speech-recognition systems use a vocabulary of fewer than 10,000 words. Accents and dialects increase the vocabulary size needed for a recognition system to be able to correctly capture and process a range of speaker variability within a single language. The gulf between technology capability and requirements is therefore vast.

Despite the gap, technology innovation hasn’t slept since the first voice-recognition technologies — IBM’s Shoebox machine and Bell Labs’ Audrey device — were introduced more than a half-century ago.

With the continually improving computing power and compact size of mobile processors, large vocabulary engines that promote the use of natural speech are being built into OEM devices. Not to mention, the adoption rate is picking up steam as the footprint for such an engine has been shrunk and optimized.

However, more needs to be done.

Effective speaker recognition requires segmenting the audio stream, detecting and tracking speakers, and identifying them; the recognition engine fuses these results to make more reliable decisions. For the engine to reach its full potential, allowing users to speak naturally and be understood even in a noisy environment such as a train station or airport, pre-processing techniques must be integrated to improve the quality of the audio input to the recognition system.
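To make the pre-processing step concrete, here is a minimal sketch of one classic noise-suppression technique, spectral subtraction: a noise spectrum is estimated from the opening frames (assumed to contain only background noise) and subtracted from every frame before the audio reaches the recognizer. This is an illustrative example, not any particular vendor's implementation, and the frame sizes and noise-frame count are arbitrary assumptions.

```python
import numpy as np

def spectral_subtraction(signal, frame_len=512, hop=256, noise_frames=5):
    """Reduce stationary background noise by subtracting an estimated
    noise magnitude spectrum from each frame of the signal."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)

    # Assume the first few frames contain only background noise.
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Subtract the noise estimate; floor at zero so magnitudes stay valid.
    clean_mag = np.maximum(mag - noise_mag, 0.0)

    # Rebuild the time-domain signal with windowed overlap-add.
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase),
                                n=frame_len, axis=1)
    out = np.zeros(len(signal))
    for i, frame in enumerate(clean_frames):
        out[i * hop : i * hop + frame_len] += frame * window
    return out
```

Real systems use more sophisticated front ends (beamforming, neural denoisers), but the principle is the same: clean the input before recognition, because every decibel of noise the recognizer never sees is accuracy it does not have to win back.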

A new approach to recognition systems

The other key to improving voice-recognition technology is distributed computing. Today, voice input captured on edge and mobile devices is typically processed in the cloud, with the results whisked back to the user. However, cloud processing has limitations in real-time environments that demand privacy, security, and reliable connectivity. The world is moving quickly to a new model of collaborative embedded-cloud operation, called an embedded glue layer, that promotes uninterrupted connectivity and addresses emerging cloud challenges for the enterprise.

With an embedded glue layer, a user's voice or visual data can be captured and processed locally, without dependence on the cloud. In its simplest form, the glue layer acts as an embedded service that collaborates with the cloud-based service to provide native on-device processing. It allows mission-critical voice tasks, where user or enterprise security, privacy, and protection are required, to be processed natively on the device while ensuring continuous availability.

Non-mission critical tasks, such as natural language processing, can be processed in the cloud using low-bandwidth, textual data as the mode of bilateral transmission. The embedded recognition glue layer provides nearly the same level of scope as a cloud-based service, albeit as a native process. And it tightens voice security in ways similar to how fingerprint-recognition technology is stored on local devices rather than in the cloud.
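The routing logic described above can be sketched as a simple dispatcher. Everything here is a hypothetical illustration of the glue-layer idea, not Arm's design: the intent names, function names, and fallback policy are assumptions. Sensitive intents stay on-device; the rest go to the cloud as low-bandwidth text, falling back to local processing when the uplink fails.

```python
# Intents that must never leave the device (illustrative examples).
MISSION_CRITICAL = {"unlock_door", "disarm_alarm", "authenticate_user"}

def process_locally(intent, transcript):
    # On-device model: audio and transcript stay off the network.
    return f"local:{intent}"

def dispatch(intent, transcript, cloud_service=None):
    """Route a recognized utterance: mission-critical intents run
    on-device; others go to the cloud as text, with a local fallback
    if connectivity drops."""
    if intent in MISSION_CRITICAL or cloud_service is None:
        return process_locally(intent, transcript)
    try:
        # Transmit text, not audio: lower bandwidth, less sensitive data.
        return cloud_service(transcript)
    except ConnectionError:
        return process_locally(intent, transcript)
```

The design choice worth noting is that the fallback path makes availability a property of the device, not the network, which is exactly the continuity the glue layer is meant to provide.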

This approach to voice-recognition technology will not only revolutionize applications and devices, it will continue to fundamentally alter how we interact with the digital world in a safer, more secure, more productive manner. 

Ali Iranpour is the Director of Mobile Strategy, Client Line of Business at Arm. He's been with Arm since 2014 and is primarily responsible for developing the strategy around the mobile and wearables markets. Ali graduated from Lund University with a PhD, and prior to joining Arm worked for Sony Mobile and Ericsson.
