Voice: It's The New UI

Windows 7 does speech recognition one better.
Voice has yet to gain a foothold with developers, but Windows 7 is about to change that. Exchange Server 2010 includes Voice Mail Preview, so the push is on to include voice and speech awareness and interaction in applications.

"Voice is the new touch," says Zig Serafin, GM of Microsoft's speech group. "It's the natural evolution from keyboards and touch screens."

Speech-aware apps recognize human speech and react to commands. They talk back to users instead of displaying text, letting people interact with their computers in the same way they interact with other people.

These apps have two components:

  • Speech recognition to convert spoken words and sentences to text. Windows 7 lets users train the speech-recognition system to transform it into a voice-recognition system. This way, the speech-recognition engine improves its accuracy based on the user's unique vocal sounds.
  • Speech synthesis to artificially produce human speech and talk to users. Windows 7's Text-To-Speech (TTS) engine converts text in a specific language into speech.

The Speech Recognition Control Panel application, found in the Ease of Access category, lets you configure the microphone and train your computer to understand you. It also has a TTS tab in the Speech Properties dialog box where you can pick and configure the default voice for the TTS engine.

Talking To People

To get an app to talk to users, you use the speech synthesis services, which are available through wrappers in both .NET Framework 3.5 and .NET Framework 4 (Release Candidate). First, add the System.Speech.dll assembly as a reference to an existing C# project, then include the System.Speech.Synthesis namespace to access the classes, types, and enumerations offered by the speech synthesis wrapper: using System.Speech.Synthesis;. You can then create a new instance of the SpeechSynthesizer class and call its Speak method with the text to speak. This way, the TTS engine uses the default voice, its parameter values, and audio output to turn the received text into human speech:

var synthesizer = new SpeechSynthesizer();
synthesizer.Speak("Hello! How are you?");

The statement after the call to the Speak method isn't executed until the TTS engine finishes saying "Hello! How are you?" To create a more responsive speech-aware application, call the SpeakAsync method, which produces the same effect as Speak but continues to the next statement after scheduling an asynchronous operation to transform the received text to speech:

synthesizer.SpeakAsync("Good morning!");
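
To see the difference between the blocking and non-blocking calls, the following minimal console sketch may help. It assumes a project that references System.Speech.dll and a machine with at least one installed TTS voice; the class name SpeakDemo and the console messages are illustrative and not part of the original example.

```csharp
using System;
using System.Speech.Synthesis;

class SpeakDemo
{
    static void Main()
    {
        // Assumes a reference to System.Speech.dll and an installed TTS voice.
        using (var synthesizer = new SpeechSynthesizer())
        {
            // Send the generated speech to the default audio device.
            synthesizer.SetOutputToDefaultAudioDevice();

            // Raised when an asynchronous prompt has finished playing.
            synthesizer.SpeakCompleted += (sender, e) =>
                Console.WriteLine("Finished speaking.");

            // Queues the text and returns without waiting for playback.
            synthesizer.SpeakAsync("Good morning!");
            Console.WriteLine("SpeakAsync returned; the app stays responsive.");

            // Keep the process alive until the user presses Enter.
            Console.ReadLine();
        }
    }
}
```

Subscribing to SpeakCompleted is one way to learn when the asynchronous prompt is done, for example to re-enable a button in the UI.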

Configuring Voices

The SpeechSynthesizer class provides three methods for working with voices: GetInstalledVoices retrieves the installed voices; SelectVoice specifies the voice to use by its name; and SelectVoiceByHints specifies the voice to use by hints, such as gender and age.

The GetInstalledVoices method returns a read-only collection of InstalledVoice instances. You can access the VoiceInfo.Name property for each element and use it as a parameter to the SelectVoice method. It's possible to fill a list box or a combo box with these values and let the user select the desired voice in your speech-aware app. These two lines retrieve the installed voices according to the UI culture, then select the first voice:

var installedVoices = synthesizer.GetInstalledVoices(System.Globalization.CultureInfo.CurrentUICulture);
synthesizer.SelectVoice(installedVoices[0].VoiceInfo.Name);

In a default Windows 7 English (United States) installation, the value for the VoiceInfo.Name property is "Microsoft Anna."
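
As a rough sketch of how these methods fit together, the console program below lists the enabled voices for the current UI culture and then selects a voice by hints. The class name VoiceListDemo and the choice of an adult female voice are assumptions made for illustration; the voices actually listed depend on what is installed on the machine.

```csharp
using System;
using System.Globalization;
using System.Speech.Synthesis;

class VoiceListDemo
{
    static void Main()
    {
        using (var synthesizer = new SpeechSynthesizer())
        {
            // Print every enabled voice installed for the current UI culture.
            foreach (InstalledVoice voice in
                     synthesizer.GetInstalledVoices(CultureInfo.CurrentUICulture))
            {
                if (!voice.Enabled) continue;

                VoiceInfo info = voice.VoiceInfo;
                Console.WriteLine("{0} ({1}, {2}, {3})",
                    info.Name, info.Culture, info.Gender, info.Age);
            }

            // Illustrative hints: ask for an adult female voice and let the
            // engine choose the closest match among the installed voices.
            synthesizer.SelectVoiceByHints(VoiceGender.Female, VoiceAge.Adult);
            Console.WriteLine("Selected: " + synthesizer.Voice.Name);
        }
    }
}
```

In a real application, the names printed here are the values you would load into the list box or combo box and later pass to SelectVoice.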

Processing Voice User Commands

To process user voice commands, use speech-recognition services that are part of the System.Speech.dll assembly. If you include the System.Speech.Recognition namespace, you'll be able to access the classes, types, and enumerations offered by the speech-recognition wrapper: using System.Speech.Recognition;.

Speech-recognition engines are complex and require dozens of parameters. This example simplifies recognition by using a limited list of user commands. You have to create a new instance of the SpeechRecognitionEngine class that is accessible to the many methods and event handlers that will interact with it: private SpeechRecognitionEngine _recognitionEngine = new SpeechRecognitionEngine();. You can then define a list of alternative items to make up an element in a grammar. This list is a Choices instance, which you use to create a new GrammarBuilder and then a Grammar that's loaded into the engine.

The following code defines five possible voice commands:

// The phrases the engine should listen for
string[] voiceCommands = new string[]
{
 "Favorite news",
 "Favorite movies",
 "Weather forecast",
 "New blog entry",
 "New word document"
};
// Wrap the phrases as alternatives, build a grammar, and load it
var comChoices = new Choices(voiceCommands);
var comGrammarBuilder = new GrammarBuilder(comChoices);
var comGrammar = new Grammar(comGrammarBuilder);
_recognitionEngine.LoadGrammar(comGrammar);

You can also use XML elements defined by the Speech Recognition Grammar Specification to create grammars that can accept more complex commands.
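
As an illustration only, the sketch below loads a small SRGS XML grammar from a string. The grammar content (an "open" or "show" verb followed by one of three targets), the class name SrgsGrammarDemo, and the en-US language tag are all assumptions; the xml:lang value needs to match the culture of the recognizer you load the grammar into.

```csharp
using System.IO;
using System.Speech.Recognition;
using System.Speech.Recognition.SrgsGrammar;
using System.Xml;

static class SrgsGrammarDemo
{
    // Hypothetical grammar: "open" or "show", followed by one of three targets.
    const string GrammarXml =
@"<grammar version=""1.0"" xml:lang=""en-US"" root=""command""
         xmlns=""http://www.w3.org/2001/06/grammar"">
  <rule id=""command"" scope=""public"">
    <one-of><item>open</item><item>show</item></one-of>
    <one-of><item>news</item><item>movies</item><item>weather</item></one-of>
  </rule>
</grammar>";

    // Parses the XML into an SrgsDocument and wraps it in a Grammar
    // that can be passed to SpeechRecognitionEngine.LoadGrammar.
    public static Grammar Build()
    {
        using (var reader = XmlReader.Create(new StringReader(GrammarXml)))
        {
            return new Grammar(new SrgsDocument(reader));
        }
    }
}
```

With this grammar, a phrase such as "show weather" would match, whereas a flat Choices list would need every verb and target combination spelled out.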

At this point, you're ready to add event handlers to the events that the recognition engine is going to fire:

  • SpeechDetected: The user began talking.
  • SpeechRecognized: The recognition engine recognized one of the voice commands. Check the results by adding code in an event handler attached to this event.
  • SpeechRecognitionRejected: The user began talking, but the recognition engine didn't understand the voice command.
  • RecognizeCompleted: The recognition engine finished its asynchronous execution. The code written in an event handler attached to this event will be executed after calling the event handler for either SpeechRecognized or SpeechRecognitionRejected.
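
A minimal sketch of wiring up these events might look like the following. It assumes a working default microphone and a recognizer whose culture matches the thread's current culture; the class name CommandListener, the shortened command list, and the console messages are illustrative.

```csharp
using System;
using System.Speech.Recognition;

class CommandListener
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            // Load a small command grammar and listen on the default microphone.
            engine.LoadGrammar(new Grammar(new GrammarBuilder(
                new Choices("Favorite news", "Weather forecast"))));
            engine.SetInputToDefaultAudioDevice();

            // The user began talking.
            engine.SpeechDetected += (s, e) =>
                Console.WriteLine("Listening...");

            // One of the commands in the grammar was recognized.
            engine.SpeechRecognized += (s, e) =>
                Console.WriteLine("Command: {0} (confidence {1:F2})",
                    e.Result.Text, e.Result.Confidence);

            // Speech was heard but did not match the grammar well enough.
            engine.SpeechRecognitionRejected += (s, e) =>
                Console.WriteLine("Sorry, I didn't catch that.");

            // The asynchronous recognition operation has ended.
            engine.RecognizeCompleted += (s, e) =>
                Console.WriteLine("Recognition finished.");

            // Start a single asynchronous recognition and return immediately.
            engine.RecognizeAsync(RecognizeMode.Single);

            // Keep the process alive until the user presses Enter.
            Console.ReadLine();
        }
    }
}
```

Passing RecognizeMode.Multiple instead keeps the engine listening for further commands until you call RecognizeAsyncStop or RecognizeAsyncCancel.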

Speech may be a natural evolution from keyboards and touch screens, but the APIs required to work with speech-related services are complex. Windows 7 gives developers the ability to create speech-aware apps through performance and accuracy improvements to the speech-recognition engine. Once you begin working with speech-aware apps, you'll find great opportunities to take advantage of this natural user interface.

Gaston Hillar is an IT consultant and author of more than 40 books on topics ranging from systems programming to IT project management.
