Commentary

Ed Hansberry
 

Speech to Text Coming To iPhone?

According to a patent filing, Apple is working on speech-to-text technology for its iPhone and iPod product lines. Speech recognition could be the holy grail for data entry and retrieval on mobile devices, especially as they continue to shrink in size.

The Baltimore Sun found the patent and has included a diagram of how the system would work when composing an email.

There is a lot of engineering speak in the filing, but I could decipher a few tidbits of info - that and I've seen this stuff on Star Trek so I know how it is supposed to work. It seems the speech recognition module they are working on would be able to handle not only spoken text but also non-speech data, such as punctuation.



To varying degrees, this has been tried before on mobile devices. The most rudimentary are the voice snippets you can record into your phone for a few of your favorite contacts. One of the better speech tools for phones is Microsoft's Voice Command. It is really pattern recognition. You can say "Call Sally Jones at work" and it will search through your contacts, find a name that matches what your digitized voice said, and dial the number. You don't have to train it or record her name before you can use it. You can also ask it the time, battery level, signal strength, upcoming appointments and more. It is rather limited though, and there is no way to compose an email with it or tell it to do anything outside of the dozen or so tasks it was written for.
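To make the "pattern recognition" idea concrete, here is a minimal sketch of matching a fixed command phrase against a contact list. It is not how Voice Command is actually implemented (that isn't public, and the real thing matches against audio, not text). It assumes the spoken phrase has already been turned into words, and the contact data, the CALL_PATTERN grammar and the resolve_call function are all made up for illustration.

```python
import re
from difflib import get_close_matches

# Hypothetical contact list; names and numbers are invented for illustration.
CONTACTS = {
    "Sally Jones": {"work": "555-0100", "mobile": "555-0101"},
    "Sam Johnson": {"work": "555-0110"},
}

# One fixed grammar: "call <name>" with an optional "at <label>" suffix.
CALL_PATTERN = re.compile(
    r"^call (?P<name>.+?)(?: at (?P<label>\w+))?$", re.IGNORECASE
)


def resolve_call(utterance, contacts=CONTACTS):
    """Return (contact name, number) for a 'call ...' phrase, or None."""
    match = CALL_PATTERN.match(utterance.strip())
    if match is None:
        return None  # the phrase is outside the one grammar we understand

    spoken_name = match.group("name").lower()
    label = (match.group("label") or "mobile").lower()

    # Fuzzy-match the spoken name against the contact names, so a slightly
    # misheard name can still resolve without any per-contact training.
    by_lowercase = {name.lower(): name for name in contacts}
    hits = get_close_matches(spoken_name, list(by_lowercase), n=1, cutoff=0.6)
    if not hits:
        return None

    name = by_lowercase[hits[0]]
    numbers = contacts[name]
    # Fall back to the first stored number if the requested label is missing.
    number = numbers.get(label) or next(iter(numbers.values()))
    return name, number


print(resolve_call("Call Sally Jones at work"))
print(resolve_call("Compose an email to Sally"))
```

The second call returns None, which is the limitation described above: anything outside the handful of phrases the tool was written for simply isn't understood.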

I recall one demo by Bill Gates a few years ago where he spoke into a Pocket PC (that is what they were called way back when) and got nearly flawless text recognition out of it. The trick was that the voice data was digitized and sent wirelessly to a powerful server, which did the heavy lifting and returned the text to the screen. On the GPRS networks of the day, that just wasn't feasible, which is why he was using WiFi. Today, with 3G networks, it is more realistic, but who is going to pay for servers that might have to handle hundreds of thousands of voices simultaneously?

It seems to me from perusing the patent that the speech recognition module is a separate chip or other purpose-built hardware in the device, much like a video card offloads graphics from your computer's main processor. If Apple can pull this off, they will have a huge win on their hands.

I just hope they put an altimeter on it that cuts the module off at 10,000 feet so I don't have to listen to the guy next to me on a cross country flight dictate a research paper into his phone.

