The filing suggests that the privacy issues raised by Google Maps Street View will only get more complicated, that YouTube searchers may one day be able to conduct keyword searches for text captured on video, and that Google searches may one day return a list of products on local store shelves.
"Recognizing Text In Images" is an application to patent a method of optical character recognition in digital images.
"Digital images can include a wide variety of content," the patent application explains. "For example, digital images can illustrate landscapes, people, urban scenes, and other objects. Digital images often include text. Digital images can be captured, for example, using cameras or digital video recorders. Image text (i.e., text in an image) typically includes text of varying size, orientation, and typeface. Text in a digital image derived, for example, from an urban scene (e.g., a city street scene) often provides information about the displayed scene or location. A typical street scene includes, for example, text as part of street signs, building names, address numbers, and window signs."
The image database behind Google Maps Street View happens to contain many street scenes with this sort of text. Being able to query for store names captured in Street View photos might be a useful way to conduct local searches.
A spokesperson for Google said in an e-mail, "...[W]e file patent applications on a variety of ideas that our employees come up with. Some of those ideas later mature into real products or services; some don't. Prospective product announcements should not necessarily be inferred from our patent applications."
It is, however, worth noting the backgrounds of Luc Vincent and Adrian Ulges, the two computer scientists behind the patent application. Vincent describes himself on his Web site as "[l]eader of several large geo-related projects, including Street View", as being "[r]esponsible for various engineering aspects of Google Book Search," and as the "[h]ead of Google OCR-related initiatives." Ulges on his Web site notes his involvement in helping to develop "a system that autonomously learns to tag videos with high-level semantic concepts by watching videos from online portals like http://youtube.com."
The patent application envisions several possible advantages arising from the technology. "Candidate text regions within images can be enhanced to improve text recognition accuracy," the patent application states. "Extracted image text can also be used to improve image searching. The extracted text can be stored as associated with the particular image for use in generating search results in an image search. Additionally, the extracted image text can be combined with location data and indexed to improve and enhance location-based searching. The extracted text can provide keywords for identifying particular locations and presenting images of the identified locations to a user."
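The indexing scheme the application describes, extracted text stored against an image and combined with location data to answer keyword queries, can be sketched roughly as follows. This is an illustrative sketch only; every class and function name here is hypothetical, not taken from the filing:

```python
from dataclasses import dataclass

@dataclass
class ImageRecord:
    """One captured image plus the text OCR'd out of it (hypothetical schema)."""
    image_id: str
    latitude: float
    longitude: float
    extracted_text: list  # keywords recognized in the image

class LocationTextIndex:
    """Toy inverted index: keyword -> images whose extracted text contains it."""
    def __init__(self):
        self._by_keyword = {}

    def add(self, record):
        # Store the record under each recognized word, case-insensitively.
        for word in record.extracted_text:
            self._by_keyword.setdefault(word.lower(), []).append(record)

    def search(self, keyword):
        # Return every image (with its location) matching the keyword.
        return self._by_keyword.get(keyword.lower(), [])
```

A query such as `index.search("mcdonald's")` would then return image records carrying both the matched photo and its coordinates, which is the kind of location-based result the application describes.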
Google of course contemplates using the technology to add value to its search advertising business. "For example, a user enters a search for a McDonald's in a particular city or near a particular address," the patent application suggests. "The mapping application generates a map to the McDonald's as well as presents an image of the McDonald's. The McDonald's image is retrieved using the indexed text from the image identifying the McDonald's and location information associated with the image, which identifies the location of the particular McDonald's in the image."
But Google also imagines novel uses. Just as Google created Street View by having camera-equipped vehicles drive through urban areas to capture a series of images of the trip, the search giant imagines cruising the aisles of supermarkets with camera-equipped robots to create what might be called Google Product View. "In one implementation, a store (e.g., a grocery store or hardware store) is indexed," the patent application explains. "Images of items within the store are captured, for example, using a small motorized vehicle or robot. The aisles of the store are traversed and images of products are captured in a similar manner as discussed above. Additionally, as discussed above, location information is associated with each image. Text is extracted from the product images. In particular, extracted text can be filtered using a product name database in order to focus character recognition results on product names."
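The filtering step mentioned at the end of that passage, narrowing raw character-recognition output to entries found in a product name database, could look something like this minimal sketch (the function and its arguments are invented for illustration):

```python
def filter_product_names(ocr_tokens, product_names):
    """Keep only OCR tokens that appear in a known product-name database.

    ocr_tokens: raw strings recognized in a shelf image (may include noise
    such as price tags or promotional text).
    product_names: the product name database, an iterable of known names.
    """
    known = {name.lower() for name in product_names}
    return [tok for tok in ocr_tokens if tok.lower() in known]
```

Discarding tokens like "SALE" or "2 FOR $5" while keeping matches against the database is one simple way to "focus character recognition results on product names," as the application puts it.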
Such possibilities of course remain highly speculative, and may, like the non-existent but presumably longed-for Google Perp Locator or Google Babe/Stud Finder, never come to pass.