Hot on the heels of Microsoft's Project Oxford, Google is bringing its Cloud Vision API into beta. The offering allows software developers to create new ways of reading faces and emotions to help push the limits of what can be done with AI and machine learning.

Larry Loeb, Blogger, Informationweek

February 23, 2016

4 Min Read
(Image: Google)


Earlier this month, Google moved its Cloud Vision API out of limited release into open beta. The tool will enable developers to create apps that can parse the emotional content contained in a photo or image. The API also offers a window into how Google views the future of artificial intelligence and machine learning.

This effort comes at a time when other companies, notably Microsoft, are doing the same.

There's also a business model here. When announcing the API, Google detailed its pricing scheme for the offering, which kicks in on March 1. During the beta period, each user will have a quota of 20 million images per month. Label detection will cost $2 per 1,000 images. Optical character recognition (OCR) comes in at $0.60 per 1,000 images.
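Estimating a monthly bill under this scheme is simple arithmetic. A minimal sketch in Python, using only the beta rates and quota quoted above (the helper name and feature keys are illustrative, and current Google pricing may differ):

```python
# Beta-period rates quoted above, in dollars per 1,000 images.
RATES_PER_1000 = {
    "LABEL_DETECTION": 2.00,  # label detection: $2 per 1,000 images
    "TEXT_DETECTION": 0.60,   # OCR: $0.60 per 1,000 images
}
MONTHLY_QUOTA = 20_000_000  # images per user per month during the beta


def estimate_monthly_cost(feature: str, images: int) -> float:
    """Return the estimated monthly charge in dollars for one feature."""
    if images > MONTHLY_QUOTA:
        raise ValueError("exceeds the beta quota of 20M images/month")
    return images / 1000 * RATES_PER_1000[feature]


print(estimate_monthly_cost("LABEL_DETECTION", 50_000))  # 100.0
```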

However, Google notes: "Cloud Vision API is not intended for real-time mission critical applications." Instead, Google is offering the API to developers to push the limits of AI and machine learning. The company can watch all these developments and learn as it goes.

The Vision API is designed to analyze images stored in Google's Cloud Storage and return their characteristics. But the API is not limited to the GCS platform. A Google representative told InformationWeek in an email that, "Users can integrate the REST API to upload their images in other environments than Google Cloud Storage."
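Both ingestion paths show up in the shape of the REST request itself: an image can be referenced by a Cloud Storage URI or sent inline as base64. A minimal sketch of building the two `images:annotate` request bodies in Python (the v1 endpoint and JSON field names reflect the public API; the helper function names are illustrative):

```python
import base64

ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def request_for_gcs(gcs_uri: str) -> dict:
    """Request body for an image already stored in Google Cloud Storage."""
    return {"requests": [{
        "image": {"source": {"gcsImageUri": gcs_uri}},
        "features": [{"type": "LABEL_DETECTION", "maxResults": 10}],
    }]}


def request_for_local(image_bytes: bytes) -> dict:
    """Request body for an image from any other environment, inlined as base64."""
    return {"requests": [{
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": "LABEL_DETECTION", "maxResults": 10}],
    }]}

# Either body is then POSTed to ANNOTATE_URL with an API key, e.g.:
#   requests.post(f"{ANNOTATE_URL}?key={API_KEY}", json=body)
```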

Google said that Cloud Vision API is powered by the same technologies behind Google Photos, and that it will, for example, identify broad sets of objects in images -- from flowers to popular landmarks.

Also, Google is leveraging the same Safe Search that it uses to filter "inappropriate content" from its Web-based image search results to filter images submitted via the API.

One of the limited release users, Photofy, noted that the API can flag potentially violent and adult content on user-created photos in line with its abuse policies. Photofy CTO Chris Keenan also noted in the same statement that protecting these branded photos from abuse was almost impossible before the Cloud Vision API.

Cloud Vision API can also analyze emotional attributes of people in images, finding joy, sorrow, and anger, and it can detect popular product logos, according to Google. However, it must be remembered that the API works only on static images, not on video. (The same Google representative also confirmed this to InformationWeek.)

This image-only restriction makes the Google effort similar to what Microsoft has announced in Project Oxford.

Oxford also has an API interface of specific tools for deriving emotional states from static images.

"The emotion tool released [in November 2015] can be used to create systems that recognize eight core emotional states -- anger, contempt, fear, disgust, happiness, neutral, sadness, or surprise -- based on universal facial expressions that reflect those feelings," according to Microsoft. It returns those eight states as text labels above parts of the images.

However, the Google API is more nuanced than Microsoft's. It returns a sentiment that is not limited to eight states and has a likelihood correlation appended to it. For example, it might return "joyLikelihood: VERY_LIKELY," rather than simply "Happy."
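Because the API reports a likelihood bucket per emotion rather than a single label, client code typically ranks those buckets to decide which emotion dominates. A sketch of doing that for one `faceAnnotations` entry (the likelihood enum values and field names are the API's; the ranking helper is illustrative):

```python
# Likelihood buckets the Vision API returns, ordered weakest to strongest.
LIKELIHOOD_ORDER = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
    "POSSIBLE", "LIKELY", "VERY_LIKELY",
]

# Emotion fields present on each face annotation.
EMOTION_FIELDS = {
    "joy": "joyLikelihood",
    "sorrow": "sorrowLikelihood",
    "anger": "angerLikelihood",
    "surprise": "surpriseLikelihood",
}


def dominant_emotion(face: dict) -> tuple:
    """Return (emotion, likelihood) for the strongest likelihood bucket."""
    best = max(
        EMOTION_FIELDS.items(),
        key=lambda kv: LIKELIHOOD_ORDER.index(face.get(kv[1], "UNKNOWN")),
    )
    return best[0], face.get(best[1], "UNKNOWN")


face = {"joyLikelihood": "VERY_LIKELY", "sorrowLikelihood": "VERY_UNLIKELY",
        "angerLikelihood": "UNLIKELY", "surpriseLikelihood": "POSSIBLE"}
print(dominant_emotion(face))  # ('joy', 'VERY_LIKELY')
```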

Microsoft also admits that emotions such as happiness can be detected in Oxford with a higher level of confidence than other emotions such as contempt or disgust. This may be because static images reveal only part of an emotional state.

[Read about Apple's acquisition of emotion recognition specialist Emotient.]

While Microsoft seems to have integrated the API into its Cortana offering, Google touts its offering as a standalone.

Google has also allowed images to be located anywhere in a cloud, while Microsoft wants the images it analyzes to be in its Azure cloud.

Both efforts are similar in what they aim to do, but vary in the specifics. Google seems to have developed a more general tool, compared to Microsoft's, which integrates with Microsoft offerings, but not much else.


About the Author(s)

Larry Loeb

Blogger, Informationweek

Larry Loeb has written for many of the last century's major "dead tree" computer magazines, having been, among other things, a consulting editor for BYTE magazine and senior editor for the launch of WebWeek. He has written a book on the Secure Electronic Transaction Internet protocol. His latest book has the commercially obligatory title of Hack Proofing XML. He's been online since uucp "bang" addressing (where the world existed relative to !decvax), serving as editor of the Macintosh Exchange on BIX and the VARBusiness Exchange. His first Mac had 128 KB of memory, which was a big step up from his first 1130, which had 4 KB, as did his first 1401. You can e-mail him at [email protected].
