The InformationWeek -- Blogs


Google Revisits Crowdsourcing With reCAPTCHA Acquisition


Posted by Thomas Claburn, Sep 16, 2009 04:38 PM

In its second acquisition this year, Google has bought reCAPTCHA, a company that provides CAPTCHA images as a barrier to online fraud.


The term CAPTCHA is an acronym for the phrase "Completely Automated Public Turing test to tell Computers and Humans Apart."

Luis von Ahn, co-founder of reCAPTCHA, was among the computer scientists from Carnegie Mellon and IBM who coined the term back in 2000.

What makes reCAPTCHA interesting for Google is that it serves two purposes at once. Not only are CAPTCHAs necessary for online security -- to prevent spammers from using scripts to automatically register thousands of Gmail accounts, for example -- but reCAPTCHA's technology is designed to draw the words it presents to users from scanned books.

Because of this, Google will be able to improve the accuracy of the optical character recognition (OCR) applied to book scans through what amounts to "crowdsourced" copy editing. In so doing, Google is again finding value in aggregated intelligence: The highly relevant search results that made Google's name owe a lot to the PageRank algorithm developed by company co-founders Sergey Brin and Larry Page. PageRank weighs links between Web pages as if they were votes for relevance, thereby leveraging the judgment of the crowd to determine which Web sites matter.
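The "links as votes" idea can be made concrete with a small sketch. The code below is a textbook-style power iteration over a toy link graph, not Google's production algorithm; the function name, the damping factor of 0.85, the iteration count, and the handling of pages with no outbound links are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank: `links` maps each page to the list of pages it links to.

    Illustrative only; the parameter values are assumptions, not
    anything stated in the article.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its vote over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical three-page web: "a" and "b" both link to "c"; "c" links back to "a".
    toy_web = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy graph, page "c" ends up with the highest score because two pages link to it, and page "b" ends up with the lowest because nothing links to it.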

"reCAPTCHA's unique technology improves the process that converts scanned images into plain text, known as Optical Character Recognition," explain von Ahn, co-founder of reCAPTCHA, and Will Cathcart, a Google product manager, in a blog post. "This technology also powers large scale text scanning projects like Google Books and Google News Archive Search. Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users. So we'll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process."

Google has no shortage of errors to correct. One of the company's Book Search engineers recently acknowledged that there are millions of errors in the metadata used to describe the books scanned for Google Book Search. No doubt the company's OCR output isn't perfect either.

But such problems look a lot less daunting when one can leverage CAPTCHA input to correct errors.
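As a rough illustration of how aggregated CAPTCHA answers might correct an OCR error, the sketch below accepts a human transcription only when enough independent answers agree. The article does not describe reCAPTCHA's actual agreement rule, so the function name, the thresholds, and the lowercase normalization are all hypothetical.

```python
from collections import Counter


def resolve_word(ocr_guess, human_answers, min_votes=3, min_agreement=0.75):
    """Pick a transcription for one scanned word.

    Hypothetical rule (an assumption, not reCAPTCHA's documented method):
    keep the OCR guess unless at least `min_votes` people answered and at
    least `min_agreement` of them typed the same thing.
    """
    # Normalize answers so "Morning " and "morning" count as the same vote.
    # Lowercasing discards capitalization, which a real system would need to keep.
    normalized = [answer.strip().lower() for answer in human_answers if answer.strip()]

    if len(normalized) < min_votes:
        return ocr_guess

    winner, count = Counter(normalized).most_common(1)[0]
    if count / len(normalized) >= min_agreement:
        return winner
    return ocr_guess


if __name__ == "__main__":
    # The OCR engine misread "morning"; three of four people agree on the fix.
    print(resolve_word("rnorning", ["morning", "morning", "Morning ", "mourning"]))
```

With the sample input, three of the four answers normalize to "morning", which meets the 75 percent threshold, so the function returns "morning" in place of the OCR guess.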









 



 
