The InformationWeek -- Blogs


Google Books Metadata Includes Millions Of Errors


Posted by Thomas Claburn, Sep 3, 2009 05:39 PM

The Google Books database is riddled with errors: millions of them, by Google's count.


In a blog post ruminating about the impact of the Google Books lawsuit settlement, the subject of much controversy of late, Geoffrey Nunberg, professor at the School of Information at UC Berkeley, wryly highlights the inaccuracy of the metadata used in the Google Books database by noting what a miraculous year 1899 was for literature.

That year, by Google's reckoning, was the publication date for "Raymond Chandler's Killer in the Rain, The Portable Dorothy Parker, André Malraux' La Condition Humaine, Stephen King's Christine, The Complete Shorter Fiction of Virginia Woolf, Raymond Williams' Culture and Society, Robert Shelton's biography of Bob Dylan, Fodor's Guide to Nova Scotia, and the Portuguese edition of the book version of Yellow Submarine, to name just a few," Nunberg observes.

1899, it turns out, is a placeholder number. A metadata provider gave Google a large number of book records from Brazil that list 1899 as a default publication date, resulting in about 250,000 misdated books from this one source.
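A placeholder like this tends to show up as a spike: one year with far more records than its neighbors. As a rough illustration only (this is not Google's method, and the function name, the `ratio` threshold, and the sample records are all made up for the example), a few lines of Python can flag years that are wildly overrepresented:

```python
from collections import Counter
from statistics import median


def suspicious_years(records, ratio=10.0):
    """Return years whose record count exceeds `ratio` times the median per-year count.

    Hypothetical helper for illustration; the threshold is an arbitrary assumption.
    """
    counts = Counter(r["year"] for r in records)
    typical = median(counts.values())
    return sorted(year for year, n in counts.items() if n > ratio * typical)


# Made-up sample data: two records per year for 1890-1909,
# plus a batch of imports that all default to 1899.
records = [
    {"title": f"Book {year}-{i}", "year": year}
    for year in range(1890, 1910)
    for i in range(2)
]
records += [{"title": f"Imported {i}", "year": 1899} for i in range(50)]

print(suspicious_years(records))
```

A check like this only points at candidates. A flagged year could also be a genuinely busy one, so someone would still have to look at the records behind it.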

"Our providers have millions of errors like these, and we do what we can to eliminate them," acknowledges Google engineering manager Jon Orwant in a comment on Nunberg's blog. "We have made substantial improvements over the past year, but I'm sure we can all agree there's a great deal more to do."

Many of those participating in the discussion on Nunberg's blog suggest that some of that "great deal more" could be Wikipedia-style crowdsourcing. It would be a cost-effective way -- free labor! -- to hunt down and correct errors in the Google Books database. But would anyone go for it?

As a commenter identifying himself as Nick Lamb puts it, "Volunteers have transcribed Britain's census (100+ year old census paperwork is released to the public on the basis that most people mentioned in it are long dead) and other public records which are every bit as dull as the phone book. BUT to make it happen Google need to reassure people that they're not being taken advantage of, the facts collected must be irrevocably put into the public domain."

Whether that fits with Google's long-term goals for Google Books remains to be seen. But Google-style crowdsourcing -- Knol -- hasn't exactly given Wikipedia a run for its money.

More broadly, the state of the Google Books metadata suggests that other databases that have an even greater impact on people's lives may also be rife with errors.

If Congress ever gets around to passing comprehensive online data regulation, here's hoping that it includes a right to review and correct the data that describes who we are and what we do.
