Commentary
9/3/2009
05:39 PM
Thomas Claburn

Google Books Metadata Includes Millions Of Errors

The Google Books database is riddled with errors: millions of them, by Google's count.

In a blog post on the impact of the Google Books lawsuit settlement, which has drawn much controversy of late, Geoffrey Nunberg, a professor at the School of Information at UC Berkeley, wryly highlights the inaccuracy of the metadata in the Google Books database by noting what a miraculous year 1899 was for literature.

That year, by Google's reckoning, was the publication date for "Raymond Chandler's Killer in the Rain, The Portable Dorothy Parker, André Malraux' La Condition Humaine, Stephen King's Christine, The Complete Shorter Fiction of Virginia Woolf, Raymond Williams' Culture and Society, Robert Shelton's biography of Bob Dylan, Fodor's Guide to Nova Scotia, and the Portuguese edition of the book version of Yellow Submarine, to name just a few," Nunberg observes.

1899, it turns out, is a placeholder number. A metadata provider gave Google a large number of book records from Brazil that list 1899 as a default publication date, resulting in about 250,000 misdated books from this one source.

"Our providers have millions of errors like these, and we do what we can to eliminate them," acknowledges Google engineering manager Jon Orwant in a comment on Nunberg's blog. "We have made substantial improvements over the past year, but I'm sure we can all agree there's a great deal more to do."

Many of those participating in the discussion on Nunberg's blog suggest that some of that "great deal more" could come from Wikipedia-style crowdsourcing. It would be a cost-effective way -- free labor! -- to hunt down and correct errors in the Google Books database. But would anyone go for it?

As a commenter identifying himself as Nick Lamb puts it, "Volunteers have transcribed Britain's census (100+ year old census paperwork is released to the public on the basis that most people mentioned in it are long dead) and other public records which are every bit as dull as the phone book. BUT to make it happen Google need to reassure people that they're not being taken advantage of, the facts collected must be irrevocably put into the public domain."

Whether that fits with Google's long-term goals for Google Books remains to be seen. But Google's own venture into crowdsourced knowledge, Knol, hasn't exactly given Wikipedia a run for its money.

More broadly, the state of the Google Books metadata suggests that other databases that have an even greater impact on people's lives may also be rife with errors.

If Congress ever gets around to passing comprehensive online data regulation, here's hoping it includes a right to review and correct the data that describes who we are and what we do.
