Google Books Metadata Includes Millions Of Errors - InformationWeek
05:39 PM
Thomas Claburn

Google Books Metadata Includes Millions Of Errors

The Google Books database is riddled with errors, millions of them by Google's count.

In a blog post ruminating about the impact of the Google Books lawsuit settlement, the subject of much controversy of late, Geoffrey Nunberg, a professor at the School of Information at UC Berkeley, wryly highlights the inaccuracy of the metadata used in the Google Books database by noting what a miraculous year 1899 was for literature.

That year, by Google's reckoning, was the publication date for "Raymond Chandler's Killer in the Rain, The Portable Dorothy Parker, André Malraux' La Condition Humaine, Stephen King's Christine, The Complete Shorter Fiction of Virginia Woolf, Raymond Williams' Culture and Society, Robert Shelton's biography of Bob Dylan, Fodor's Guide to Nova Scotia, and the Portuguese edition of the book version of Yellow Submarine, to name just a few," Nunberg observes.

1899, it turns out, is a placeholder number. A metadata provider gave Google a large number of book records from Brazil that list 1899 as a default publication date, resulting in about 250,000 misdated books from this one source.

"Our providers have millions of errors like these, and we do what we can to eliminate them," acknowledges Google engineering manager Jon Orwant in a comment on Nunberg's blog. "We have made substantial improvements over the past year, but I'm sure we can all agree there's a great deal more to do."

Many of those participating in the discussion on Nunberg's blog suggest that some of that "great deal more" could be handled through Wikipedia-style crowdsourcing. It would be a cost-effective way -- free labor! -- to hunt down and correct errors in the Google Books database. But would anyone go for it?

As a commenter identifying himself as Nick Lamb puts it, "Volunteers have transcribed Britain's census (100+ year old census paperwork is released to the public on the basis that most people mentioned in it are long dead) and other public records which are every bit as dull as the phone book. BUT to make it happen Google need to reassure people that they're not being taken advantage of, the facts collected must be irrevocably put into the public domain."

Whether that fits with Google's long-term goals for Google Books remains to be seen. But Google-style crowdsourcing -- Knol -- hasn't exactly given Wikipedia a run for its money.

More broadly, the state of the Google Books metadata suggests that other databases that have an even greater impact on people's lives may also be rife with errors.

If Congress ever gets around to passing comprehensive online data regulation, here's hoping that it includes a right to review and correct the data that describes who we are and what we do.
