05:39 PM
Thomas Claburn

Google Books Metadata Includes Millions Of Errors

The Google Books database is riddled with errors, millions of them by Google's count.

In a blog post ruminating about the impact of the Google Books lawsuit settlement, the subject of much controversy of late, Geoffrey Nunberg, a professor at the School of Information at UC Berkeley, wryly highlights the inaccuracy of the metadata used in the Google Books database by noting what a miraculous year 1899 was for literature.

That year, by Google's reckoning, was the publication date for "Raymond Chandler's Killer in the Rain, The Portable Dorothy Parker, André Malraux' La Condition Humaine, Stephen King's Christine, The Complete Shorter Fiction of Virginia Woolf, Raymond Williams' Culture and Society, Robert Shelton's biography of Bob Dylan, Fodor's Guide to Nova Scotia, and the Portuguese edition of the book version of Yellow Submarine, to name just a few," Nunberg observes.

1899, it turns out, is a placeholder number. A metadata provider gave Google a large number of book records from Brazil that list 1899 as a default publication date, resulting in about 250,000 misdated books from this one source.

"Our providers have millions of errors like these, and we do what we can to eliminate them," acknowledges Google engineering manager Jon Orwant in a comment on Nunberg's blog. "We have made substantial improvements over the past year, but I'm sure we can all agree there's a great deal more to do."

Many of those participating in the discussion on Nunberg's blog suggest that some of that "great deal more" could be handled through Wikipedia-style crowdsourcing. It would be a cost-effective way -- free labor! -- to hunt down and correct errors in the Google Books database. But would anyone go for it?

As a commenter identifying himself as Nick Lamb puts it, "Volunteers have transcribed Britain's census (100+ year old census paperwork is released to the public on the basis that most people mentioned in it are long dead) and other public records which are every bit as dull as the phone book. BUT to make it happen Google need to reassure people that they're not being taken advantage of, the facts collected must be irrevocably put into the public domain."

Whether that fits with Google's long-term goals for Google Books remains to be seen. But Google-style crowdsourcing -- Knol -- hasn't exactly given Wikipedia a run for its money.

More broadly, the state of the Google Books metadata suggests that other databases that have an even greater impact on people's lives may also be rife with errors.

If Congress ever gets around to passing comprehensive online data regulation, here's to hoping that it includes a right to review and correct the data that describes who we are and what we do.
