Google, Microsoft, Yahoo Back Search Metadata Project
Schema.org aims to provide a common vocabulary for structuring web page data.
While Google competes with Microsoft and Yahoo in the search market, the three companies are cooperating to help web publishers make their content more comprehensible to search engines.
Google said on Thursday that the three companies had launched an initiative called schema.org to create and support common ways to represent web page metadata. The project will offer web publishers tools to make their content more easily understood by search engines and more effectively represented on search results pages.
"With schema.org, site owners can improve how their sites appear in search results not only on Google, but on Bing, Yahoo, and potentially other search engines as well in the future," said Google Fellow Ramanathan Guha in a blog post.
Schema.org hosts definitions of schemas, shared vocabularies that webmasters can use to mark up data in their HTML. For example, the Person schema provides a way to associate a person's name with data that relates to that person, like his or her street address and email address. Without the structure provided by metadata markup, it can be difficult for search engines to be certain that a name on a web page is associated with some other data attribute.
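To make the Person example concrete, here is a minimal sketch in Python of what such markup might look like when generated for a page. It assumes the HTML microdata syntax (the `itemscope`, `itemtype`, and `itemprop` attributes) and the schema.org `Person` and `PostalAddress` types. The function name `person_microdata` and the sample values are hypothetical, chosen only for illustration.

```python
from html import escape


def person_microdata(name, email, street_address):
    """Return an HTML snippet describing a person with schema.org microdata.

    Hypothetical helper for illustration; the attribute names follow the
    microdata syntax, and the property names (name, email, address,
    streetAddress) come from the Person and PostalAddress schemas.
    """
    return (
        '<div itemscope itemtype="http://schema.org/Person">\n'
        f'  <span itemprop="name">{escape(name)}</span>\n'
        f'  <a itemprop="email" href="mailto:{escape(email)}">{escape(email)}</a>\n'
        '  <div itemprop="address" itemscope '
        'itemtype="http://schema.org/PostalAddress">\n'
        f'    <span itemprop="streetAddress">{escape(street_address)}</span>\n'
        '  </div>\n'
        '</div>'
    )


if __name__ == "__main__":
    # Sample values are made up; any name, email, and address would do.
    print(person_microdata("Jane Doe", "jane@example.com", "123 Main St"))
```

Because the name, email, and street address all sit inside the same `itemscope` block, a crawler that reads the markup can treat them as attributes of one person rather than as unrelated strings on the page.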
There are other ways of marking up web pages, such as RDFa and microformats. But Google, Microsoft, and Yahoo argue that other formats have disadvantages and that webmasters will benefit from having a single markup resource focused on search engines, which in turn will lead to more markup and a better search experience.
Google has been pursuing its own structured markup for several years. In 2009, the company enhanced its search results with rich snippets, which made additional data like online reviews visible in search listings. The company has since expanded its snippets to include events and recipes. As a result, companies like stubhub.com and allrecipes.com have chosen to structure their data to take advantage of the more effective presentation afforded to well-described data.
The schema.org initiative is similar in some respects to sitemaps.org, an XML schema that helps search engine crawlers navigate websites. The protocol was created by Google in 2005 and supported by Microsoft and Yahoo in 2006, with other companies announcing support later.
The existence of schema.org can be seen as an acknowledgement of the limits of automated data analysis. One of the Frequently Asked Questions posted on the schema.org site attempts to deal with a possible objection to web page markup, specifically that it requires work from webmasters. "Automated data extraction is great when it works, but it can be error prone because different sites can represent the same information in so many different ways," the schema.org website says.
Understanding, in other words, is a harder problem than indexing. Humans may not be obsolete after all.