Commentary
11/27/2006
03:25 PM
Penny Crosman

Applying Semantic Web Ideals


The New York Times recently ran a story about the notion of the Semantic Web. You have to be a subscriber to access the article on the paper's site, so here's an excerpt:

"From the billions of documents that form the World Wide Web and the links that weave them together, computer scientists and a growing collection of start-up companies are finding new ways to mine human intelligence.

"Their goal is to add a layer of meaning on top of the existing Web that would make it less of a catalog and more of a guide -- and even provide the foundation for systems that can reason in a human fashion. That level of artificial intelligence, with machines doing the thinking instead of simply following commands, has eluded researchers for more than half a century.

"Referred to as Web 3.0, the effort is in its infancy, and the very idea has given rise to skeptics who have called it an unobtainable vision. But the underlying technologies are rapidly gaining adherents, at big companies like I.B.M. and Google as well as small ones. Their projects often center on simple, practical uses, from producing vacation recommendations to predicting the next hit song.

"But in the future, more powerful systems could act as personal advisers in areas as diverse as financial planning, with an intelligent system mapping out a retirement plan for a couple, for instance, or educational consulting, with the Web helping a high school student identify the right college."

The article goes on to discuss social networking sites, artificial intelligence, and the ontology and taxonomy efforts of Cycorp and IBM, both working to build a layer of intelligence across the entire web.

Bloggers from all corners of the web critiqued the piece, ridiculing the use of the term "Web 3.0" and expressing skepticism over the existence of Web 2.0. Most articulate, in my opinion, was Nick Bradbury, architect of client solutions at NewsGator, who wrote, "The goals of the Semantic Web are good ones, and I believe many of those goals will be met in my lifetime. But too much of the Semantic Web relies on data being valid - that is, valid XML, XHTML, RDF, etc. - and too many of us will never publish valid data. … If the Semantic Web hopes to exist, it's going to have to deal with invalid HTML, badly-formed XML, and RSS with vague entity escaping. It's also going to have to filter out every new variation of spam, and be smart enough to know when people lie. The Semantic Web may happen, but if it does, it's going to be a helluva lot messier than the architects would like."

These are great points -- data formatting and data quality are serious issues on the web and probably always will be. I wonder if it's even possible to create layers of meaning that would be universally understandable and intelligible to everyone, or to filter out inaccuracies and junk across the web. As Albert Einstein pointed out, "Whoever undertakes to set himself up as a judge of Truth and Knowledge is shipwrecked by the laughter of the gods."

But within companies and websites, a semantic layer can be initiated -- albeit with much time and care -- with the development of ontologies and taxonomies that provide definitions and structure to categories of content. We look at these in detail in our December cover story, "Search in Focus."

Do you have any thoughts on this topic? Please email me at pcrosman@cmp.com.
