It's Not Just Semantics

The World Wide Web Consortium's Eric Miller says the Semantic Web is the enterprise's best hope for sustainable data interoperability.

How can Semantic Web technology benefit businesses when even centrally controlled data has quality, integration and metadata problems?

Semantic Web technologies are primarily a powerful means for supporting data integration and reducing the costs of data reuse, freeing the information from the applications that create it. As a side benefit, these technologies can allow more people to work on quality and maintenance, continuously and collectively — in turn reducing the costs of quality while improving it.

How do Semantic Web technologies enable that?

The Resource Description Framework (RDF) is the underlying unified data model for representing semantics. The RDF model is similar to an entity-relationship (E-R) model: resources (entities) have relationships with other resources on the Web. One important difference between RDF and E-R is that RDF lets you define relationships in the context of other relationships. For example, if there's a "title" term in one database and "document title" in another, I can add a new assertion to the Web that says they mean the same thing. So when people search for one, they can easily search on the other at the same time. An XML representation of this assertion can be written to the Web so others can exploit it.
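To make the "title" / "document title" example concrete, here is a minimal sketch in plain Python that models assertions as (subject, predicate, object) triples. The `db1:` and `db2:` prefixes, the `same_as` predicate name, and the `expand_terms` and `search` helpers are all hypothetical names chosen for illustration; they are not part of RDF or of any particular toolkit.

```python
# Assertions are modeled as (subject, predicate, object) triples.
# Every name here (db1:title, db2:documentTitle, same_as) is hypothetical.
SAME_AS = "same_as"

assertions = {
    ("db1:title", SAME_AS, "db2:documentTitle"),
}


def expand_terms(term, triples):
    """Return the term plus every term asserted, directly or transitively, to mean the same."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != SAME_AS:
                continue
            # Equivalence is symmetric, so follow the assertion in both directions.
            for near, far in ((s, o), (o, s)):
                if near == current and far not in found:
                    found.add(far)
                    frontier.append(far)
    return found


def search(records, term, query, triples):
    """Find records whose value for the term, or for any equivalent term, contains the query."""
    fields = expand_terms(term, triples)
    return [
        r for r in records
        if any(query.lower() in r.get(f, "").lower() for f in fields)
    ]


records = [
    {"db1:title": "Weaving the Web"},
    {"db2:documentTitle": "Web Ontology Primer"},
    {"db2:documentTitle": "Jazz Standards"},
]

# A search on db1:title also covers db2:documentTitle because of the one assertion above.
hits = search(records, "db1:title", "web", assertions)
```

The point of the sketch is that the mapping lives in the data rather than in either application: adding one more triple to `assertions` widens every later search without touching the search code.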

At its most basic level, RDF defines a simple, powerful, flexible data model. RDF Schema builds on this model, enabling individuals or communities to declare descriptive terms such as "cost" along with classes of objects such as "person" or "car" for any particular domain or application.
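As a rough illustration of declaring terms and classes, the sketch below writes a tiny vocabulary as triples and then lists any terms the data uses without declaring them. The `ex:` names and the `undeclared_terms` helper are hypothetical; `rdf:type`, `rdf:Property`, `rdfs:Class` and `rdfs:domain` are terms from RDF and RDF Schema. RDF Schema itself does not require terms to be declared before use, so the check is an assumption made here for housekeeping purposes.

```python
# A tiny vocabulary expressed as triples. The "ex:" terms are hypothetical.
vocabulary = [
    ("ex:Person", "rdf:type", "rdfs:Class"),
    ("ex:Car", "rdf:type", "rdfs:Class"),
    ("ex:cost", "rdf:type", "rdf:Property"),
    ("ex:cost", "rdfs:domain", "ex:Car"),
]

data = [
    ("ex:car42", "rdf:type", "ex:Car"),
    ("ex:car42", "ex:cost", "18000"),
    ("ex:car42", "ex:colour", "red"),  # ex:colour is never declared
]


def declared(vocab, kind):
    """Return every term the vocabulary declares as the given kind."""
    return {s for s, p, o in vocab if p == "rdf:type" and o == kind}


def undeclared_terms(triples, vocab):
    """List classes and properties used in the data that the vocabulary does not declare."""
    classes = declared(vocab, "rdfs:Class")
    properties = declared(vocab, "rdf:Property")
    missing = set()
    for s, p, o in triples:
        if p == "rdf:type":
            if o not in classes:
                missing.add(o)
        elif p not in properties:
            missing.add(p)
    return missing


missing = undeclared_terms(data, vocabulary)
```

Here `missing` contains only `ex:colour`, which is the kind of gap a community maintaining a shared vocabulary might want to notice.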

Do people need to organize and manage the terms they use?

OWL, the World Wide Web Consortium's Web Ontology Language, provides a descriptive means of relating and constraining terms. For example, "automobile" is the same as "car," "author" is a relationship that exists only between a "person" and a "document," and a "person" can be male or female but not both, and so on.
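The three examples above (equivalent classes, a relationship restricted to certain classes, and mutually exclusive classes) can be sketched as simple checks over triples. This is plain Python, not an OWL reasoner. The `ex:` names are hypothetical, the direction of `ex:author` (from document to person) is an assumption, and treating the constraints as validation rules is a simplification: an OWL reasoner would typically infer a missing type rather than report an error.

```python
# Hypothetical "ex:" terms. The constraint kinds loosely mirror owl:equivalentClass,
# rdfs:domain / rdfs:range, and owl:disjointWith.
EQUIVALENT = {"ex:Automobile": "ex:Car"}  # "automobile" is the same as "car"
AUTHOR_DOMAIN = "ex:Document"             # assumed direction: document -> person
AUTHOR_RANGE = "ex:Person"
DISJOINT = [("ex:Male", "ex:Female")]     # a person cannot be both


def types_of(node, triples):
    """Collect a node's declared classes, normalizing equivalent class names."""
    return {EQUIVALENT.get(o, o) for s, p, o in triples if s == node and p == "rdf:type"}


def violations(triples):
    """Return human-readable descriptions of every constraint the data breaks."""
    problems = []
    for node in sorted({s for s, _, _ in triples}):
        kinds = types_of(node, triples)
        for a, b in DISJOINT:
            if a in kinds and b in kinds:
                problems.append(f"{node} is both {a} and {b}")
    for s, p, o in triples:
        if p == "ex:author":
            if AUTHOR_DOMAIN not in types_of(s, triples):
                problems.append(f"{s} has an author but is not a {AUTHOR_DOMAIN}")
            if AUTHOR_RANGE not in types_of(o, triples):
                problems.append(f"{o} is an author but is not a {AUTHOR_RANGE}")
    return problems


data = [
    ("ex:report1", "rdf:type", "ex:Document"),
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:Female"),
    ("ex:report1", "ex:author", "ex:alice"),
    ("ex:bob", "rdf:type", "ex:Male"),
    ("ex:bob", "rdf:type", "ex:Female"),
    ("ex:herbie", "rdf:type", "ex:Automobile"),
]

problems = violations(data)
```

With this data, the only problem reported is that `ex:bob` is declared both male and female, and `ex:herbie` is treated as an `ex:Car` because of the equivalence.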

RDF, RDF Schema and OWL are also designed to provide incrementally more powerful inferencing capabilities — recognizing implied relationships that exist among data and making these explicit for others to use.
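A minimal sketch of what "making implied relationships explicit" can look like: the function below applies two RDFS-style rules repeatedly until nothing new appears (simple forward chaining). The `ex:` names and the `infer` helper are hypothetical, and only two rules are covered, so it should be read as an illustration rather than a complete RDFS or OWL reasoner.

```python
def infer(triples):
    """Apply two RDFS-style rules until no new triples appear (forward chaining)."""
    known = set(triples)
    while True:
        new = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                # Rule 1: x is a C, and C is a subclass of D  =>  x is a D
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # Rule 2: x P y, and P has domain C  =>  x is a C
                if p2 == "rdfs:domain" and p == s2:
                    new.add((s, "rdf:type", o2))
        if new <= known:
            return known
        known |= new


facts = [
    ("ex:Car", "rdfs:subClassOf", "ex:Vehicle"),
    ("ex:cost", "rdfs:domain", "ex:Car"),
    ("ex:car42", "ex:cost", "18000"),
]

closure = infer(facts)
implied = closure - set(facts)  # the triples nobody stated but the rules derived
```

Nobody stated that `ex:car42` is a car or a vehicle, yet both facts end up in `implied`: the first follows from the domain of `ex:cost`, and the second from the subclass relationship.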

The cumulative effect — the network effect — of all these assertions (machine inferenced or provided by people) being recorded back into the Web is powerful. Each new assertion adds value to the Web for everyone.

Isn't there a danger that incorrect assertions will make the Web unreliable for business?

This is a good point, and it gets back to the issue of governance. There's a strong benefit in leveraging these open solutions in a closed environment. Inside an organization, instead of just anyone being able to make these relationship assertions, there could be a "semantic integration" department responsible for them.

But the pattern I'm seeing right now is that businesses are combining the two approaches: distributed and centralized. Anyone inside the company can begin to stitch data together by adding assertions back into the network. The centralized group in charge of the mappings benefits from the fact that all these people are tying these things together. Meanwhile, the centralized group runs off-the-shelf bots that harvest these assertions from the intranet and evaluate their accuracy.
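The interview does not say how such bots judge accuracy. As one hypothetical heuristic, the sketch below accepts a harvested assertion once enough independent contributors have made it and routes everything else to the central group for review. The `review_queue` function, the `min_supporters` threshold, and the sample names are all assumptions made for illustration.

```python
from collections import defaultdict


def review_queue(harvested, min_supporters=2):
    """Split harvested assertions into accepted ones and ones needing human review.

    `harvested` is a list of (contributor, triple) pairs. Agreement among
    contributors is a hypothetical stand-in for a real accuracy check.
    """
    supporters = defaultdict(set)
    for contributor, triple in harvested:
        # A set means a contributor repeating the same assertion counts only once.
        supporters[triple].add(contributor)
    accepted = sorted(t for t, who in supporters.items() if len(who) >= min_supporters)
    needs_review = sorted(t for t, who in supporters.items() if len(who) < min_supporters)
    return accepted, needs_review


title_mapping = ("db1:title", "same_as", "db2:documentTitle")
price_mapping = ("db1:cost", "same_as", "db2:weight")  # a doubtful mapping

harvested = [
    ("alice", title_mapping),
    ("bob", title_mapping),
    ("bob", title_mapping),  # duplicate from the same person
    ("carol", price_mapping),
]

accepted, needs_review = review_queue(harvested)
```

In this run the title mapping is accepted because two different people asserted it, while the doubtful cost-to-weight mapping, asserted by one person, lands in the review list for the central group.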

Book recommendation: The Professor and the Madman: A Tale of Murder, Insanity, and the Making of the Oxford English Dictionary by Simon Winchester.

What's great about jazz: The best artists make a bit of order out of chaos.
