In the beginning, computer systems were only occasionally networked, and when they were, a number of completely different protocols were used to connect one system with another. Then, Tim Berners-Lee said, "Let there be a World Wide Web!" and it was created, and he saw that it was good. After creating one of the most revolutionary phenomena of the last two decades, he should have taken a rest at that point. Instead, he founded the World Wide Web Consortium (W3C), which is now the main standards body for the Web. This group holds the reins to HTML, shares stewardship of HTTP and URIs with the IETF, and is the home of other pearls of the Internet such as XML, XSLT, and SOAP. Again, you would think this would be enough. However, Berners-Lee's next "big thing" is still on the horizon, and he has been working on it since 1998: the Semantic Web.
What is the Semantic Web?
The W3C states on its Web site that the goal of the Semantic Web is "to create a universal medium for the exchange of data." So how exactly is that different from the modern-day Web? It's not the easiest thing to describe. Even Berners-Lee has stated that trying to describe the Semantic Web is as hard as it was to describe the original World Wide Web 15 years ago. But like the World Wide Web, once it is seen and experienced, it becomes a simple but powerful paradigm shift.
The chief ingredient of the Semantic Web is its application of metadata. Metadata, or data about data, describes characteristics of some piece of information, such as when, by whom, or why it was created, changed, deleted, or collected. The information in almost any Web page today doesn't include this kind of metadata and so is static, or even unusable, without the appropriate context. Something has to give data meaning before it can be used appropriately. Without metadata attached to every piece of data, only two things can ensure its correct interpretation: people who are familiar with the data and know how to manipulate it, or software written specifically to use the ontology behind the data.
The Semantic Web promises to change that by providing a set of technologies and standards that make it easier to annotate information and to query those annotations. This not only allows fuller search capabilities, but also lets programs intelligently infer the task at hand as well as the result that is actually being sought. It lets data on the Web be processed intelligently by automated tools regardless of how different the data's end points are.
Using the Semantic Web
To understand the concepts of the Semantic Web, it may help to look at a more concrete example of how it could be used. Imagine you're browsing the Web for vacation spots and find a travel site that describes a relaxing location, the dates and times during which it is open, and the phone numbers and people to contact for more information. Currently, you have to manually copy this information into your calendar and address book, and then go to some map site to figure out where this place is and how to get there. Instead, you might prefer to have this all done at the click of a button.
If something is a date, your calendar application should recognize it and immediately do something with it. The same could be said for your address book, or even your GPS unit. But let's take this scenario a step further: With a single click, you also get a weather report or a travel advisory about the vacation area. In fact, with the right tools, you should be able to infer a number of things, including which vacations were chosen by other people who viewed this page. Many of these ideas already exist in some form or another. Amazon.com can easily tell somebody what other books they might also like. However, this works only if you're looking at a book title on the Amazon.com site itself. The question is: can you go to any author's Web site and somehow have your browser immediately list other books similar to that author's book?
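To make the calendar and address book scenario a little more tangible, here is a minimal Python sketch of what "recognizing" annotated data could look like. Everything in it is an assumption made for illustration: the type labels, the sample values, and the `route` function are hypothetical, and a real page would identify types with URIs from a shared vocabulary rather than short strings.

```python
# Hypothetical annotated items scraped from a travel page.
# The "type" labels and all values are invented for illustration.
page_items = [
    {"type": "Date", "value": "2025-07-04"},
    {"type": "Phone", "value": "555-0100"},
    {"type": "Location", "value": "Quiet Cove, Harborview"},
    {"type": "Text", "value": "A relaxing getaway."},
]

# Which desktop tool should receive which kind of data (also hypothetical).
HANDLERS = {"Date": "calendar", "Phone": "address book", "Location": "map"}


def route(items):
    """Group item values by the tool that understands their declared type."""
    routed = {}
    for item in items:
        target = HANDLERS.get(item["type"])
        if target is not None:  # untyped or unknown data is simply left alone
            routed.setdefault(target, []).append(item["value"])
    return routed
```

The point of the sketch is that no tool has to guess: because each value carries its own type, routing is a lookup rather than a parsing problem.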
The simple idea behind the Semantic Web is that it should be effortless to take four or five pieces of information and use them in various tools and environments with little or no need to reformat them or to state explicitly which piece is a location, a date, or contact information. Even with Web-based service-oriented architectures, each remote service knows what it's expecting only in simple terms: the order of parameters and whether a parameter is a string, an integer, or a floating-point number. The service doesn't know whether the number is a Fahrenheit or Celsius value, or whether the string is the name of a person or a location. It assumes that the author of the request knows enough about the service and its interface to send the correct information and to expect a certain type of information in return. In the perfect world of the Semantic Web, these types of assumptions would never need to be made.
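The Fahrenheit-versus-Celsius problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the unit URIs are placeholders in the example.org namespace, not terms from a real ontology, and the `Annotated` class and both functions are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Placeholder URIs standing in for terms from a shared vocabulary.
CELSIUS = "http://example.org/units#Celsius"
FAHRENHEIT = "http://example.org/units#Fahrenheit"


@dataclass(frozen=True)
class Annotated:
    """A number that carries a statement of what it means."""
    value: float
    unit: str  # URI naming the unit of measure


def to_celsius(reading: Annotated) -> float:
    """Normalize a reading using its annotation instead of an assumption."""
    if reading.unit == CELSIUS:
        return reading.value
    if reading.unit == FAHRENHEIT:
        return (reading.value - 32.0) * 5.0 / 9.0
    raise ValueError(f"unknown unit: {reading.unit}")


def is_freezing(reading: Annotated) -> bool:
    """A 'service' that no longer depends on the caller knowing its units."""
    return to_celsius(reading) <= 0.0
```

A service that accepted a bare floating-point number would treat 32.0 the same way no matter who sent it. Here, `is_freezing(Annotated(32.0, FAHRENHEIT))` and `is_freezing(Annotated(32.0, CELSIUS))` give different answers because the data itself says what it is.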
Technologies and Standards
Some of this description may still sound too abstract and maybe even a little too fanciful. However, the concepts behind the Semantic Web are slowly solidifying in a number of new technologies and standards. The first well-established standard is the Resource Description Framework (RDF), which is used to describe metadata and lets heterogeneous systems communicate using commonly understood terms. RDF data is a list of statements, each of which is a subject-predicate-object triple giving the value of one piece of metadata about some resource. If a system uses RDF internally, then all the data in that system can be easily linked with one or many outside RDF systems. No schema negotiation needs to happen; the statements are simply concatenated. You can then execute a relational-style query on this concatenated data, including the use of constraints, joins, and views. Even though the content of each of these RDF systems may be completely different, all of them can be integrated and queried intelligently and quickly. In addition, when new RDF systems are introduced, no further action is needed to integrate them.
The second, newly minted standard is the Web Ontology Language (OWL). The W3C asserts that the purpose of OWL is to provide a standard way of creating computer-usable definitions of basic concepts in a domain and the relationships among those concepts. This includes encoding not only the knowledge about a specific domain but also how that knowledge relates across different domains. This, in turn, makes the definitions reusable. In addition, OWL builds on the linking abilities inherent in RDF, which lets it scale, be distributed, remain compatible with existing standards, and stay open and extensible.
The combination of RDF and OWL is already providing solutions that are helping many companies move toward the ultimate vision of the Semantic Web. The W3C enumerates a number of real-life use cases being implemented using RDF and OWL, including the ability to categorize rules to enhance searching, Web services discovery and composition, content mapping between Web sites, expressing user preferences and interests, and content-based searches for non-text media. Many of these efforts are already in use in corporate and government work and in enterprise application integration tasks, but the usefulness of the Semantic Web has only just begun to be measured.
As the science of knowledge representation improves and the number of structured collections of information on the Internet increases, the ability to create sets of inference rules grows. This may one day lead to the ultimate ability to conduct automated reasoning and analysis tasks with little or no additional input to a system. It isn't quite the same kind of artificial intelligence that you might see in a science fiction movie, but it definitely provides the starting point for different kinds of applied machine intelligence.
From the examples I've described, you can see that there's a lot of hope for the future of the Semantic Web. It already consists of two main standards that are actively being used and getting amazing results. It still has a long way to go to fulfill its ultimate vision, but it doesn't seem that Tim Berners-Lee plans to slow down anytime soon in making it a reality. With the World Wide Web, Berners-Lee created something that provided a massive amount of information but could never really understand any of it. The creation of the Semantic Web fixes that oversight. While the creation of the World Wide Web made the Internet all-knowing, the creation of the Semantic Web makes the Internet cognizant.
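The "simply concatenated" claim is easier to see with a small example. The following is a sketch, not an RDF implementation: each declaration is modeled as a (subject, predicate, object) tuple, which is the form RDF statements take, and plain Python sets stand in for an RDF store. The namespace, the property names, and all the data are invented for illustration, and the sketch ignores details a real merge has to handle, such as anonymous nodes.

```python
EX = "http://example.org/"  # placeholder namespace, not a real vocabulary

# Two independent sources that happen to share identifiers.
travel_site = {
    (EX + "resort", EX + "name", "Quiet Cove"),
    (EX + "resort", EX + "locatedIn", EX + "town"),
    (EX + "resort", EX + "contact", EX + "alice"),
}
directory = {
    (EX + "alice", EX + "phone", "555-0100"),
    (EX + "town", EX + "name", "Harborview"),
}

# Integration is a set union: no schema negotiation, no mapping step.
merged = travel_site | directory


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def contact_phones(graph, place):
    """Join two patterns: place -contact-> person -phone-> number."""
    phones = []
    for _, _, person in match(graph, s=place, p=EX + "contact"):
        for _, _, phone in match(graph, s=person, p=EX + "phone"):
            phones.append(phone)
    return sorted(phones)
```

Neither source alone can answer "what number do I call about the resort?" because the travel site knows the contact person and the directory knows the phone number. Against `merged`, `contact_phones(merged, EX + "resort")` finds the number with an ordinary join, and adding a third source would be one more union.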
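As a rough illustration of what computer-usable definitions buy you, the sketch below encodes a tiny concept hierarchy and one link between two domains, then derives facts that were never stated directly. It is an assumption-laden toy, not OWL: the short strings `subClassOf`, `equivalentClass`, and `type` stand in for the full URIs a real ontology would use, the `travel:` and `hotel:` class names are invented, and the two functions are hypothetical.

```python
# Stand-ins for vocabulary terms; real ontologies identify these with URIs.
SUBCLASS_OF = "subClassOf"
EQUIVALENT = "equivalentClass"
TYPE = "type"

# Definitions of concepts and their relationships (all names invented).
ontology = {
    ("travel:BeachResort", SUBCLASS_OF, "travel:Resort"),
    ("travel:Resort", SUBCLASS_OF, "travel:Lodging"),
    ("travel:Lodging", EQUIVALENT, "hotel:Accommodation"),  # cross-domain link
}

# The only fact anyone actually asserted.
facts = {("ex:QuietCove", TYPE, "travel:BeachResort")}


def superclasses(cls, ontology):
    """Every class that cls falls under, following both kinds of link."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in ontology:
            nxt = None
            if s == current and p in (SUBCLASS_OF, EQUIVALENT):
                nxt = o
            elif o == current and p == EQUIVALENT:
                nxt = s  # equivalence holds in both directions
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def inferred_types(individual, facts, ontology):
    """All classes an individual belongs to, stated or derived."""
    result = set()
    for s, p, o in facts:
        if s == individual and p == TYPE:
            result |= superclasses(o, ontology)
    return result
```

Because the definitions are data rather than code, a hotel-booking tool that only understands `hotel:Accommodation` could still discover Quiet Cove, even though the travel site never used that term.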
Michael J. Hudson is a software architect for Praxis Engineering Technologies in Annapolis Junction, Md. His current work includes developing enterprise architectural solutions for both commercial and government clients.
Resource Description Framework: www.w3.org/RDF
Web Ontology Language: www.w3.org/2001/sw/WebOnt
Additional Columns at IntelligentEnterprise.com
"The Semantic Web," March 28, 2002: www.intelligententerprise.com/020328/506decision1_1.shtml