Start Making Sense: Get From Data To Semantic Integration

Current integration methods, including ETL, EAI and EII, can't deliver on the goal of creating a single view of the truth. In fact, composite apps and service-based architectures will only underscore the limitations of current practices. Semantic integration promises a richer understanding of metadata and clearer context, but to embrace this emerging approach, we'll have to revise some cherished data warehousing concepts.


Dave McComb, President, Semantic Arts

Is the Semantic Web driving semantic modeling and integration? They go together, but they're independent. Semantic modeling is just a different way of doing conceptual modeling. The graphs of meaning Michael Hammer described in the late 1980s are similar to what we see in the W3C languages for defining ontology: Web Ontology Language (known as "OWL") and RDF. Ontology focuses on concepts, rather than defining terms; it describes why something belongs — or doesn't belong — in a particular concept and how that concept links to other concepts.

Semantic Web standards are going to be important. We are where XML was about four years ago. Today, people just assume XML, as they eventually will OWL and RDF.

What do you see as the big shift ahead for people schooled in relational databases? Do you know the difference between splitters and lumpers? Database people have historically been splitters; every time they take on a new requirement, they like to create a new attribute or entity. That's not necessarily bad, but organizations end up with schemas that contain thousands of these distinctions, and usually only one or two people know all the subtle connections between them. When something new comes along, lumpers want to put it into some larger category they already know about. Lumpers prefer to start by understanding what things have in common and add distinctions later on. Semantics people are lumpers.

Is there a business problem today that seems to call out for semantic understanding? Compliance with the Sarbanes-Oxley Act. You have to report material transactions, but you haven't coded what a "material transaction" is in any of your hundreds of applications, and, let's say, you have only 48 hours to find them. You're going to need another way to map where they are.
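A minimal sketch of that alternative, assuming three hypothetical applications whose records use different field names and a purely illustrative materiality threshold of 1,000,000 (not a regulatory figure). Each application is mapped once to a shared vocabulary, and the concept "material transaction" is defined in one place and applied to every source, rather than being coded into each application.

```python
from dataclasses import dataclass

# Hypothetical records from three applications that name the same facts differently.
SOURCES = {
    "billing": [{"inv_no": "B-17", "amt": 2_500_000, "cust": "Acme"}],
    "procurement": [{"po_id": "P-9", "value_usd": 40_000, "vendor": "Zenith"}],
    "treasury": [{"ref": "T-3", "notional": 1_200_000, "cpty": "Bank A"}],
}

# One mapping per application from local field names to shared concept
# properties. All names here are illustrative assumptions.
MAPPINGS = {
    "billing": {"inv_no": "tx_id", "amt": "amount", "cust": "counterparty"},
    "procurement": {"po_id": "tx_id", "value_usd": "amount", "vendor": "counterparty"},
    "treasury": {"ref": "tx_id", "notional": "amount", "cpty": "counterparty"},
}

@dataclass
class Transaction:
    source: str
    tx_id: str
    amount: float
    counterparty: str

def to_concept(source: str, record: dict) -> Transaction:
    """Rename an application's local fields to the shared property names."""
    mapped = {MAPPINGS[source][field]: value for field, value in record.items()}
    return Transaction(source=source, **mapped)

# The concept is defined once. The threshold is an assumption for this sketch.
MATERIALITY_THRESHOLD = 1_000_000

def is_material(tx: Transaction) -> bool:
    return tx.amount >= MATERIALITY_THRESHOLD

def material_transactions() -> list[Transaction]:
    """Apply the single concept definition across every mapped source."""
    return [
        tx
        for source, records in SOURCES.items()
        for tx in (to_concept(source, r) for r in records)
        if is_material(tx)
    ]

for tx in material_transactions():
    print(tx.source, tx.tx_id, tx.amount)
```

Adding another application means adding one mapping entry; changing what counts as material means editing one function.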

This last point is critical because BI adoption remains dismal: Estimates range from less than 10% of organizations (my survey, reported in Intelligent Enterprise, 2004) to 35% (Gartner, 2005). Suppose analysts want to understand a sudden deterioration in on-time deliveries to distributors. As regular viewers of reports through BI portals, they have access to a wealth of information about shipments, routes, customers, suppliers, contractors, loads, trips and exceptions. Unfortunately, they have no idea how to use these resources to get answers. Generally, there are two alternatives. The first is to open a spreadsheet, download some of the reports and start cutting and pasting, which may or may not provide the data or calculations needed to solve the problem. The second is to find the resident "go-to" superuser who knows where the data is, what it means and how to manipulate it.

Many organizations classify information users as power users, manipulators, report viewers and so on. Most organizations are stuck with a pyramidal user hierarchy. The valuable go-to users at the top are in short supply, creating a drag on BI's potential. Everyone else struggles to know what the data means. If users can't make credible inferences about the data, they won't leverage its value beyond the initial presentation. That's the true power of the go-to users: They can draw the inferences.

Semantic Integration

What if we had a different kind of metadata that would let more users go further on their own with BI and productive data analysis? Semantic integration technology, also known as "ontology," is the next step in rationalizing information integration in and beyond organizations. Semantic integration is the application of Semantic Web concepts developed by Tim Berners-Lee, director of the World Wide Web Consortium (W3C). Berners-Lee and the W3C define the Semantic Web as a framework that "allows data to be shared and reused across application, enterprise and community boundaries." Semantic integration raises the level of abstraction so people and systems can focus on meaning and relationships.

In BI and data warehousing, experience with the relational model, relational databases and SQL has established the primacy of set theory, which is essentially about joining, intersecting and constraining elements or groups of things. Unfortunately, relational implementations aren't good at capturing and manipulating the infinitely intricate relationships between things.

That's where semantic integration (or ontology) takes off. Ontological graphs or maps depict the relationships between things, not just their definitions. The W3C's Resource Description Framework (RDF), the standard data model on which ontology languages such as OWL are built, plays a key role in the Semantic Web. Developers use RDF as a common framework for sharing metadata with search engines and other tools. Semantic integration uses RDF to create a potentially powerful mechanism for understanding not only the names and definitions of things (the stuff relational metadata gives us) but also how they fit together.
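RDF expresses every statement as a subject-predicate-object triple. The sketch below is a toy in-memory triple list with wildcard pattern matching, loosely in the spirit of a SPARQL basic graph pattern; it is not RDF syntax or any real library's API, and the shipment, route and contractor identifiers are hypothetical, borrowed from the on-time-delivery example earlier in this article. It answers a question by following relationships rather than by joining predefined tables.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Each fact is one (subject, predicate, object) statement. Identifiers are
# hypothetical; real RDF uses IRIs such as <http://example.org/shipment/42>.
TRIPLES: list[Triple] = [
    ("shipment:42", "onRoute", "route:7"),
    ("shipment:42", "deliveredTo", "distributor:north"),
    ("shipment:42", "status", "late"),
    ("shipment:43", "onRoute", "route:7"),
    ("shipment:43", "deliveredTo", "distributor:south"),
    ("shipment:43", "status", "late"),
    ("shipment:44", "onRoute", "route:2"),
    ("shipment:44", "status", "onTime"),
    ("route:7", "operatedBy", "contractor:fastfreight"),
]

def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def contractors_behind_late_shipments() -> set[str]:
    """Follow relationships: late shipment -> route -> contractor."""
    result = set()
    for shipment, _, _ in match(p="status", o="late"):
        for _, _, route in match(s=shipment, p="onRoute"):
            for _, _, contractor in match(s=route, p="operatedBy"):
                result.add(contractor)
    return result

print(contractors_behind_late_shipments())  # {'contractor:fastfreight'}
```

In this representation, recording a new kind of relationship (say, which carrier insured a load) is one more triple rather than a schema change.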

As opposed to many other model types, ontologies become useful early, even before they're complete. With semantic models, updating can be continuous, easy and fluid. In sum, here are the primary objectives of semantic integration:

  • Provide metadata with richer meaning
  • Describe the relationships between things, not just their definitions and attributes
  • Provide more abstraction from lower levels of data
  • Let machines draw inferences directly from semantics
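The last objective, letting machines draw inferences directly from semantics, can be illustrated with two entailment rules from the W3C RDF Schema semantics: `subClassOf` is transitive (rule rdfs11), and an instance of a class is also an instance of its superclasses (rule rdfs9). This is a minimal forward-chaining sketch over a hypothetical vocabulary, not a production reasoner; it applies both rules until no new facts appear.

```python
Triple = tuple[str, str, str]

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical vocabulary: a warrant is a kind of court order,
# which is a kind of legal document.
facts: set[Triple] = {
    ("Warrant", SUBCLASS, "CourtOrder"),
    ("CourtOrder", SUBCLASS, "LegalDocument"),
    ("doc:1138", TYPE, "Warrant"),
}

def infer(triples: set[Triple]) -> set[Triple]:
    """Apply two RDFS entailment rules repeatedly until a fixpoint is reached.

    rdfs11: (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
    rdfs9:  (x type A),       (A subClassOf B)  =>  (x type B)
    """
    closed = set(triples)
    while True:
        subclass = [(a, b) for a, p, b in closed if p == SUBCLASS]
        new = set()
        for a, b in subclass:          # rdfs11: transitivity
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        for x, p, a in closed:         # rdfs9: type propagation
            if p == TYPE:
                for a2, b in subclass:
                    if a == a2:
                        new.add((x, TYPE, b))
        if new <= closed:              # nothing new: fixpoint reached
            return closed
        closed |= new

inferred = infer(facts)
print(sorted(t[2] for t in inferred if t[0] == "doc:1138" and t[1] == TYPE))
# ['CourtOrder', 'LegalDocument', 'Warrant']
```

Nobody stated that `doc:1138` is a legal document; the machine derived it from the meaning encoded in the class hierarchy.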

The legal field, frequently overwhelmed by loads of textual and other disparate information, is fertile ground for semantic integration. The Florida Supreme Court uses the Judicial Inquiry System, a Semantic Web-based technology from Metatomix that brings search-oriented information and querying into a single dashboard. Judges and other users statewide can find relevant criminal history files and "hot" files, such as warrants, in searches that used to take hours and considerable assistance. User profiles narrow searches and maintain source security. While this isn't BI and data warehousing, it shows how search can employ semantics to satisfy information seekers.

Financial services firms use ontologies to develop risk-management tools for fund managers and Web self-service applications for customers. The Field Report on page 30 details how Avnet uses semantic integration to field multiformatted requests for information from its enormous electronics products catalog. As ontology tools and platforms mature, we'll see semantic approaches replace relational data-based methodologies that have produced disappointing results.

Abstraction and Inference

A rich semantic ontology can serve as the abstraction layer, not just between BI tools and the data warehouse, but between everything. For an enterprise, semantic integration can become the orchestration point, resource broker and domain adviser. Physical location and data structures become irrelevant; data stewards are free to tune and modify their resources as they see fit. Operational and analytic processes are freed of the burden of tracking data.
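A minimal sketch of such an abstraction layer, assuming two hypothetical physical sources modeled as in-memory tables with made-up table and column names. Consumers ask for a concept and its properties; a mapping resolves each request to whatever table and column currently hold the data. When a steward renames a column, only the mapping entry changes, and the consuming code does not.

```python
# Hypothetical physical stores; in practice these would be databases or services.
TABLES = {
    "crm.cust_master": [{"cust_id": 1, "cust_nm": "Acme", "rgn_cd": "NE"}],
    "erp.ship_hdr": [{"shp_no": 501, "cust_ref": 1, "late_flg": "Y"}],
}

# Concept-level mapping: (concept, property) -> (table, column).
MAPPING = {
    ("Customer", "id"): ("crm.cust_master", "cust_id"),
    ("Customer", "name"): ("crm.cust_master", "cust_nm"),
    ("Customer", "region"): ("crm.cust_master", "rgn_cd"),
    ("Shipment", "id"): ("erp.ship_hdr", "shp_no"),
    ("Shipment", "customer"): ("erp.ship_hdr", "cust_ref"),
    ("Shipment", "late"): ("erp.ship_hdr", "late_flg"),
}

def fetch(concept: str, properties: list[str]) -> list[dict]:
    """Return rows keyed by concept property names, wherever the data lives.

    Assumes all requested properties of a concept live in one table, which
    keeps the sketch short; a real layer would also resolve joins.
    """
    locations = {prop: MAPPING[(concept, prop)] for prop in properties}
    tables = {table for table, _ in locations.values()}
    if len(tables) != 1:
        raise ValueError("this sketch only supports single-table concepts")
    (table,) = tables
    return [
        {prop: row[col] for prop, (_, col) in locations.items()}
        for row in TABLES[table]
    ]

print(fetch("Customer", ["name", "region"]))  # [{'name': 'Acme', 'region': 'NE'}]

# A data steward renames a column; only the mapping is updated.
TABLES["crm.cust_master"] = [{"cust_id": 1, "customer_name": "Acme", "rgn_cd": "NE"}]
MAPPING[("Customer", "name")] = ("crm.cust_master", "customer_name")
print(fetch("Customer", ["name", "region"]))  # [{'name': 'Acme', 'region': 'NE'}] again
```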

With composite application development based on service-oriented architecture (SOA) on the rise, consider one last critical reason to act on semantic integration. The idea behind composite applications is to assemble and reassemble applications from standards-based services, in something close to real time, as business needs dictate. In other words, we'll be relying on machines to make inferences about information because there won't be time to program everything.

Analytics are an integral part of composite applications. Automated machine processes won't have time to check with the go-to superuser to find out what to do next. Current metadata approaches, which can only describe data, will be inadequate.

SOA is often cited as the next big thing for BI and data warehousing. However, SOA isn't an off-the-shelf answer to structured information integration problems, much less the content-integration headaches Bruce Silver discusses in the companion article. SOA and Web services focus mostly on interoperability between application interfaces and protocols, not data meaning, integrity and transformation. That should be the concern, and focus, of BI and data warehousing professionals.

The Truth About the Truth

Ontology projects leave you with the understanding that truth is relative and fleeting, and that well-formulated contexts can be powerful without being perfectly clear. Obviously, for regulatory reporting, launching a Mars probe or making a soufflé, precision is required. But rapid decision-making with incomplete and imperfect information is the hallmark of intellect. Any fool can make decisions with all the information in front of him — and many do.

Semantic integration benefits BI and data warehousing by moving information integration to a new level, where intelligence can more easily and swiftly proliferate throughout an organization. Productivity will increase because machines will make everyone less reliant on a small number of go-to superusers. And with SOA on the rise, semantic integration will help us get beyond the rigid, often brittle schemas and thin metadata models that characterize multitiered data warehousing today.

Physical implementation decisions always belong with the technologists. However, those most familiar with business objectives, models and processes will ultimately control information resources. BI and data warehousing professionals can't afford to hang back. Semantics are the future.

Neil Raden is the founder and president of consulting firm Hired Brains. He is an author, analyst, consultant and implementer of decision-support environments.