IBM Talks Data Management

IBM's Pat Selinger discusses autonomic computing, the "Stinger" database, and the difficulties companies hit when they're trying to analyze diverse data sources.

Business Intelligence Pipeline: It seems that many practitioners of business intelligence are struggling to integrate numerous and disparate data sources. Specifically with regard to BI, what major trends are customers telling you about?

Selinger: In particular, I'm seeing business intelligence branching out into the use of either semi-structured or unstructured data. This means things like analyzing customer e-mails and customer service files, along with their payment history and so forth. That stretching of business intelligence to include what has traditionally been considered content management is a very pronounced trend. It's one that I think people are finding they have to do in order to be on the leading edge of competitiveness. I see more customers integrating information from these multiple sources and accessing it. Often this takes place in real time, which is another major trend.

Over the last five years, business intelligence has evolved from "nice to have" to mission-critical. And it's become a part of processing normal business functions. It's got 24-7 availability requirements. It has rapid real-time requirements in certain cases -- things like fraud detection. The faster you can detect fraud, the better off you are. So the kinds of output needed from business intelligence lead people to build their warehouses continuously rather than crunching a week's worth of data every Saturday night.

Business Intelligence Pipeline: What are the recurring pain points that companies hit when they're trying to build analysis from such diverse data sources?

Selinger: One of our customers' consistent interests is metadata. To pull together an information-integrated, single-system view of the data, you really have to have metadata or the ability to collect the metadata. That means knowing what the data looks like and what it means. What is a customer? What is a customer number? What does a quantity mean? All of those kinds of things. When companies had their data in independent silos, everybody knew individually what their customer number meant in this system versus that system. But nobody tried to work together to make them the same. Yet when you want to join those things, you have to understand how they relate. You have to know, for example, what does "delivered" mean? Does it mean that the product has left the dock, or does it mean that the customer has signed the receipt?

We're really talking about semantic integration. That's really what we have to be able to help our customers do. This is going to take expertise from our customers who know their businesses very well, and at the same time a set of tools and infrastructure and middleware that will help automate that process and determine, "Do these two things look the same or not?"
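To make the "delivered" and customer-number examples concrete, here is a minimal Python sketch of a metadata-driven mapping between two silos. Everything in it is hypothetical and chosen for illustration: the system names, field names, the `SOURCE_METADATA` table, and the rule that a canonical customer number is "digits only, leading zeros stripped" are assumptions, not anything IBM or Stinger is described as doing.

```python
# Hypothetical metadata: how each source system names and defines shared concepts.
SOURCE_METADATA = {
    "orders_system": {
        "customer_key": "cust_no",
        "delivered_means": "left_dock",       # product has left the dock
    },
    "billing_system": {
        "customer_key": "CustomerNumber",
        "delivered_means": "receipt_signed",  # customer has signed the receipt
    },
}


def normalize_customer_number(raw):
    """Assumed canonical form: digits only, leading zeros stripped."""
    digits = "".join(ch for ch in str(raw) if ch.isdigit())
    return digits.lstrip("0") or "0"


def to_canonical(system, record):
    """Translate one source record into a shared shape using the metadata."""
    meta = SOURCE_METADATA[system]
    return {
        "customer_number": normalize_customer_number(record[meta["customer_key"]]),
        "delivered": record["delivered"],
        "delivered_means": meta["delivered_means"],
    }


def same_customer(a, b):
    """Do these two canonical records refer to the same customer?"""
    return a["customer_number"] == b["customer_number"]


def delivered_comparable(a, b):
    """The 'delivered' flags are only comparable if both systems mean the same thing."""
    return a["delivered_means"] == b["delivered_means"]


order = to_canonical("orders_system", {"cust_no": "C-00123", "delivered": True})
bill = to_canonical("billing_system", {"CustomerNumber": 123, "delivered": False})

print(same_customer(order, bill))         # customer numbers match after normalization
print(delivered_comparable(order, bill))  # the two "delivered" flags mean different things
```

In this sketch the two records join on customer number, but the "delivered" flags are flagged as not directly comparable, which is the kind of mismatch business experts would have to resolve.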

Business Intelligence Pipeline: What other enhancements will we see in Stinger that relate specifically to BI?

Selinger: We continue to work on the cluster technologies. I see customers experimenting in this direction, using clusters versus large SMPs (symmetric multiprocessors). I see customers who are strongly SMP customers, and they're going to be that way, and that's the right thing for them. I see another set of customers who are putting together the individual nodes into clusters. Linux clusters in particular are a direction they're taking. I see this particularly in the BI area, because it seems to be a natural way to divide up problems and to parallelize query processing. They're thinking in terms not only of keeping things like transactional data; they want to keep clickstream data or individual entries on dynamic Web pages. We will continue to enhance our integrated Linux cluster solutions for customers who choose them.
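The idea of dividing a BI query across cluster nodes can be sketched in a few lines of Python. This is only an illustration of the general scatter-gather pattern, not a description of how DB2 or Stinger executes queries: the "nodes" are simulated with threads, and the round-robin partitioning, the function names, and the sample clickstream rows are all assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, node_count):
    """Spread rows across simulated nodes (round-robin, for simplicity)."""
    nodes = [[] for _ in range(node_count)]
    for i, row in enumerate(rows):
        nodes[i % node_count].append(row)
    return nodes


def local_count(rows):
    """Each node answers the query over its own slice of the data."""
    return Counter(row["page"] for row in rows)


def parallel_page_counts(rows, node_count=3):
    """Scatter the query to every node, then merge the partial answers."""
    parts = partition(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_count, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


# Hypothetical clickstream sample.
clicks = [
    {"page": "/home"},
    {"page": "/cart"},
    {"page": "/home"},
    {"page": "/home"},
    {"page": "/cart"},
]

counts = parallel_page_counts(clicks)
print(counts["/home"], counts["/cart"])
```

Because each node only needs its own slice to produce a partial count, adding nodes splits the work without changing the final merged answer.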
