Q&A: A Practical Path to Real-Time Data Warehousing

Teradata's Stephen Brobst and GoldenGate's Alok Pareek describe an evolutionary path to real-time data warehousing and operational BI. It's an incremental journey that starts with a simple question about business process change.
How do you make "real-time" decisioning a reality? With data volumes growing and batch processing windows getting ever smaller, conventional data integration methods often can't keep up. Stephen Brobst, chief technology officer at Teradata, and Alok Pareek, vice president of technology at GoldenGate Software, make the case for change data capture (CDC) technology, process change and, most importantly, visionary business leadership to take advantage of real-time information.

An IDC survey recently found that more than 70 percent of respondents say they expect to use real-time data in their BI environments within the next 12 months, yet only 30 percent say they are doing so today. How do you explain that gap?

Stephen Brobst: It really depends on the industry. Big retail in North America is already there, but in other industries there's less competition and less motivation to be innovative. Adoption is very much based on where there's competition. There are also plenty of companies that are moving toward using real-time information, but the hard part isn't the technology so much as the business process change.

Alok Pareek: In addition to competitive pressures, the volumes of data that companies are now dealing with are forcing them to rethink traditional batch processing. Many systems have been running in nightly batch processing mode for decades, but companies are now facing pressure to reduce data latency. The trouble is that if you try to stick with conventional extraction and feed your warehouse on an intra-day basis, it can have a big [negative] impact on your production systems. If you take a change-data-capture (CDC) approach, on the other hand, and you're doing data acquisition on a continuous basis, you can reduce the overall impact on your systems while reducing data latency.

What kind of "impacts" are you talking about and why is CDC faster?

Brobst: CDC uses log sniffing to retrieve only the data that has changed, rather than scanning and replacing big fat tables with lots of data. When you look at data stores getting bigger and nightly batch windows getting smaller, the equation for conventional data extracts just isn't working anymore, in many cases, because of the impact it has on mission-critical OLTP systems. Batch extracts are resource intensive in terms of CPU and I/O cycles, and there is now precious little time to reach in, grab the data and do all that processing. With CDC, you're doing log sniffing much more efficiently to get only the data that has changed. You don't have to scan all the data, so the CPU and I/O impact is much smaller.
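The contrast Brobst describes can be sketched in a few lines. This is a hypothetical illustration, not a real CDC tool: production CDC products read the DBMS transaction log directly, whereas this sketch stands in a `change_log` table with invented names and tracks a last-applied log sequence number (LSN).

```python
import sqlite3

# Illustrative schema: a source table plus a stand-in for the change log.
# Real CDC reads the database's own transaction log; names here are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("CREATE TABLE change_log (lsn INTEGER PRIMARY KEY, id INTEGER, balance REAL)")

# Seed 1,000 rows, then record the only two rows that actually changed.
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(i, 100.0) for i in range(1000)])
conn.executemany("INSERT INTO change_log (id, balance) VALUES (?, ?)",
                 [(7, 250.0), (42, 0.0)])

def full_extract(conn):
    """Batch approach: scan every row, whether or not it changed."""
    return conn.execute("SELECT id, balance FROM accounts").fetchall()

def cdc_extract(conn, last_lsn):
    """CDC approach: read only log entries past the last position applied."""
    rows = conn.execute(
        "SELECT lsn, id, balance FROM change_log WHERE lsn > ?", (last_lsn,)
    ).fetchall()
    new_lsn = rows[-1][0] if rows else last_lsn
    return [(rid, bal) for _, rid, bal in rows], new_lsn

changed, lsn = cdc_extract(conn, last_lsn=0)
print(len(full_extract(conn)), len(changed))  # prints: 1000 2
```

The batch path touches all 1,000 rows on every run; the CDC path touches only the two changed rows, which is where the CPU and I/O savings come from.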

You make CDC sound like the magic bullet for moving to real-time information.

Brobst: Well, I wouldn't call it a magic bullet, because the hard part, as I say, is the business process change, not the technology. If you were to deliver real-time data to decision-makers in many organizations, they wouldn't know what to do with it. You have to start by asking, "What would you do with the information if you could see it in real time?" You need a visionary who can step out of the box and change the way you use information for decision-making.

Consider Overstock, for example: they use low-latency information to make decisions about keyword-search advertising. In years past, people would buy keywords and invest in advertising days, weeks or even months in advance. Overstock now uses real-time data to decide, based on the inventory it needs to move, which keywords and sites are performing better on an hour-by-hour basis. That required a radical change in the business processes for the people in charge of spending advertising dollars.

It sounds easy to say "you need to change the business processes," but how do you initiate that kind of change?

Brobst: It's a transition. Conventional data warehousing is all about strategic business intelligence, which usually doesn't require real-time data. When you make the transition to operational decision-making, you're using information to drive what you're doing in the here and now. I've seen cases where organizations failed because they thought "real time" was about delivering the same old reports within a dashboard that's continuously updated. I call that "reports that twinkle," and they are not very useful.

Tactical, operational decisions are made by a different set of people, so you've got to drive information to the front lines of the organization. These are people who are scheduling trucks in a break-bulk shipment or making decisions at an airline service counter or gate when there's a misconnected flight. Real-time information changes their job process.

If you look at Continental Airlines, for example, it now provides real-time information to the directors of flight operations at each hub. They are making decisions about late flights, gate swapping, flight holds and individual passenger accommodations with very detailed, up-to-date information. They're trying to accommodate people right now with information that has arrived within the last 15 minutes. That brought a big change for the flight directors who gained access to that information.

Pareek: Another key point is that the need for strategic decision-making doesn't go away. It comes back to a technical challenge in that not only do you have to change your business processes, you have to use the same data store to satisfy both kinds of decisions. If you switch to getting data through CDC, you'll also have to figure out how to satisfy the conventional reporting needs.

In Overstock's case, they're supporting both strategic and operational decisions from the same data store. They use real-time CDC to bring the data into staging tables, and they run a set of [real-time] applications on top of those tables. They use another technology to lift the data from those tables and move it into the long-term analysis store.

Can you give an example of the difference between "reports that twinkle" and ones that present actionable information?

Brobst: Think about the difference between getting a report today on how many dollars you lost on fraud yesterday versus doing real-time CDC, doing the analysis, detecting suspect activity and preventing the fraud before it happens. The idea of what we call active data warehousing is to do more than report; it's to intervene proactively to influence the behaviors and outcomes. It's happening in virtually any industry in which you can take advantage of the difference between reporting retroactively versus proactively intervening.

So is it more often about changing the nature of the report or introducing access to information where it didn't previously exist?

Brobst: In some cases the change may not even require a human interface. In fraud detection, for example, you might introduce software event detectives that are continually analyzing the patterns in real-time data. You could have humans doing that, but it's much more effective to have software models looking at the patterns, scoring the data and detecting the accounts that are likely to be experiencing fraud. There's no graphical interface at all; it's a coordination of processes. If the software detects a fraud pattern, you propagate that knowledge to an accounting system that puts a block on the account or triggers an outbound call to the customer to inquire about recent activity.
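The "software event detective" idea above can be illustrated with a toy sketch. The scoring rule, threshold, and the `AccountingSystem` interface are all invented for illustration; a real deployment would use a trained model and a genuine accounting-system integration.

```python
from dataclasses import dataclass, field

@dataclass
class AccountingSystem:
    """Stand-in for the downstream system that acts on detections."""
    blocked: set = field(default_factory=set)

    def block(self, account: str) -> None:
        self.blocked.add(account)

def fraud_score(txn: dict) -> float:
    """Toy scoring rule: large amounts and out-of-country use raise the score."""
    score = 0.0
    if txn["amount"] > 5000:
        score += 0.6
    if txn["country"] != txn["home_country"]:
        score += 0.3
    return score

def detect(txns, accounting: AccountingSystem, threshold: float = 0.8) -> None:
    """No dashboard: score each transaction and propagate the action directly."""
    for txn in txns:
        if fraud_score(txn) >= threshold:
            accounting.block(txn["account"])  # intervene, don't just report

acct = AccountingSystem()
detect([
    {"account": "A-1", "amount": 9000, "country": "BR", "home_country": "US"},
    {"account": "A-2", "amount": 20,   "country": "US", "home_country": "US"},
], acct)
print(acct.blocked)  # prints: {'A-1'}
```

Note that the output of the detective is an action (a block on the account), not a report for a human to read, which is exactly the distinction Brobst draws.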

There are cases, like at Continental Airlines, where the company comes up with a new dashboard presenting real-time information. Continental put a dashboard on the desktops of the flight directors at the hubs. Any time a flight is more than 15 minutes late, it pops into the "red zone," and the directors can drill through to see how many customers are on that plane, how many are connecting versus terminating and how many are high-value customers.

You've talked about process automation and human-facing dashboards. Which approach will dominate in the real-time era?

Brobst: There's a case for both. But in the long term, the software event detectives will play a big role because of the volumes of data and complexity of the decision-making. The more you can automate, the better, in terms of efficiency and consistency.

You used the example of fraud detection, but that seems like it has been around quite a while. What are some of the emerging examples of automated event detection?

Pareek: Detection is used in telecommunications to look out for network overloads. When they're spotted, events are triggered to reconfigure the network and balance the load. Energy companies are another example. Several leading utilities are building smart grids that detect fluctuations in supply and demand and make decisions on rerouting energy sources on the fly. In manufacturing, we've seen several cases where real-time detection of defects has been integrated with factory automation software. If you can correct a problem in real time rather than reacting half an hour or an hour after the problem crops up, it can significantly improve the productivity of the entire assembly line.

Brobst: Freescale is a great example of that. They are a spinoff of Motorola, and they do real-time analytics for quality control. They're not just looking for failures; they're spotting quality trends early enough to intervene before they have to shut down the assembly line and start throwing away silicon.

Another opportunity for real-time information is related to customer events. I'm talking about event detectives that look for customer defections or leads that can trigger corresponding offers. For example, Travelocity monitors continuous, real-time pricing feeds from external providers such as airlines, hotels and rental car agencies. As new terms become available, Travelocity can bundle deals that can be presented to visitors to the Web site or promoted via outbound e-mail. These prices are offered to everybody, so the faster Travelocity can react and get new offers out, the more likely it is to get the business of those customers.

What's your advice to companies that are just starting to look for real-time opportunities?

Brobst: I would encourage them to identify and partner with the business visionary who is going to lead the charge. I would go through the "business discovery" exercise of getting the key leaders in a room and asking the simple question, "If you could have the data today that you're used to seeing tomorrow, what would you do with it?" What business process changes would be required and what would the value be? If it's just a twinkling report, it's useless. But if you get the right creative people in the room, you can identify some interesting opportunities.

Is this a big-budget, long-term investment? For instance, many aging data warehouses may not be up to the rigors of real-time data, so is this a rip-and-replace proposition?

Brobst: I'd encourage evolution, not revolution. You don't have to make everything real-time all at once. You can pick particular areas of data and phase in real-time information and decisioning. That said, it does assume certain fundamentals in terms of warehouse design. For example, trying to use a data warehouse that is highly denormalized and star-schema-oriented is more difficult than using a warehouse that has a more extensible, relational design. So you do have to have some fundamentals in place, but it should be an incremental, subject-area-by-subject-area initiative with relatively short project timeframes of 90 to 100 days. You definitely don't want a multi-year, blow-it-up-and-start-over-again approach.
