Kimball University: Building a Foundation for Smart Applications
Off-the-shelf apps may offer built-in analytics, but the best approach to supporting operational decisions is to rely on a solid data warehouse that cleans, integrates.
Joy Mundy |
Enterprise Applications such as ERP and CRM systems are increasingly available with integrated analytic capabilities aimed at automating and improving operational decision-making. That doesn't mean you can get away without the solid foundation of an enterprise data warehouse and business intelligence (DW/BI) system.
No matter how customizable these applications may be and how extensive the analytic capabilities, your implementation team will still face the problem of integrating and cleaning data from multiple sources. Unless most or all of your transaction systems are part of the same integrated suite, tightly coupled with your smart apps, you'll eventually do much of the work that would otherwise be required to build DW/BI system.
The best way to move beyond information silos and support multiple smart apps is to build an enterprise DW/BI system. You'll gain integrated and conformed information from across the enterprise, granular detail with attributes and hierarchies that can feed any smart application, and rich history and change-tracking capabilities that can help you better predict customer behavior in any business process.
Organizations have two overarching drivers for their data warehouse/business intelligence (DW/BI) system: inform strategic and tactical decision-making, and automate operational decisions. The first goal is classic decision support: executives and analysts use information to run the business. With an excellent data warehouse, executives know what's going on in their business and markets, and can steer a course for success. The second goal can be realized by smart enterprises with enhanced operational systems that make use of data to run more effectively based on informed suggestions or predictions.
The smart enterprise weaves the information platform into the operational application layer. The DW/BI system can add value wherever operational decisions could benefit from a historical context. Examples range from setting order-fulfillment priorities based on customer value and personalizing product suggestions, to improving fraud-detection applications and manufacturing quality monitoring systems.
Decision support is great, but a smart enterprise is even better. The smart enterprise is not new: We've talked about and even implemented such systems for decades. Even the oldest data-warehousing architectural diagrams include a feedback- or closed-loop arrow. What's new is that the smart enterprise is moving into the mainstream, through both packaged analytics and custom applications.
The best way to get to the smart enterprise is to build an enterprise DW/BI system: an information platform that cleans, integrates and conforms data from across your organization. With this solid foundation, the goal of a smart enterprise is much more attainable. We will help you reach the desired end goal by describing the architectural foundation of the smart enterprise.
The Quick-But-Risky Path
Building a data warehouse isn't absolutely necessary for delivering smart applications. The easiest way to get smart applications is to buy them. You can purchase customer relationship management (CRM) software, quality management software and enterprise resource planning (ERP) systems that include BI and smart enterprise features. Why go to the trouble of building an enterprise data warehouse?
Many organizations have a few transaction systems that are either custom-built or significantly customized packaged software, generally when off-the-shelf systems don't meet company needs. This most often occurs at the point where your organization is unique, and often that uniqueness is a key corporate asset. Organizations willing to be on the leading edge may find that packaged smart applications built for a broader market don't do as good a job as a custom application. Kimball Group expects the market for packaged smart apps will evolve to support very flexible customization, but today these systems are, more often than not, black boxes over which you have little control.
Even if a purchased application is perfect, or perfectly customizable, your implementation team faces the problem of pulling data from multiple sources, cleaning and integrating it, and pumping it into the smart application. As you install your second or third packaged smart application, you're repeating a lot of extract, transformation and loading (ETL) functionality; and you're doing it inefficiently and inconsistently. Unless most or all of your transaction systems are part of the same integrated suite, tightly coupled with your smart applications, you'll eventually do a lot of the same work necessary to build an enterprise data warehouse, but won't reap all of the benefits without actually building that warehouse.
The Right Way To Smart Apps
A better long-term approach is to build on an enterprise data warehouse, an information infrastructure that:
• Is focused on the business users' requirements
• Is enterprisewide, containing integrated, conformed information from multiple business processes
• Contains data at the finest granularity possible
• Defines standard attributes, hierarchies and structures that are used across multiple business processes
• Includes attributes, such as customer income levels, that can be used to predict behavior
• Implements change-tracking techniques on key attributes to associate behaviors such as purchases with the attribute's value at the time the behavior occurred
• Identifies data-quality problems and works to fix them at the source or in the ETL process
• Delivers standard reports and analytic applications to the enterprise
• Includes a software, hardware and infrastructure environment that can support all this data, transformation, reporting and analysis at the enterprise level
The enterprise data warehouse directly meets the first goal we described: to support informed decision-making. This is the information infrastructure that supports reports, ad hoc querying, analysis and executive dashboards. Even if you stop here and never implement a smart enterprise application, you've already done a great thing.
However, with this solid information infrastructure in place, you're in a great position to build or buy smart applications. If you're a builder, your analysts will explore the data, build statistical models, evaluate the performance of those models and work with your developers to define the smart features of your enterprise applications. You may implement something simple, like ranking and tagging customers who make a lot of purchases. Or, your analysts may delve into data mining and use sophisticated statistical techniques to score purchase decisions, product-quality measures or Internet-browsing behaviors. Whether your analytic techniques are simple or complex, you'll be working from clean, integrated and consistent data. You can verify that analytic results are real and reproducible.
If your organization leans toward buying rather than building, you'll have enough familiarity with your data to better evaluate the off-the-shelf alternatives. It may be economically feasible to do a proof of concept to evaluate whether candidate products accurately predict the behavior of your customers or processes. Implementation will be an order of magnitude simpler than if you needed to build the complete ETL process to shoehorn the application into your architecture.
When The Infrastructure Falls Short
What if you don't have a data warehouse at all, or you're worried that your infrastructure doesn't measure up to the demands of the smart enterprise?
If you have no data warehouse or enterprise information infrastructure at all, you should follow the Kimball Method to build one. The Kimball Method presents a practical approach to specifying and building your system: Focus on business requirements, build and populate a dimensional model to support a high-value yet solvable set of business processes, and use the Bus Matrix and conformed dimensions to incrementally grow the system to span the enterprise (see "Required Reading" at right). It usually takes six to nine months to go live with the first Kimball Method dimensional model. If you're eager to get going on the smart applications, you may be able to explore the data and options well before the system goes live.
Required Reading: • Two Powerful Ideas: The foundations for modern data warehousing by Ralph Kimball• Divide and Conquer: Build your data warehouse one piece at a time by Ralph Kimball • The Microsoft Data Warehouse Toolkit: With SQL Server 2005 and the Microsoft Business Intelligence Toolset by J. Mundy and W. Thornthwaite (Wiley, 2006)• The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing and Deploying Data Warehouses by R. Kimball, L. Reeves, M. Ross and W. Thornthwaite (Wiley, 1998) • More than 100 free articles on dimensional modeling topics are available at kimballgroup.com. |
Most organizations have at least some sort of information infrastructure, often isolated data marts built by departments or simple reporting copies of transactional databases. These departmental solutions can be a barrier to your goals because at first glance they may appear to be--and are often called--data warehouses. Their key shortcoming is that they're stovepipes built without an enterprise focus, and it's difficult or impossible to combine the data with information outside their narrow subject area. To avoid this pitfall, you will need to build an enterprise data warehouse. Either start from scratch or choose the best available infrastructure and incrementally grow it into an enterprise-focused resource.
Some organizations have built an enterprise information infrastructure, but with flaws that make it difficult to evaluate, build or implement smart enterprise applications. Hopefully, your enterprise data warehouse has a decent infrastructure and has already tackled some of the most challenging data integration issues. It's possible to add a dimensional data warehouse downstream of an existing data warehouse, and quickly roll out a solid, flexible, easy-to-use system.
The key flaw Kimball Group typically sees in existing enterprise data warehouses is a lack of historically accurate attributes. Almost all data warehouses track transaction history--what was sold, to whom and for how much--but many do a poor job of associating the transaction with the attributes as they were at the time of the transaction. This attribute history is very important for data mining and other smart enterprise applications. Suppose, for example, that you want to build an application that will present product choices to a potential customer based on where he lives. If the data warehouse only keeps the customer's current address, then for the purposes of queries, analysis and prediction, it looks as if the customer has always lived where he does now. We lose the vitally important information about how the customer's behavior changed when he moved from a cold climate to a warm climate. And once that history is gone from the data warehouse, it's difficult or impossible to reconstruct it.
Many enterprise data warehouses are backward-looking to the transaction systems, rather than forward-looking to the business users' requirements. As a result, they generally do an adequate job of serving up predefined reports but are often unwieldy for ad hoc use. The enterprise information infrastructure must be designed to support easy and consistent ad hoc analysis. This is especially true if you plan to build your own smart enterprise applications or customize a packaged system. If a statistician or business analyst can't explore the data at will, one of two things will happen; either the project will be abandoned or the analyst will pull off a chunk of data into a personal mart, which presents all sorts of challenges when it's time to develop and deploy the application.
Supporting Transactional Workloads
Let's say that you've designed the warehouse databases to support structured reports, as well as ad hoc query and analysis. Let's also assume that during the process of designing the smart enterprise apps, a statistician or analyst has made heavy use of the information infrastructure. Don't be surprised to see the model builder issuing queries that strain system resources. In production, however, most smart apps put a small load on the system. All the work has been done in advance; at transaction time all that's required is a simple query and some modest computation. This is good news, because it means that the same database used to support reports and analysis can serve up the information needed for the ongoing operation of the smart enterprise application.
However, just because the database can handle the load doesn't mean it's a good idea. This approach brings us back to one of the fundamental reasons we started building data warehouses in the first place: Analytic workloads are not compatible with transaction workloads--even the read-only workloads that are the BI portion of the smart enterprise application.
Smart designers will isolate the data needed to support the smart enterprise application in a managed environment that's not shared with the classic BI workload of reporting and analysis. This downstream database is usually much smaller than the data warehouse because it's focused on a particular problem. Queries are known in advance and can be carefully tuned. This new database is typically subject to a much higher level of service than the data warehouse since it's really part of the transaction system.
Some smart apps require that historical information be combined with very low-latency data. For example, a CRM application might present a call-center screen that combines a customer's historical information with details about today. The majority of data warehouses include information up to yesterday, so how do you present a unified time series that includes both historical and real-time data?
The most common solution is to have the application query both the data warehouse and the transaction system and then combine the information for presentation. This approach works best if the transactional information requires little integration or transformation. A second popular approach is to add the low-latency data directly to the application database. Adding low-latency data to the more static warehouse database should be a last resort because it significantly complicates the architecture.
Spreading Bi Everywhere
The promise of BI is that all decision-makers in an organization, no matter how strategic, tactical or operational, should have the information they need to do their jobs as well as possible. Data warehousing and BI started decades ago by focusing on the business analyst who advised senior management. We've done a pretty good job of reaching up to the executives and down to middle management by providing BI portals with rich reports and dashboards. The promise for the future is to reach throughout the enterprise, to make our operational systems use the information we've warehoused in an active and automated way.
Joy Mundy is a member of The Kimball Group. She's focused on DW/BI consulting and education for more than a decade. Write to her at [email protected].
About the Author
You May Also Like