Information Delivery
To achieve an RTDW, you will need to extend traditional data warehouse database designs to accommodate continuous data trickle feeds intermixed with live user queries. Most of the changes involve schema design, active partition management, and data aggregation. The best course of action is to develop an architecture that decides, case by case, how real-time data is stored relative to the static historical data. There are a number of possibilities: you could store real-time data in the same fact table as the static data, or in a separate table. Your decision depends on the querying and alerting needs of the downstream analytic applications.
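A minimal sketch of the separate-table option, using Python's built-in sqlite3 module purely for illustration. The table names `sales_fact` and `sales_rt`, the view `sales_current`, and the sample rows are hypothetical; a production warehouse would use its own DDL. The view gives applications that need the freshest figures a single place to query, while historical reports keep reading the fact table alone.

```python
import sqlite3

# Hypothetical schema: sales_fact holds static history, while sales_rt
# receives the continuous trickle feed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (sale_day TEXT, product TEXT, amount REAL);
CREATE TABLE sales_rt   (sale_day TEXT, product TEXT, amount REAL);

-- Applications that need current data query this view;
-- historical reports can keep reading sales_fact directly.
CREATE VIEW sales_current AS
    SELECT sale_day, product, amount FROM sales_fact
    UNION ALL
    SELECT sale_day, product, amount FROM sales_rt;
""")

# Illustrative rows only: one from the nightly batch, one from the trickle feed.
conn.execute("INSERT INTO sales_fact VALUES ('day-1', 'widget', 100.0)")
conn.execute("INSERT INTO sales_rt VALUES ('day-2', 'widget', 25.0)")

historical = conn.execute("SELECT SUM(amount) FROM sales_fact").fetchone()[0]
current = conn.execute("SELECT SUM(amount) FROM sales_current").fetchone()[0]
print(historical, current)  # 100.0 125.0
```

In this sketch the large fact table is never written during the day; the trade-off is that every query needing current data pays for the UNION ALL.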
The RTDW system could store trickle-feed data in a separate partition of the data warehouse, typically called an "active," or delta, partition. While updates are occurring, this active partition is either offline or hidden from regular user queries. At a predefined interval, typically every few minutes, the system renames the active partition so that it can be merged with the data warehouse tables.
Once the partition has been merged, queries can work against the new data. This process is often called "flipping" the real-time data. The active partition may be physically consumed in the process; in that case, the system opens a new active partition to accommodate the next load of continuous updates. The exact technique depends on the capabilities of the data warehouse database.
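The active-partition cycle can be modeled in a few lines of Python. This is a hypothetical in-memory sketch, not a database feature: `FactTable`, `trickle_load`, `query`, and `flip` are illustrative names, and a real warehouse would perform the flip with its own partition-management commands.

```python
from dataclasses import dataclass, field


@dataclass
class FactTable:
    """Hypothetical in-memory model of a fact table with one active partition."""

    published: list = field(default_factory=list)  # partitions visible to queries
    active: list = field(default_factory=list)     # trickle-feed target, hidden

    def trickle_load(self, row):
        # Continuous updates land only in the active partition.
        self.active.append(row)

    def query(self):
        # User queries see only published partitions, so their results
        # do not shift while the trickle feed is running.
        return [row for part in self.published for row in part]

    def flip(self):
        # Publish ("flip") the active partition, then open a new, empty one
        # for the next load. The old active partition is consumed here.
        if self.active:
            self.published.append(self.active)
        self.active = []


table = FactTable()
table.trickle_load(("widget", 10))
table.trickle_load(("gadget", 5))
print(len(table.query()))  # 0: rows in the active partition are not yet visible
table.flip()
print(len(table.query()))  # 2
```

In a deployment, `flip` would run on the predefined interval, for example every few minutes.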
Active partitions enable queries to work with stable, consistent data snapshots. Because the data in the fact tables keeps changing, the system must also update any aggregates or summary tables that have been created for performance reasons. The strategy shown in Figure 3 supports incremental aggregation.
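Incremental aggregation means folding only the newly flipped rows into a summary instead of recomputing it from the whole fact table. The sketch below assumes a hypothetical per-product SUM summary kept in a Python dictionary; `apply_increment` and `sales_by_product` are illustrative names. The approach holds only for additive measures such as SUM and COUNT.

```python
from collections import defaultdict


def apply_increment(summary, flipped_rows):
    """Fold only newly flipped (product, amount) rows into a running SUM.

    Valid for additive measures such as SUM and COUNT; non-additive
    measures (for example, distinct counts) need a different strategy.
    """
    for product, amount in flipped_rows:
        summary[product] += amount


# Hypothetical summary table: total sales amount per product.
sales_by_product = defaultdict(float)

apply_increment(sales_by_product, [("widget", 100.0), ("gadget", 40.0)])  # first flip
apply_increment(sales_by_product, [("widget", 25.0)])                     # next flip
print(dict(sales_by_product))  # {'widget': 125.0, 'gadget': 40.0}
```

Each flip costs work proportional to the rows it publishes, not to the size of the fact table.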
If necessary, the RTDW system can intentionally delay partition flipping until an end-of-day batch load. With this strategy, alerts, activity monitoring, and selected real-time reporting can work against the active partition, while most of the other user queries continue to work against the static partition in the data warehouse.
Figure 3 Real-time warehouse repository.
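The option of delaying the flip until the end-of-day batch load can be sketched as follows. The two Python lists stand in for the static and active partitions, and the function names and the 50.0 alert threshold are hypothetical. The point is only that monitoring reads the active data while ordinary reports read a snapshot that changes once a day.

```python
# Hypothetical stand-ins for the two partitions.
static_rows = [("widget", 100.0)]  # published at the last end-of-day load
active_rows = []                   # today's trickle feed; flip is deferred

ALERT_THRESHOLD = 50.0  # assumed business rule, for illustration only


def trickle_load(row):
    active_rows.append(row)


def intraday_total():
    # Alerts and activity monitoring read the active partition directly.
    return sum(amount for _, amount in active_rows)


def report_total():
    # Ordinary reports read only the static partition, so they see the
    # same snapshot for the whole business day.
    return sum(amount for _, amount in static_rows)


def end_of_day_flip():
    # The batch window merges the day's rows into the static partition.
    static_rows.extend(active_rows)
    active_rows.clear()


trickle_load(("gadget", 40.0))
trickle_load(("widget", 35.0))
if intraday_total() > ALERT_THRESHOLD:
    print("alert: intraday sales above threshold")
print(report_total())  # 100.0: today's rows are not yet visible to reports
end_of_day_flip()
print(report_total())  # 175.0
```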
RTDWs extend traditional warehouse querying and reporting services and enable a completely new breed of downstream applications that are developed to provide online, real-time decision support, activity monitoring, and alerting. Making such applications succeed requires tight integration of the RTDW with existing information delivery channels in an organization, such as enterprise applications, portals, and multichannel devices. Any middleware deployed must support all of these delivery mechanisms. Alas, many querying and reporting products require bulk offline data transfer from the warehouse to the product's own engine for processing, which makes them unsuitable for a real-time environment.
Generally, existing enterprise applications can work with the RTDW's information. Such applications are often deployed to a vast number of users, and cover many customer, supplier, and employee touchpoints. Therefore, enterprise applications present an excellent opportunity for a high-impact, ROI-delivering RTDW. A CRM application, for example, could benefit from real-time reports about a customer's propensity to buy a product while he or she is on the phone or visiting a company Web site.
RTDWs could also aid in performance management: for example, when an employee sets key performance indicators (KPIs) on his or her portal home page to manage by objectives. The RTDW could refresh the KPI data periodically and alert the employee (potentially through a pager or other mobile device) when certain critical conditions exist. For less critical situations, the alerts could be sent by email instead. In this way, the RTDW enables continuous dissemination of information from the data warehouse to a broader user community.
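The escalation rule in this paragraph (a mobile alert for critical conditions, email for less critical ones) can be expressed as a small routing function. The `Kpi` fields, the two thresholds, the assumption that higher values are worse, and the channel labels are all hypothetical; a real system would hand the result to its own notification middleware.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    value: float
    warning: float   # assumed: at or above this is "less critical"
    critical: float  # assumed: at or above this is "critical"


def route_alert(kpi):
    """Pick a delivery channel for a freshly refreshed KPI.

    Assumes higher values are worse (e.g., an order backlog). The channel
    names are placeholders, not a real notification API.
    """
    if kpi.value >= kpi.critical:
        return "pager"
    if kpi.value >= kpi.warning:
        return "email"
    return None  # within normal range: no alert


print(route_alert(Kpi("order_backlog", 120, warning=80, critical=100)))  # pager
print(route_alert(Kpi("order_backlog", 90, warning=80, critical=100)))   # email
print(route_alert(Kpi("order_backlog", 50, warning=80, critical=100)))   # None
```

The critical test comes first because any value above the critical level is also above the warning level; reversing the two checks would send every critical alert by email.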
Timely Evolution
Given current business requirements, an RTDW is a natural evolution from the traditional EDW. Extending the traditional data warehouse with real-time data will help organizations meet the need to reduce information latency, a key factor for successful real-time enterprises.
Promising better integration with enterprise business applications, an RTDW can help organizations meet more aggressive service levels and deliver business intelligence to a wider community of knowledge workers, who depend on information to execute business strategies. Finally, by tapping into existing systems effectively, an RTDW will allow many organizations to maximize the potential of their data investments.
Rajesh Gadodia is a senior solutions manager with Oracle Corp.'s BI & DW Asia Pacific Division.