
The Architecture of Enterprise Data Quality

A finger in the dam won't stop the flood of operational data coming your way, but in-database mining will help you adapt to the flow.

The intent of business intelligence (BI) is to help decision makers make well-informed choices. Therefore, modern BI systems must be able to consume vast quantities of detailed, disparate data and quickly reduce it to meaningful, accurate information that people can then confidently act on. The corollary of data quality is better decision-making.

An immediate challenge for data architects, however, is the ever-rising flood of operational data that must be cleansed, integrated, and transformed. These tasks must address enterprise-scale issues, including the growing volume and variety of data and, in some cases, near-real-time refresh rates.

Up to the Job?

Most traditional data quality tools, whether they're implemented as stand-alone solutions or used to supplement extract, transform, and load (ETL) processing, are inadequate for enterprise-scale implementations. They simply can't keep pace with enterprise-level data volumes or with refresh frequencies that range from batch to continuous.

To illustrate, let's consider the data flow dictated by the cleansing and transformation batch processing that's common to these tools. First, the data must be extracted from the source and temporarily stored in files or tables for processing. The data is then cleansed, transformed, or otherwise prepared according to predefined data quality rules. During this process, the data is moved in and out of temporary files or tables as required. When the data is prepared to the defined specifications, it's temporarily stored again. Finally, the data is moved from the final temporary storage and loaded into the target data warehouse tables or passed on to the ETL technology for further processing. When you consider all the batch data movement required by typical data quality software, it's easy to see that the technology can quickly become a process bottleneck as data volumes or refresh rates increase.
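To make that data movement concrete, here is a minimal sketch in Python. It assumes a toy source of customer rows and two made-up standardization rules; the names batch_quality_flow, trim_name, and upper_state are hypothetical, and plain in-memory lists stand in for the temporary files or tables a real tool would use. It is not any vendor's implementation. It only counts how many times each row is copied on its way to the warehouse.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Rule = Callable[[Row], Row]


def trim_name(row: Row) -> Row:
    # Hypothetical rule: collapse stray whitespace in the name field.
    row["name"] = " ".join(row["name"].split())
    return row


def upper_state(row: Row) -> Row:
    # Hypothetical rule: standardize the state code to upper case.
    row["state"] = row["state"].strip().upper()
    return row


def batch_quality_flow(source: List[Row], rules: List[Rule]) -> Tuple[List[Row], int]:
    """Mimic the staged batch flow and count row copies between stores."""
    moves = 0

    # 1. Extract from the source into temporary storage.
    staging = [dict(r) for r in source]
    moves += len(staging)

    # 2. Apply each rule, moving the data out of and back into temporary storage.
    for rule in rules:
        staging = [rule(dict(r)) for r in staging]
        moves += len(staging)

    # 3. Store the prepared data temporarily once more.
    final_staging = [dict(r) for r in staging]
    moves += len(final_staging)

    # 4. Load the target warehouse table from the final temporary storage.
    warehouse = [dict(r) for r in final_staging]
    moves += len(warehouse)

    return warehouse, moves


source = [
    {"name": "  Ann   Lee ", "state": "ny"},
    {"name": "Bob  Ray", "state": " ca"},
    {"name": "Cy Dunn ", "state": "tx "},
]
warehouse, moves = batch_quality_flow(source, [trim_name, upper_state])
print(f"{moves} row copies to load {len(warehouse)} rows")
```

With three rows and only two rules, the sketch already makes 15 row copies, or five per row. Each added rule adds another full pass, which is why the copy count, and not the cleansing logic itself, tends to dominate as volumes or refresh rates grow.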

The enterprise-scale requirements of data quality, coupled with the limitations of traditional data quality technologies, leave architects with few options. Some architects have merely lowered expectations. They either implement data quality processes for only critical data or constrain quality processing to pedestrian activities, such as simple standardization. Although these approaches may serve as workarounds for data volume issues, they come at the expense of the trustworthiness of the overall warehouse. Poor-quality data compromises virtually all the analytics and, therefore, the data warehouse's value to decision makers.

First Things First

The seasoned architect, however, knows that certain techniques and technologies can be adapted to meet the requirements of enterprise data quality, specifically the in-database data mining now offered by leading database vendors. Before I explain how to incorporate in-database mining into your data quality solution, it's critical that you:

  • Look beyond the typical applications associated with data mining technology. Data mining is too quickly pigeonholed as an esoteric application used only for prediction or forecasting.
  • Appreciate the value in-database data mining brings to the modern warehouse.
