While you’re pushing ahead with innumerable projects that rely on data in your organization, give attention to the quality of that data. Whether the data is collected in an analytic database for fraud detection or in a data lake for four to five different projects, that data needs to meet a quality standard that makes it fit for purpose.
Data quality is an elusive subject that can defy measurement and yet be critical enough to derail any project. It’s easy to have overly optimistic assumptions about data’s efficacy. Having data quality as a focus is a business philosophy that aligns strategy, business culture, company information, and technology in order to manage data to the benefit of the enterprise. Put simply, it is a component of competitive strategy.
Even so, many are asked to quantify the return from adding data quality work and software to projects.
Though many benefits accrue from improving data quality, many of them are impractical to measure. Benefits such as improved speed to solutions, a single version of the truth, improved customer satisfaction, improved morale, an enhanced corporate image, and consistency between systems all accumulate. Because ROI must be measured in hard dollars, an organization has to choose which of these benefits to analyze further and convert to hard dollars.
Measuring data quality ROI requires a program approach to data quality. Data quality improvement is not just another technology to implement. Investments in the technologies as well as in organizational changes are necessary to reap the full rewards. Data quality is right in the “sweet spot” of modern business objectives that recognize that whatever business a company is in, it is also in the business of data. Those companies with more data, cleaner data, accessible data, and the means to use that data will come out ahead.
Tangible ROI on Data Quality
Abstracting quality into a set of agreed data rules and measuring the occurrences of quality violations provides the returns in data quality ROI. The key steps you can take to realize tangible ROI on any data quality additions start with determining the data quality rules.
Data quality can be defined as a lack of intolerable defects. There is a finite set of possibilities that can constitute data quality defects and that categorize all data quality rules: data existence, referential integrity, expected uniqueness, expected cardinality, accurate calculations, data within expected bounds, and simply correct data. The rules generated in this step are the rules that you wish your data to conform to. These rules can apply wherever important data resides.
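To make a few of these rule categories concrete, here is a minimal sketch in Python. The record layout, the field names (order_id, customer_id, amount), and the bounds are hypothetical, chosen only for illustration. Each function returns the rows that violate one kind of rule.

```python
from collections import Counter

# Hypothetical sample data; field names and values are illustrative only.
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 3, "amount": 40.0},     # unknown customer
    {"order_id": 11, "customer_id": 2, "amount": -5.0},     # duplicate id, negative amount
    {"order_id": 12, "customer_id": None, "amount": 15.0},  # missing customer
]


def existence(rows, field):
    """Data existence: rows where the field is missing or None."""
    return [r for r in rows if r.get(field) is None]


def uniqueness(rows, field):
    """Expected uniqueness: rows whose field value appears more than once."""
    counts = Counter(r.get(field) for r in rows)
    return [r for r in rows if counts[r.get(field)] > 1]


def within_bounds(rows, field, low, high):
    """Expected bounds: rows whose (present) value falls outside [low, high]."""
    return [
        r for r in rows
        if r.get(field) is not None and not (low <= r[field] <= high)
    ]


def referential_integrity(rows, field, parent_rows, parent_field):
    """Referential integrity: rows pointing at a parent key that does not exist."""
    parent_keys = {p[parent_field] for p in parent_rows}
    return [
        r for r in rows
        if r.get(field) is not None and r[field] not in parent_keys
    ]


violations = {
    "customer_id exists": existence(orders, "customer_id"),
    "order_id is unique": uniqueness(orders, "order_id"),
    "amount within 0..10000": within_bounds(orders, "amount", 0, 10_000),
    "customer_id references customers": referential_integrity(
        orders, "customer_id", customers, "id"
    ),
}

for rule, bad_rows in violations.items():
    print(f"{rule}: {len(bad_rows)} violation(s)")
```

In practice the same checks are often written as queries against the database. The point of the sketch is only that each rule reduces to a test that either passes or flags specific rows.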
The next step is to determine the data quality with a data profiling and prioritization exercise. Usually no one can articulate how clean or dirty corporate data is. Without this measurement of cleanliness, the effectiveness of activities that are aimed at improving data quality cannot be measured.
Measuring data quality begins with an inventory. By taking account of important data across several tangible factors that can be used to measure data quality, you can begin to translate the vague feelings of dirtiness into something tangible. In so doing, focus can be brought to those actions that can improve important quality elements. Ultimately, data quality improvement is performed against a small subset of data elements, as most elements already conform to standard. The subset must be selected carefully, however. Another way to put it is that data quality initiatives are not comprehensive across all corporate data elements and all possibilities.
Data profiling can then be performed with software, or with queries against the data that show the spread of data in the affected columns and check for rule adherence. Once the rules are identified and the data is profiled, the data quality needs to be scored. Each rule score represents the state of the data quality for that rule. System scores are an aggregate of the rule scores for that system, and the overall score is a prorated aggregation of the system scores.
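The scoring roll-up can be sketched as follows. The formulas here are assumptions, not a prescribed method: a rule score is taken to be the fraction of rows that conform, a system score is the plain average of its rule scores, and "prorated" is interpreted as a weighted average using hypothetical importance weights per system. The system names and row counts are made up.

```python
def rule_score(total_rows, violating_rows):
    """Fraction of rows conforming to one rule (assumed definition)."""
    if total_rows == 0:
        return 1.0  # nothing to violate
    return 1 - violating_rows / total_rows


def system_score(rule_scores):
    """Aggregate of a system's rule scores; a simple mean is assumed here."""
    return sum(rule_scores) / len(rule_scores)


def overall_score(system_scores, weights):
    """Prorated aggregate: system scores weighted by assumed importance."""
    total_weight = sum(weights[name] for name in system_scores)
    weighted_sum = sum(system_scores[name] * weights[name] for name in system_scores)
    return weighted_sum / total_weight


# Hypothetical profiling results: (total rows, violating rows) per rule.
profiled = {
    "crm": [(1000, 50), (1000, 150)],
    "billing": [(500, 100), (500, 200)],
}
# Hypothetical weights: the CRM system is judged three times as important.
weights = {"crm": 3, "billing": 1}

system_scores = {
    name: system_score([rule_score(total, bad) for total, bad in rules])
    for name, rules in profiled.items()
}
overall = overall_score(system_scores, weights)

for name, score in system_scores.items():
    print(f"{name}: {score:.2f}")
print(f"overall: {overall:.2f}")
```

Other aggregations (weighting rules within a system, or weighting by row volume) are equally plausible. What matters is that the method is agreed on and applied consistently, so that changes in the score reflect changes in the data.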
Data Quality Scores
ROI is about accumulating all returns and investments from a project’s build, maintenance, and associated business and technology activities through to the ultimate desired results -- all while considering the possible outcomes and their likelihood. Each project is different, but all things being equal, the data quality scores of a system lead to different system results, and hence to different ROIs.
The ROI is arrived at not only by intellectually determining how the data should look, but also by weighing the cost to the function of the system if the data lacked quality.
So, it behooves us to improve the quality of the data to improve the anticipated return on the project. But at what cost? In the final step, you detail the data quality actions and apply cost to arrive at the ideal data quality level. You can hold back data that has violations, fix it, report it, fix it at the source, etc.
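One way to think about this final step is as a comparison of net return across the candidate actions. The sketch below is illustrative only: the action names, costs, and resulting scores are invented, and project return is assumed to scale linearly with the data quality score, which a real project would need to establish for itself.

```python
# Hypothetical candidate actions: (name, cost in dollars, expected quality score).
actions = [
    ("do nothing", 0, 0.70),
    ("report violations", 20_000, 0.75),
    ("hold back and fix in the pipeline", 80_000, 0.90),
    ("fix at the source", 250_000, 0.98),
]

# Assumed project return if the data were perfect (score of 1.0).
RETURN_AT_PERFECT_QUALITY = 1_000_000


def project_return(score, return_at_perfect):
    """Assumed linear relationship between quality score and project return."""
    return score * return_at_perfect


def net_return(action, return_at_perfect):
    """Project return at the action's expected score, minus the action's cost."""
    _name, cost, score = action
    return project_return(score, return_at_perfect) - cost


def best_action(candidates, return_at_perfect):
    """Pick the action with the highest net return."""
    return max(candidates, key=lambda a: net_return(a, return_at_perfect))


for action in actions:
    print(f"{action[0]}: net {net_return(action, RETURN_AT_PERFECT_QUALITY):,.0f}")

chosen = best_action(actions, RETURN_AT_PERFECT_QUALITY)
print(f"ideal level: {chosen[0]} (score {chosen[2]:.2f})")
```

With these made-up numbers, the most thorough fix is not the best one: its extra cost outweighs the return from the last few points of quality. The ideal data quality level is the one where further spending stops paying for itself.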
The willingness to spend on data quality improvements should be determined entirely by the ability to advance the data quality scores, which have been correlated to project return.
There’s little doubt that placing a value on the data being collected in organizations’ systems is a difficult proposition. Organizations demand tangible returns on the investments they make, and data quality is no exception.
We’re in an exciting time in history. Organizations are beginning to wake up to the fact that the data they collect and manage should be viewed as a corporate asset. Ultimately, the quality of your data can be an advantage or disadvantage to projects.