11 Irritating Data Quality Issues
Organizations have been trying to improve data quality for more than 30 years, but it’s still an issue. In fact, it’s more important than ever. Here’s a look at common data quality challenges that can drive you mad.
Data quality isn’t a new concept, though its importance has continued to grow with big data, analytics and AI. Without good data quality, analytics and AI are not reliable.
“The traditional issues around data quality are still the kind of issues we see today,” says Felix Van de Maele, CEO of data intelligence company Collibra. “If you think about completeness, accuracy, consistency, validity, uniqueness, and integrity, they’re very much the same data quality dimensions that companies struggle with to this day.”
So, why are companies still struggling with data quality? Because it’s still not getting the attention it deserves. For one thing, it’s not as sexy as analytics or AI.
“Data quality has never been more critical, particularly in the realm of artificial intelligence and analytics,” says Laura McElhinney, chief data officer at product-focused consultancy MadTech, in an email interview. “[T]he quality of the data directly impacts the accuracy and efficacy of AI outputs, whether we are discussing generative models or traditional machine learning systems. Data quality [also] forms the bedrock of effective analytics, reporting, and business decision-making. Without it, analytical insights are compromised, potentially leading to misguided strategies and decisions based on erroneous information. Therefore, ensuring high data quality is not just a technical requirement but a strategic imperative.”
The following are some of the most common data quality challenges that persist in enterprises.
1. Unstructured data
There’s a lot of talk about data quality as it relates to unstructured data, because there’s so much of it and organizations want to use it for AI. There are concerns about the quality of that data, its currency, redundancy, and the fact that people are cutting and pasting it from one system to another. Meanwhile, personally identifiable information (PII) and sensitive company data may be in places it shouldn’t be, according to Jack Berkowitz, chief data officer at data intelligence platform provider Securiti AI. “[O]ne of the things about the data lakes was, well, we’ll just dump it in there. We’ll figure it out later,” says Berkowitz. “Here, you need to have those business cases or those use cases decently defined that you’re going to try to do. Especially seek out and start incrementally getting your unstructured data organized. There’s just too much of it to just say, well, we’re going to do everything. So, prioritizing some of these use cases, and just attacking it that way.”
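One way to start incrementally is to scan a small, prioritized set of documents for obvious PII before they feed an AI use case. The following is a minimal sketch, not a production detector: the two regular expressions, the `scan_for_pii` and `prioritize` helpers, and the sample documents are all hypothetical, and real PII detection needs far broader coverage than email addresses and U.S. Social Security number formats.

```python
import re

# Hypothetical, deliberately narrow patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_for_pii(text):
    """Return a dict mapping each pattern name to the matches found in text."""
    hits = {}
    for name, pattern in PII_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits


def prioritize(documents):
    """Order document names by total PII hits, highest first."""
    counts = {
        name: sum(len(matches) for matches in scan_for_pii(body).values())
        for name, body in documents.items()
    }
    return sorted(counts, key=counts.get, reverse=True)


# Made-up sample documents standing in for files in a data lake.
documents = {
    "meeting_notes.txt": "Agenda: review Q3 roadmap.",
    "support_ticket.txt": "Contact jane@example.com, SSN 123-45-6789.",
    "newsletter.txt": "Questions? Write to news@example.org.",
}
review_order = prioritize(documents)
```

Ranking documents by hit count gives a team a place to begin rather than trying to clean everything at once.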
2. Data entry
Humans are the root cause of data quality issues, and there are few better examples than the healthcare industry, where information recorded on paper is manually entered into systems.
“Doctors’ offices, the physician, the nurses, the billing folks are taking your insurance card and typing things in [when] submitting bills,” says Ryan Leurck, co-founder and chief analytics officer at healthcare technology and data analytics company Kythera Labs. “The data quality in those electronic data systems is focused on the aspects that are the most important for ensuring that a payment happens, for example. They’re not going to mess up the dollars and cents, but there are 80 other fields. You might take for granted that a lot of the data on a claim is accurate, when it might be that no one ever looked at it.”
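A quick way to find the fields that no one ever looked at is to measure how often each one is actually filled in. The sketch below assumes claim records are available as Python dictionaries; the field names (`billed_amount`, `diagnosis_code`, `referring_provider`), the sample records, and the `field_completeness` helper are hypothetical and not drawn from any real claims format. It measures completeness only; a filled-in field can still be wrong.

```python
def is_filled(value):
    """Treat None, empty strings, and whitespace-only strings as missing."""
    return value is not None and str(value).strip() != ""


def field_completeness(records, fields):
    """Return, per field, the fraction of records with a usable value."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for record in records if is_filled(record.get(field))) / total
        for field in fields
    }


# Made-up claims: the payment field is always present, other fields are not.
claims = [
    {"billed_amount": "120.00", "diagnosis_code": "J10.1", "referring_provider": "Dr. A"},
    {"billed_amount": "75.50", "diagnosis_code": "", "referring_provider": None},
    {"billed_amount": "310.25", "diagnosis_code": "E11.9", "referring_provider": "  "},
    {"billed_amount": "42.00", "diagnosis_code": "I10"},
]

report = field_completeness(
    claims, ["billed_amount", "diagnosis_code", "referring_provider"]
)
for field, share in report.items():
    print(f"{field}: {share:.0%} complete")
```

In this made-up sample the payment field is fully populated while the referring provider is present on only a quarter of the claims, which is the kind of gap that is easy to miss when only the dollars and cents get checked.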