Why Data Analytics Can't Tell You Everything
Data analytics is a powerful, yet limited tool. Don't allow misleading analytics to lead you into faulty decisions.
Using data analytics is like receiving advice from a generally knowledgeable expert: useful, insightful, but subject to additional study and interpretation.
By far, data analytics’ greatest limitation is the data itself, says Kentaro Toyama, the W. K. Kellogg professor of community information at the University of Michigan School of Information. “You can only draw conclusions from the relevant data that's available, and the best analysis of that data will only be as good as its quality and quantity,” he observes.
Data is often collected and analyzed in isolation. “This has become exacerbated with how easy it is now for departments and teams to spin up new data analytic environments,” says Nima Negahban, co-founder and CEO of real-time analytic database developer Kinetica. This failing can lead to decisions that aren't well suited to enterprise interests and may even be counterproductive. “In many cases, decisions need to be made quickly, and data analysts may not have enough time to thoroughly analyze the data and consider all relevant factors,” he notes. “This can lead to a rushed or incomplete analysis, which can in turn lead to suboptimal decision making.”
The lack of a solid, well-governed data taxonomy -- the classification of data into hierarchical groups to create structure -- creates the biggest challenge in data analytics and its applications, says Sisi Zhang, executive vice president of data science and analytics at interactive marketing and technology firm Razorfish.
Taxonomy Is Critical
From a marketing perspective, data taxonomy is what allows marketers to correctly attribute performance across different types of paid and owned channels. “Taxonomy is critical to ensuring that we have the right data inputs to drive relevant insights around marketing performance,” Zhang explains. She notes that she often sees inconsistent applications of taxonomy, as well as low governance. “While we can do some retroactive clean-up of taxonomy, if consistent taxonomy issues persist, they make it almost impossible to derive meaningful insights at scale.”
Clean and consistent data is the key to many data analytics capabilities, including reporting, dashboard visualization, advanced analytics, and data science, Zhang says. “When taxonomy is wrong, what would have been an easy exercise in understanding performance output becomes a manual and labor-intensive exercise to retroactively clean up taxonomy to be able to use the data in some form.”
Often, taxonomy can't be fully cleaned ex post, since there's specific metadata that's meant to be captured while marketing campaigns are live but is difficult to fit retroactively, Zhang says. “This means that analytics output is then limited to very basic insights, which aren’t very useful for measurement or optimization for marketing initiatives,” she notes.
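To make the taxonomy point concrete, here is a minimal Python sketch of the kind of check that could run while a campaign is being set up, rather than after the fact. Everything in it is an assumption made for illustration and does not come from Zhang or Razorfish: the `TAXONOMY` mapping, its channel names, the `type_channel_campaign` label convention, and the `validate_label` helper are all hypothetical.

```python
# Hypothetical taxonomy: channel type -> channels allowed under it.
# The values are illustrative only, not an industry standard.
TAXONOMY = {
    "paid": {"search", "social", "display"},
    "owned": {"email", "website"},
}


def validate_label(label: str) -> list[str]:
    """Return the problems found in a 'type_channel_campaign' label (empty if none)."""
    # Normalize case and split into at most three parts, so the campaign
    # name itself may contain further underscores.
    parts = label.strip().lower().split("_", 2)
    if len(parts) != 3 or not all(parts):
        return [f"expected 'type_channel_campaign', got {label!r}"]
    channel_type, channel, _campaign = parts
    if channel_type not in TAXONOMY:
        return [f"unknown channel type {channel_type!r}"]
    if channel not in TAXONOMY[channel_type]:
        return [f"channel {channel!r} is not defined under {channel_type!r}"]
    return []


if __name__ == "__main__":
    # Made-up labels: two that follow the convention and two that do not.
    labels = [
        "paid_search_spring-sale",
        "Owned_Email_newsletter",
        "social-paid-promo",
        "paid_email_promo",
    ]
    for label in labels:
        problems = validate_label(label)
        print(label, "->", "ok" if not problems else "; ".join(problems))
```

A check like this only catches labels that break the agreed structure. It cannot recover metadata that was never captured, which is the harder problem Zhang describes.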
The insights gained from data analytics are only as rich as the data that goes into training the data model, says Peter Kirkwood, strategy leader at Zinnov, a management consulting and strategy advisory firm. Collected data is raw and full of biases and errors, and it takes a significant amount of manual effort to clean it and make it usable for training artificial intelligence and machine learning models. “While machine learning is great at analyzing data, the real challenges are the same that have been faced for a generation -- garbage in, garbage out,” he explains.
Good Data Is Hard to Find
All too often, good data is simply unavailable. “For example, a lot of data analysis seeks to predict the future,” Toyama says. “But, of course, we have no data about the future, so we can only rely on past data.” While in some contexts the past can provide a reliable indication of future trends, “some of the worst sins of data analysis have occurred when an analyst assumed the future would be like the past, and it wasn't,” he notes.
Meanwhile, data sets may lack information about why something happened. “Many data sets contain no direct measurements of the variables we most want to know so, at best, we have to make inferences that, depending on the other data available, might or might not be accurate or credible,” Toyama explains.
Excelling at Analytics
Organizations that excel at data analytics overcome the science's limitations by embracing a growth mindset, Kinetica's Negahban says. “These organizations ask, ‘What data are we not capturing that could add another piece to the puzzle?’”
Apart from a solid technical skills foundation, a data scientist's most useful trait is smart skepticism, Toyama says. “Skepticism makes data scientists question the quality of their data,” he notes. “Smarts makes them realize that the data is rarely as good as it should be.”
“Too many data scientists aren't skeptical enough,” Toyama says. They become proud of, and attached to, their own analyses. “But, once you get attached, you're no longer objective.”