Common Biases That Skew Analytics - InformationWeek

Commentary
8/17/2016
01:30 PM
Lisa Morgan

Common Biases That Skew Analytics

Part of any analysis has to be a sanity check to understand where the data may reflect any number of different types of bias.

How do you know if you can trust analytical outcomes? Do you know where the data came from? Is the quality appropriate for the use case? Was the right data used? Have you considered the potential sources and effects of bias?

All of these issues matter, and one of the most insidious of them is bias, because the source and effects of the bias aren't always obvious. Sadly, there are more types of bias than I can cover in this blog, but here are a few common ones.

Selection bias

Vendor research studies are a good example of selection bias, and several forms of it may be at work in a single study.

Think about it: Whom do they survey? Their customers. What are the questions? The questions are crafted and selected based on their ability to prove a point. If the survey reveals a data point or trend that does not advance the company agenda, that data point or trend will likely be removed.

Data can similarly be cherry-picked for an analysis. Different algorithms and different models can be applied to data, so selection bias can happen there. Finally, when the results are presented to business leaders, some information may be supplemented or withheld, depending on the objective.

This type of bias, when intentional, is commonly used to persuade or deceive. Not surprisingly, it can also undermine trust. What's less obvious is that selection bias sometimes occurs unintentionally.

Confirmation bias

A sound analysis starts with a hypothesis, but never mind that. I want the data to prove I'm right.

Let's say I'm convinced that bots are going to replace doctors in the next 10 years. I've gathered lots of research that demonstrates the inefficiencies of doctors and the healthcare system. I have testimonials from several futurists and technology leaders. Not enough? Fine. I'll torture as much data as necessary until I can prove my point.

As you can see, selection bias and confirmation bias go hand-in-hand.

Outliers

Outliers are values that deviate significantly from the norm. When they're included in an analysis, the analysis tends to be skewed.

People who don't understand statistics are probably more likely to include outliers in their analysis because they don't understand their effect. For example, to get an average value, just add up all the values and divide by the number of individuals being analyzed (whether that's people, products sold, or whatever). And voila! End of story. Except it isn't…

What if 9 people spent $100 at your store in a year, and the 10th spent $10,000? You could say that your average customer spend per year is $1,090. According to simple math, the calculation is correct. However, it would likely be unwise to use that number for financial forecasting purposes.
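
To make the arithmetic concrete, here is a minimal Python sketch of the store example above. It compares the mean with the median, which is one common way to see how far a single outlier pulls an average. The variable names are my own, and the median is offered as an illustrative contrast, not as the one right number to forecast with.

```python
from statistics import mean, median

# Nine customers who spent $100 and one who spent $10,000 (figures from the example above).
annual_spend = [100] * 9 + [10_000]

average_spend = mean(annual_spend)    # pulled upward by the single large value
typical_spend = median(annual_spend)  # middle of the sorted values

print(f"mean:   ${average_spend:,.2f}")
print(f"median: ${typical_spend:,.2f}")
```

The mean comes out to $1,090, while the median stays at $100, which is what nine of the ten customers actually spent.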

Outliers aren't "bad" per se, since they are critical for such use cases as cybersecurity and fraud prevention, for example. You just have to be careful about the effect outliers may have on your analysis. If you blindly remove outliers from a dataset without understanding them, you may miss an important indicator or the beginning of an important trend such as an equipment failure or a disease outbreak.
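
One way to act on that advice is to flag outliers for review instead of deleting them. The sketch below uses the common 1.5 × IQR rule of thumb, which is my choice for illustration and only one of several ways to define an outlier. The spend figures are hypothetical.

```python
from statistics import quantiles

# Hypothetical annual spend per customer, invented for illustration.
annual_spend = [95, 100, 102, 98, 101, 99, 103, 97, 100, 10_000]

# quantiles(n=4) returns the three cut points Q1, Q2 and Q3.
q1, _, q3 = quantiles(annual_spend, n=4)
iqr = q3 - q1

# Rule-of-thumb fences: values more than 1.5 * IQR beyond the quartiles get flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Flag rather than drop, so a person can decide what each outlier means.
flagged = [x for x in annual_spend if x < lower_fence or x > upper_fence]
within_fences = [x for x in annual_spend if lower_fence <= x <= upper_fence]

print("flagged for review:", flagged)
print("within fences:", within_fences)
```

Keeping the flagged values in a separate list means the $10,000 customer can still be examined as a possible fraud case, data-entry error, or early sign of a trend.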

Simpson's Paradox

Simpson's Paradox drives another important point home: validate your analysis. When Simpson's Paradox occurs, trends at one level of aggregation may reverse themselves at different levels of aggregation. Stated another way, datasets may tell one story, but when you combine them, they may tell the opposite story.

A famous example involves graduate admissions at the University of California at Berkeley. At the aggregate level, one could "prove" that men were admitted at a higher rate than women. At the departmental level, the reverse proved true in some cases.
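
The reversal is easier to believe once you see it in numbers. The following Python sketch uses made-up admissions counts (they are not the Berkeley figures) for two hypothetical departments. Women have the higher admission rate in each department, yet men have the higher rate overall, because most women applied to the department that admits fewer people.

```python
# Hypothetical admissions counts, invented for illustration: (admitted, applied).
applications = {
    "Department A": {"men": (80, 100), "women": (18, 20)},
    "Department B": {"men": (4, 20), "women": (30, 100)},
}

def rate(admitted, applied):
    """Admission rate as a fraction between 0 and 1."""
    return admitted / applied

def aggregate_rate(group):
    """Admission rate for one group across all departments combined."""
    admitted = sum(dept[group][0] for dept in applications.values())
    applied = sum(dept[group][1] for dept in applications.values())
    return rate(admitted, applied)

# Per-department view: women are ahead in both departments.
for name, dept in applications.items():
    print(f"{name}: men {rate(*dept['men']):.0%}, women {rate(*dept['women']):.0%}")

# Combined view: the comparison flips in favor of men.
print(f"Overall: men {aggregate_rate('men'):.0%}, women {aggregate_rate('women'):.0%}")
```

Checking both the combined and the broken-out views, as this sketch does, is one simple form of the validation the paradox calls for.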

 
