Lisa Morgan | 7/9/2015 08:06 AM

7 Common Biases That Skew Big Data Results

Flawed data analysis leads to faulty conclusions and bad business outcomes. Beware of these seven types of bias that commonly challenge organizations' ability to make smart decisions.
6 of 8

Overfitting And Underfitting

Overfitting involves an overly complex model that includes noise. Underfitting is the opposite; the model is overly simple. Either skews results.
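
To see the difference concretely, here is a minimal sketch in Python that fits polynomials of increasing degree to noisy data (NumPy and scikit-learn are illustrative choices here, not tools any of the sources mention). The degree-1 model underfits, the degree-15 model chases the noise, and comparing training error to test error exposes the overfit:

# A minimal sketch of underfitting vs. overfitting. NumPy and
# scikit-learn are assumed purely for illustration.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, size=60)  # signal + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

Typically the degree-15 fit posts the lowest training error and the worst test error, which is exactly the skew overfitting introduces; the degree-1 fit is poor on both, the signature of underfitting.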

"[Overfitting] is one of the most common (and worrisome) biases. It comes about from checking lots of different hypotheses in data. If each hypothesis you check has, say, a 1 in 20 chance of being a false positive, then if you check 20 different hypotheses, you're very likely to have a false positive occur at least once," Greenberg said.

He tested the effect that various behavioral interventions had on site participants. At first, it appeared that a particular behavior outperformed the control. However, when a correction was applied that adjusted for the number of hypotheses tested, the statistical significance vanished.
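
The correction isn't named, but the Bonferroni adjustment, which divides the significance threshold by the number of hypotheses tested, is one common choice. A sketch with made-up p-values shows how a result that clears the usual 0.05 bar can fail the corrected one:

# Hypothetical p-values for 20 tested interventions (illustrative only);
# the smallest clears alpha = 0.05 but not the Bonferroni threshold.
p_values = [0.012, 0.08, 0.21, 0.35] + [0.5] * 16
alpha = 0.05
corrected_alpha = alpha / len(p_values)  # 0.05 / 20 = 0.0025
best = min(p_values)
print(f"p = {best}: significant at {alpha}? {best < alpha}")  # True
print(f"p = {best}: significant at {corrected_alpha}? {best < corrected_alpha}")  # False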

"If you're using a high-dimensionality, non-linear predictive algorithm that has lots of degrees of freedom that allow you to sift data to the tee, you can essentially take any function and map it point-by-point so that the model does a tremendous job at looking at the data that you fitted on it. It's excellent on that, but if you go beyond that realm, it does awfully in terms of predicting data points that are outside the spectrum you looked up," said CenturyLink's Schleicher. "We inevitably split up our data sets into training data sets and testing data sets, and do cross validation across such multiple sets to make sure we don't overfit."

(Image: PeteLinforth via Pixabay)

Comments

LisaMorgan (Moderator), 7/25/2015 11:23:02 AM
Re: Selection Bias
I thought about including that, but it's a subtype of confirmation bias. Also, the cognitive biases are far more familiar to the general population than the others.

Selection bias is an issue, as you say.

shamika (Ninja), 7/24/2015 11:17:51 PM
Selection Bias
Well, in my opinion, selection bias needs to be addressed. It is especially important when you're working with a whole population rather than a sample. However, accuracy remains a concern.

LisaMorgan (Moderator), 7/19/2015 1:27:05 PM
Re: Confirmation bias is the big one
When I've written for business audiences only, I've avoided the term "confirmation bias" and instead endeavored to get them to understand the difference between assumptions and hypotheses. If you assume, you've baked in what you believe to be true without proof. A hypothesis is tested, proven or disproven. In other words, be prepared to be wrong; embrace that and learn from it. However, it is very common to cherry-pick questions, engineer survey questions, and dismiss anything that does not support what one set out to prove.

Confirmation bias isn't always intentional, however; sometimes it's unconscious. And that's what people who are genuinely concerned about confirmation bias fear in their own work. Working collaboratively with others, exchanging ideas, and comparing results in an open environment is good, ideally when everyone is not out to prove the same point, such as that Brand X cola with a zillion grams of sugar is a viable form of health food. :-)

jries921 (Ninja), 7/19/2015 1:15:47 AM
Re: Confirmation bias is the big one
As Mark Twain pointed out long ago, we all have axes to grind. Given that such is the case, the wise/honest thing to do is to recognize one's own biases and try to correct for them. Free and open discussion tends to promote this end, which is why those most wedded to their own ideas or the orthodoxies of their in-groups will often try to squelch it (unless, of course, they're committed to the concept), or simply withdraw to their own comfortable little groups (caucuses?) so they're not subject to the discomfort of cognitive dissonance. And if partisan or other factional politics start playing a prominent role, it can be very difficult to reach any sort of reasonable consensus (witness what has happened to macroeconomics since the 1980s, or to constitutional law since forever).

Another issue is that those most wedded to their ideas are often the very people most motivated to see where they lead. This can actually be a good thing, as the necessary research will often be very hard work that the "impartial inquirer after Truth" (unless he is particularly diligent) might well seek to avoid. Would Charles Darwin have chosen evolutionary biology as his field of research if his grandfather had not been an early proponent of the concept? I suspect not. If such people can resist the urge to cheat, or perhaps can be persuaded to collaborate with those much more skeptical so they can "keep each other honest," then they can do very good work indeed.

It is natural for scientific journals and their readers to be much more interested in successes than failures (I don't expect that to ever change), and pride makes it difficult to let go of one's life's work when it reaches a dead end (perhaps this is part of why so many breakthroughs are made by younger researchers); but even experimental failures will often lead to discovery if one is willing to consider the implications (thus, failure can lead to success). A classic example was the "failed" 1887 Michelson–Morley experiment that was a large part of the experimental basis for Einstein's Special Theory of Relativity.

LisaMorgan (Moderator), 7/18/2015 7:14:16 PM
Re: Confirmation bias is the big one
Confirmation bias is everywhere. Sometimes it's deliberate, sometimes it isn't. A data scientist and a machine learning specialist have been trying to get me to talk about the issue in the context of scientific journals because, they say, journals only publish positive results. When it comes to healthcare, confirmation bias can be very dangerous.

jries921 (Ninja), 7/18/2015 3:50:07 PM
Confirmation bias is the big one
I've long thought that the most effective propaganda is that which tells people what they already think, which is why partisan talk radio excels at radicalizing those who already believe but does little or nothing to persuade people who don't (if anything, the "shock jock" commentary turns them off). Confirmation bias is also a very good reason to be skeptical of any research sponsored by a for-profit corporation: even if the researchers and sponsors are working in good faith (a dubious proposition in an era in which profit maximization is widely thought to trump all other considerations), vendors will tend to disbelieve any research that makes their products look bad (few people want to believe they're selling junk).