7 Common Biases That Skew Big Data Results - InformationWeek

7/9/2015
08:06 AM
Lisa Morgan

7 Common Biases That Skew Big Data Results

Flawed data analysis leads to faulty conclusions and bad business outcomes. Beware of these seven types of bias that commonly challenge organizations' ability to make smart decisions.

Overfitting And Underfitting 
Overfitting occurs when an overly complex model captures noise in the data along with the signal. Underfitting is the opposite: the model is too simple to capture the underlying pattern. Either skews results.
"[Overfitting] is one of the most common (and worrisome) biases. It comes about from checking lots of different hypotheses in data. If each hypothesis you check has, say, a 1 in 20 chance of being a false positive, then if you check 20 different hypotheses, you're very likely to have a false positive occur at least once," Greenberg said.
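
The arithmetic behind that claim can be sketched in a few lines of Python. This is an illustrative calculation rather than anything taken from Greenberg: it assumes the 20 tests are statistically independent and that each has a 5% (1 in 20) false-positive rate, and the function name `familywise_error_rate` is made up for this example.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests.

    Assumption: the tests are independent, so the chance that none of them
    produces a false positive is (1 - alpha) ** n_tests.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# 1-in-20 false-positive rate per hypothesis, 20 hypotheses checked
fwer = familywise_error_rate(alpha=0.05, n_tests=20)
print(f"{fwer:.3f}")  # 0.642
```

Under those assumptions the chance of at least one false positive is roughly 64%, and it keeps climbing as more hypotheses are checked.
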
He tested the effect that various behavioral interventions had on site participants. At first, one intervention appeared to outperform the control. However, after a correction for the number of hypotheses tested was applied, the statistical significance vanished.
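
The article does not say which correction was applied. One common choice is the Bonferroni correction, which divides the significance threshold by the number of hypotheses tested. The sketch below uses it as a stand-in; the p-values are invented for illustration and are not Greenberg's results, and `bonferroni_significant` is a hypothetical helper name.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values stay significant after a Bonferroni correction."""
    # Each test must clear alpha divided by the number of tests.
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical p-values for five interventions compared against a control
p_values = [0.03, 0.20, 0.45, 0.61, 0.08]

uncorrected = [p < 0.05 for p in p_values]
corrected = bonferroni_significant(p_values)

print(uncorrected)  # [True, False, False, False, False]
print(corrected)    # [False, False, False, False, False]
```

With five tests the corrected threshold is 0.01, so the first intervention, which looked significant at p = 0.03, no longer qualifies.
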
"If you're using a high-dimensionality, non-linear predictive algorithm that has lots of degrees of freedom that allow you to sift data to the tee, you can essentially take any function and map it point-by-point so that the model does a tremendous job at looking at the data that you fitted on it. It's excellent on that, but if you go beyond that realm, it does awfully in terms of predicting data points that are outside the spectrum you looked up," said CenturyLink's Schleicher. "We inevitably split up our data sets into training data sets and testing data sets, and do cross validation across such multiple sets to make sure we don't overfit."
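
The safeguard Schleicher describes can be sketched with the Python standard library alone. This is not CenturyLink's method: the data is synthetic (a sine curve plus random noise), the model is a simple k-nearest-neighbors regressor chosen because k=1 memorizes the training set point by point, and every function name is hypothetical.

```python
import math
import random


def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, data, k):
    """Mean squared error of a k-NN model built on `train`, scored on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


def cross_val_mse(data, k, folds=5):
    """Average held-out error over `folds` train/validation splits."""
    scores = []
    for i in range(folds):
        held_out = data[i::folds]  # every folds-th point, starting at i
        kept = [p for j, p in enumerate(data) if j % folds != i]
        scores.append(mse(kept, held_out, k))
    return sum(scores) / folds


rng = random.Random(0)  # fixed seed so the run is repeatable
data = []
for _ in range(500):
    x = rng.uniform(0, 6)
    data.append((x, math.sin(x) + rng.gauss(0, 0.3)))  # signal plus noise

train, test = data[:400], data[400:]  # training set and testing set

train_mse_k1 = mse(train, train, k=1)   # scored on the data it was fitted on
test_mse_k1 = mse(train, test, k=1)     # scored on data it has never seen
cv_mse_k1 = cross_val_mse(train, k=1)
cv_mse_k10 = cross_val_mse(train, k=10)

print(f"k=1  training MSE:        {train_mse_k1:.3f}")
print(f"k=1  test MSE:            {test_mse_k1:.3f}")
print(f"k=1  cross-validated MSE: {cv_mse_k1:.3f}")
print(f"k=10 cross-validated MSE: {cv_mse_k10:.3f}")
```

With k=1 the model reproduces every training point, so its training error is zero, yet its error on unseen data is not. Cross-validation exposes that gap, and on this synthetic data it should favor the smoother k=10 model.
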