Data Outliers: 10 Ways To Prevent Big Data Damage - InformationWeek

4/18/2016
07:06 AM
Lisa Morgan

Data Outliers: 10 Ways To Prevent Big Data Damage

Most business decision-makers aren't trained to understand data outliers, but they can learn the basics. Executives, managers, and employees without math degrees can ask smarter questions about analyses they're basing crucial judgments on. Here are some things to know.
5 of 11

Outliers Can Be More Or Less Common

Most business professionals are aware of "the bell curve," or normal distribution, because they were taught something about it in high school or college. The concept is popular because it applies to many things in everyday life and in business, such as the ambient temperature range of certain equipment.

Bell curves tend to appear whenever a variable is the sum of many small, independent influences, such as the combined effect of many genes that each adjust a person's height a little. In a normal, bell-shaped distribution, most of the population clusters toward the middle. For example, the average height of an adult human male is 5'10". (About 68 percent of men fall within one standard deviation of that average, roughly 5'7" to 6'1" if the standard deviation is about three inches.)
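
For readers who want to see the idea in action, here is a minimal Python sketch, not taken from the article, that builds a variable by adding up many small random effects and then checks how much of the simulated population lands within one standard deviation of the mean. The function names, the number of factors, and the size of each effect are assumptions chosen only for illustration.

```python
import random
import statistics


def simulate_additive_trait(n_people=10_000, n_factors=50, seed=0):
    """Return one value per simulated person.

    Each value is the sum of many small, independent effects
    (a hypothetical stand-in for the "many genes" in the text).
    """
    rng = random.Random(seed)
    return [
        sum(rng.uniform(-1.0, 1.0) for _ in range(n_factors))
        for _ in range(n_people)
    ]


def share_within_one_sd(values):
    """Fraction of values that lie within one standard deviation of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mean) <= sd)
    return inside / len(values)


values = simulate_additive_trait()
share = share_within_one_sd(values)
print(f"Share within one standard deviation: {share:.3f}")
```

With these settings the printed share should come out close to 0.68, which is the same 68 percent figure quoted for a normal distribution.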

"The mean tells you about the center of the data. The standard deviation tells you how wide the data is or the scale of the data, but it's extremely sensitive to outliers," said Greenberg. "You have to be sure that you've carefully looked at your data, and you know that the result isn't massively affected by just one data point."
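
The point about sensitivity can be illustrated with a short Python sketch. The heights below are made up for illustration, and the single bad value (700 entered instead of 70) is a hypothetical data-entry error, not something reported in the article. The median is included only as a point of comparison, since it is a common summary that moves much less when one value is extreme.

```python
import statistics


def summarize(values):
    """Return the mean, sample standard deviation, and median of the values."""
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
        "median": statistics.median(values),
    }


# Hypothetical heights in inches, invented for this example.
heights = [67, 68, 69, 70, 70, 71, 72, 73]

# The same data plus one hypothetical typo: 70 recorded as 700.
with_outlier = heights + [700]

clean = summarize(heights)
dirty = summarize(with_outlier)

for label, stats in (("clean", clean), ("with outlier", dirty)):
    print(
        f"{label}: mean={stats['mean']:.1f}, "
        f"stdev={stats['stdev']:.1f}, median={stats['median']:.1f}"
    )
```

One bad record doubles the mean (from 70 to 140 inches) and inflates the standard deviation from 2 inches to more than 200, while the median stays at 70. That is the kind of check Greenberg's advice points toward: confirm that a reported average or spread is not being driven by a single data point.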

(Image: OpenClipartVectors via Pixabay)
