Data Outliers: 10 Ways To Prevent Big Data Damage - InformationWeek
07:06 AM
Lisa Morgan
Most business decision-makers aren't trained to understand data outliers, but they can learn the basics. Executives, managers, and employees without math degrees can ask smarter questions about analyses they're basing crucial judgments on. Here are some things to know.
(Image: roegger via Pixabay)

Data analytics has its own vocabulary that business decision-makers are under pressure to learn. Beware, though, because technical terms are often used loosely, sometimes to the detriment of individuals and their companies. An outlier is a good example. A lot of people are talking about outliers, but not a lot of people understand why they exist, what causes them, and what should be done with them, if anything.

"An outlier is a member of a defined dataset which has a dramatically different value than the other members of the set. It can be the result of measurement or recording errors, or the unintended and truthful outcome resulting from the set's definition," said Tom Bodenberg, chief economist and data consultant at market research firm Unity Marketing, in an interview.

Outliers make their way into reported statistics every day. Sometimes it is obvious whether they were included or excluded, and sometimes it isn't. For example, in 1984 the University of Virginia reported that the average starting salary of Rhetoric and Communications graduates was $55,000. However, a single outlier was skewing the average. The dataset included 100 graduates with $25,000 salaries plus one more graduate, NBA first draft pick Ralph Sampson, whose starting salary exceeded $1 million.
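The arithmetic behind that anecdote can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the university's actual data: the article says only that the outlier's salary exceeded $1 million, so the figure of $3,055,000 below is a hypothetical value chosen because it makes the average come out to exactly $55,000.

```python
from statistics import mean, median

# 100 graduates at $25,000 each, as described in the anecdote.
typical_salaries = [25_000] * 100

# ASSUMPTION: the article gives no exact figure for the outlier, only
# "exceeded $1 million". $3,055,000 is a hypothetical value picked so
# that the mean of all 101 salaries lands on $55,000.
outlier_salary = 3_055_000

all_salaries = typical_salaries + [outlier_salary]

mean_with = mean(all_salaries)         # pulled far upward by one value
median_with = median(all_salaries)     # the middle value, barely affected
mean_without = mean(typical_salaries)  # the mean once the outlier is set aside

print(f"Mean with outlier:    ${mean_with:,.0f}")
print(f"Median with outlier:  ${median_with:,.0f}")
print(f"Mean without outlier: ${mean_without:,.0f}")
```

Under these assumptions, one salary out of 101 more than doubles the mean, while the median stays at what the other 100 graduates earned. Asking which of the two an "average" refers to is often the quickest way to find out whether an outlier is driving a reported number.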

Outliers can pop up for different reasons. Some are caused by mistakes made by humans or machines. Others represent actual data. Most business professionals haven't considered the difference, and have no idea what to do with outliers of either kind.

One common tactic is to include or exclude outliers as a matter of course, without considering the potential consequences. While it's true that including or removing outliers may have little or no effect on an analysis, the opposite may also be true.

"If you're working with data, or other people are giving you results based on data, it's useful to consider how outliers are detected and handled, and what you can learn from them," said Spencer Greenberg, mathematician and founder of a decision-making tool provider, in an interview. "Important questions to ask are, 'Were there outliers in the data? Why did they occur? What can we learn from them?' And 'How were they dealt with?'"
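As one illustration of how the question "Were there outliers in the data?" might be answered, here is a minimal Python sketch of a widely used rule of thumb, the interquartile range (IQR) rule, sometimes called Tukey's fences. It is not a method attributed to anyone quoted in this article. The function name `iqr_outliers`, the multiplier `k=1.5`, and the sample transaction amounts are all assumptions made for the example.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside Q1 - k*IQR or Q3 + k*IQR.

    Hypothetical helper for illustration; k=1.5 is a conventional
    default, not a universal threshold.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1                       # spread of the middle half
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Invented transaction amounts, for illustration only.
transactions = [42, 38, 45, 40, 39, 41, 44, 43, 37, 950]

flagged = iqr_outliers(transactions)
print(flagged)
```

A rule like this only flags candidates. Whether a flagged value is a recording error or a truthful but unusual observation still has to be investigated, which is the point of the other questions in the quote.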

Some organizations analyze outliers to detect such things as fraudulent transactions, criminal activity, security breaches, and disease outbreaks. In fact, outliers can sometimes tell interesting stories that might not otherwise have been considered.

"Anyone who is trying to interpret data needs to care about outliers. It doesn't matter if the data is financial data, sociological data, medical data, or even qualitative data like a relationship. Any analysis of data or information must consider the presence and effect of outliers," said Sham Mustafa, founder and CEO of data scientist marketplace Correlation One, in an interview.

Some outliers are easy to spot. Others are more difficult. Here are a few things to consider.

Lisa Morgan is a freelance writer who covers big data and BI for InformationWeek. She has contributed articles, reports, and other types of content to various publications and sites ranging from SD Times to the Economist Intelligence Unit. Frequent areas of coverage include ...
