5 Data Science Sins To Beware - InformationWeek
IoT
IoT
Data Management // Big Data Analytics
News
10/9/2013
10:37 AM
Connect Directly
Google+
RSS
E-Mail
50%
50%
RELATED EVENTS
Building Security for the IoT
Nov 09, 2017
In this webcast, experts discuss the most effective approaches to securing Internet-enabled system ...Read More>>

5 Data Science Sins To Beware

Repent, ye data scientists! Avoid these five big data evils -- or pay with your immortal soul.

OK, perhaps our fire-and-brimstone headline goes a bit overboard. Then again, maybe it is time for a dose of data science atonement, particularly if you're guilty of any of the five deadly sins summarized below.

According to Michael Walker, founder and president of the nonprofit Data Science Association, a professional organization of data scientists with more than 500 members, these big-data sins are all too common. In fact, the Association's recently penned Code of Professional Conduct is designed to establish a set of ethical standards for the burgeoning data-science industry.

Not all big-data professionals are guilty of the five deadly sins, of course, which Walker summarized in a phone interview with InformationWeek. So here they are. Do any of these data-science transgressions hit home?

Sin #1: Cherry Picking

This is where a data scientist includes only data that confirms a particular position and ignores evidence of a contradictory position. "I see this all the time," Walker said.

[ For more on ethical best practices for big-data professionals, see Data Scientists Create Code Of Professional Conduct. ]

Cherry picking is all too common in university research, according to Walker, who referenced a 2005 paper, "Why Most Published Research Findings are False," by Stanford professor John Ioannidis. "What [Ioannidis] argues, in a nutshell, is that the overwhelming majority of research that he reviewed could not be replicated," said Walker.

Here's a hypothetical scenario that illustrates cherry picking in action:

"[Researchers] create a hypothesis they want to test out," Walker said. "So they run it 999 times, and it fails. There's no evidence to confirm their hypothesis. Then they tweak it, run it again, and all of a sudden they find evidence to confirm their hypothesis." But when these same researchers publish a paper proclaiming their success, they don't mention the 999 times they failed. "I think that's very unethical," Walker said.

Sin #2: Confirmation Bias

This is where researchers favor data that confirms their hypothesis.

"When you're dealing with very large data sets, you're going to find more relationships, more correlations," said Walker. And that can lead to causation confusion, especially in high causal density environments.

Previous
1 of 2
Next
Comment  | 
Print  | 
More Insights
Comments
Newest First  |  Oldest First  |  Threaded View
How Enterprises Are Attacking the IT Security Enterprise
How Enterprises Are Attacking the IT Security Enterprise
To learn more about what organizations are doing to tackle attacks and threats we surveyed a group of 300 IT and infosec professionals to find out what their biggest IT security challenges are and what they're doing to defend against today's threats. Download the report to see what they're saying.
Register for InformationWeek Newsletters
White Papers
Current Issue
2017 State of IT Report
In today's technology-driven world, "innovation" has become a basic expectation. IT leaders are tasked with making technical magic, improving customer experience, and boosting the bottom line -- yet often without any increase to the IT budget. How are organizations striking the balance between new initiatives and cost control? Download our report to learn about the biggest challenges and how savvy IT executives are overcoming them.
Video
Slideshows
Twitter Feed
Sponsored Live Streaming Video
Everything You've Been Told About Mobility Is Wrong
Attend this video symposium with Sean Wisdom, Global Director of Mobility Solutions, and learn about how you can harness powerful new products to mobilize your business potential.
Flash Poll