5 Data Science Sins To Beware - InformationWeek

5 Data Science Sins To Beware

Repent, ye data scientists! Avoid these five big data evils -- or pay with your immortal soul.

OK, perhaps our fire-and-brimstone headline goes a bit overboard. Then again, maybe it is time for a dose of data science atonement, particularly if you're guilty of any of the five deadly sins summarized below.

According to Michael Walker, founder and president of the nonprofit Data Science Association, a professional organization of data scientists with more than 500 members, these big-data sins are all too common. The Association's recently drafted Code of Professional Conduct is designed to establish a set of ethical standards for the burgeoning data-science industry.

Of course, not all big-data professionals commit these sins. Walker summarized the five in a phone interview with InformationWeek, and here they are. Do any of these data-science transgressions hit home?

Sin #1: Cherry Picking

This is where a data scientist includes only data that confirms a particular position and ignores evidence of a contradictory position. "I see this all the time," Walker said.

Cherry picking is all too common in university research, according to Walker, who referenced a 2005 paper, "Why Most Published Research Findings Are False," by Stanford professor John Ioannidis. "What [Ioannidis] argues, in a nutshell, is that the overwhelming majority of research that he reviewed could not be replicated," said Walker.

Here's a hypothetical scenario that illustrates cherry picking in action:

"[Researchers] create a hypothesis they want to test out," Walker said. "So they run it 999 times, and it fails. There's no evidence to confirm their hypothesis. Then they tweak it, run it again, and all of a sudden they find evidence to confirm their hypothesis." But when these same researchers publish a paper proclaiming their success, they don't mention the 999 times they failed. "I think that's very unethical," Walker said.
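Walker's hypothetical is an instance of what statisticians call the multiple-comparisons problem: test often enough and chance alone will eventually produce a "confirming" result. The minimal Python sketch below is an illustration, not a method from Walker or the Association. It runs many experiments in which the hypothesis is false by construction and counts how many still look significant. The 1,000 experiments, 30 observations per group, 0.05 threshold, and known-variance z-test are all illustrative assumptions, and the Bonferroni correction at the end is one standard remedy, not something the article prescribes.

```python
import math
import random

def two_sample_z_pvalue(a, b):
    """Two-sided p-value for a difference in means, assuming (for simplicity)
    that both samples come from normal distributions with known variance 1."""
    n_a, n_b = len(a), len(b)
    diff = sum(a) / n_a - sum(b) / n_b
    se = math.sqrt(1 / n_a + 1 / n_b)  # standard error of the difference
    z = diff / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals P(|Z| > |z|)

def run_null_experiments(n_experiments=1000, n_per_group=30, seed=42):
    """Run many experiments where there is no real effect: both groups are
    drawn from the same N(0, 1) distribution. Returns one p-value each."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_experiments):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        p_values.append(two_sample_z_pvalue(a, b))
    return p_values

if __name__ == "__main__":
    alpha = 0.05
    p_values = run_null_experiments()
    hits = [p for p in p_values if p < alpha]
    print(f"'Significant' results with no real effect: {len(hits)} of {len(p_values)}")
    # The single result a cherry picker would report, omitting the rest.
    print(f"Smallest p-value: {min(p_values):.4f}")
    # Bonferroni: divide the threshold by the number of tests actually run.
    bonferroni = alpha / len(p_values)
    print(f"Hits after Bonferroni correction: {sum(p < bonferroni for p in p_values)}")
```

Under these assumptions, roughly 5 percent of the experiments (on the order of 50 of 1,000) clear the 0.05 bar even though no effect exists. Reporting only those hits, without the failures, is exactly the omission Walker calls unethical; accounting for every test run, for example with a Bonferroni-style correction, removes nearly all of them.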

Sin #2: Confirmation Bias

This is where researchers favor data that confirms their hypothesis.

"When you're dealing with very large data sets, you're going to find more relationships, more correlations," said Walker. And that can lead to mistaking correlation for causation, especially in environments with high causal density, where many interacting factors can influence an outcome.
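Walker's point about large data sets can be seen in a small simulation. The Python sketch below is illustrative only: it builds a data set of mutually independent random columns, so no real relationships exist, and counts the column pairs whose sample correlation still looks "strong." The 50 rows, the |r| > 0.3 cutoff, and the column counts are arbitrary assumptions chosen for the demonstration.

```python
import itertools
import math
import random

def standardize(column):
    """Center a column and scale it to unit length, so that the dot product
    of two standardized columns equals their Pearson correlation."""
    mean = sum(column) / len(column)
    centered = [v - mean for v in column]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]

def pearson(x, y):
    """Sample Pearson correlation of two equal-length columns."""
    return sum(a * b for a, b in zip(standardize(x), standardize(y)))

def spurious_pairs(n_columns, n_rows=50, threshold=0.3, seed=0):
    """Count column pairs with |r| > threshold in a data set of independent
    random columns. Every such pair is spurious by construction."""
    rng = random.Random(seed)
    columns = [
        standardize([rng.gauss(0, 1) for _ in range(n_rows)])
        for _ in range(n_columns)
    ]
    return sum(
        1
        for x, y in itertools.combinations(columns, 2)
        if abs(sum(a * b for a, b in zip(x, y))) > threshold
    )

if __name__ == "__main__":
    for k in (10, 50, 200):
        n_pairs = k * (k - 1) // 2
        print(f"{k:>4} columns ({n_pairs:>6} pairs): "
              f"{spurious_pairs(k)} pairs with |r| > 0.3")
```

With 50 rows, only a few percent of independent pairs cross the 0.3 cutoff by chance, but the number of pairs grows quadratically with the number of columns, so a wide data set yields hundreds of "relationships," none of them causal.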
