Data Scientists Want Big Data Ethics Standards

Nearly half of data scientists surveyed last month say Facebook's controversial "mood manipulation study" was unethical, and many support ethics guidelines for big data research.

The vast majority of statisticians and data scientists believe that consumers should worry about privacy issues related to data being collected on them, and most have qualms about the ethics of Facebook's undisclosed psychological experiment on its users in 2012.

Those are just two of the findings from a Revolution Analytics survey that asked 144 data scientists at JSM (Joint Statistical Meetings) 2014, an annual gathering of statisticians, for their thoughts on big data ethics. The Boston conference ran Aug. 2-7.

The survey results show data scientists are largely a principled bunch, concerned about the lack of ethical guidelines for big data research, at least in some industries.

The Facebook study is a case in point. In January 2012, the social network placed positive or negative posts and images in nearly 700,000 of its users' news feeds to gauge whether the information would sway people's emotions. The Facebook users were unaware they were subjects in the study.

In the JSM survey, 47% of respondents said the Facebook study was unethical; another 40% said they "don't know" if the mood manipulation study was ethical.

Big data researchers can glean an important lesson from the Facebook study and the criticism it received, said David Smith, chief community officer at Revolution Analytics, one of the leading commercial providers of software and services based on the open-source R programming language. Smith is responsible for developing relationships with the community of statisticians and data scientists who use and develop R.

In a phone interview with InformationWeek, Smith said data scientists and statisticians working in the scientific and health science fields already have "a lot of regulation around how data is collected and analyzed."

One example involves medical research conducted for the US Department of Health and Human Services' National Institutes of Health (NIH). "If you want to run a study, say, a psychological study through the NIH with actual patients or human subjects, you need to go through an ethics review before you go ahead and do that," said Smith.

In the tech industry, however, big data ethical guidelines are far more opaque.

"I think what's interesting about the Facebook [study] is that there's this whole new Wild West, if you like, of data coming from Internet applications, Internet services, the Internet of Things, where these practices and procedures aren't really in place yet," said Smith.

When asked if there should be an ethical framework for collecting and using data, 42% of JSM survey respondents agreed that an industry standard should be in place, while 43% said that ethics already plays "a big part" in their research.

If people feel there isn't an ethical standard in place for data collection and analysis, "then naturally they should worry about privacy issues associated with that data," Smith said. "Statisticians and data scientists have an important role to play in the practices and standards around handling and analyzing data in the world at large. I think the Facebook example should teach us a lesson, and my hope is that web and technology companies will involve data scientists more in analyzing the data that they collect."
