A sizable majority of data scientists believe consumers should worry about the privacy implications of big data, the personal information collected on them, and how this data is used, according to a recent survey of statisticians attending the Joint Statistical Meetings (JSM) in Montreal.
Revolution Analytics, a Palo Alto, Calif.-based software developer and major proponent of the open source R programming language, conducted the poll of conference attendees in early August. Some 865 respondents offered their views on privacy and ethics in data collection and on the statistical software used to analyze this information.
The survey's findings show that data scientists are clearly concerned about the impact of data collection on personal privacy. Overall, 88% of respondents said consumers should worry about privacy issues in the big data era, as more organizations stockpile personal -- and often sensitive -- information on all of us.
David Smith, VP of corporate marketing for Revolution Analytics, summarized the survey results in a recent blog post. Earlier this year, Forbes selected Smith as one of the top 20 influencers in the big data sector.
"It's always important for consumers to understand what information companies have about them, and how it's being used," Smith told InformationWeek in a phone interview. "Speaking as a data scientist, statisticians and data scientists are uniquely positioned to understand the privacy implications of data, especially the implications of combining lots of different data sources together, which is happening a lot today."
Four of five respondents said there should be an ethical framework for collecting and using data. In fact, some business sectors already have such frameworks in place. More than half of respondents agreed that ethics play a significant role in their data research.
"For example, in the pharmaceutical industry collecting data around clinical trials is very well regulated, and there are ethical frameworks in place for how data is collected and used," said Smith. "But there's not an industry-wide framework for doing so."
Data scientists working in the healthcare and life science fields showed the greatest support (92%) for a code of ethics. "Out of all the industries, life science and healthcare is the one, I think, that is most advanced in setting up frameworks for using data in an ethical way," Smith said.
Overall, the survey results didn't surprise Smith. "It's really a confirmation of what we would have expected to see," he noted. "These are people who have their hands working with data in and out every day. They understand the power of data and the importance of it being used in an appropriate fashion."
It would appear that data scientists in general have a strong code of ethics. Just 10% of survey respondents said there should not be an ethical framework for data research, and only 1% said ethics should not play a role in data science.
"You can put data to very powerful and good uses, and you can put data to nefarious uses," said Smith. "And that's something that statisticians and data scientists recognize."
He added that it's important for consumers to understand what data companies (and government agencies) are collecting on them, and how it's being used. But as the ongoing NSA controversies show, there's plenty of room for improvement here.
Another example: Facebook's new privacy policies are being criticized by privacy groups, which claim the social network will be able to use personal data in ads without compensating its members.