Big data raises serious privacy concerns that need to be addressed, sooner rather than later, report says.

Jeff Bertolucci, Contributor

May 5, 2014

(Source: Philippe Teuwen)


Big data's potential is enormous, for good and for bad. A new White House report on big data's transformative qualities takes a deep dive into data-related privacy and security issues.

The key takeaway: Big data is creating numerous privacy issues that need to be addressed, sooner rather than later. 

"A significant finding of this report is that big data analytics have the potential to eclipse longstanding civil rights protections in how personal information is used in housing, credit, employment, health, education, and the marketplace," the report's introduction states. "Americans' relationship with data should expand, not diminish, their opportunities and potential."

The report discusses a variety of privacy topics, including these five:  

1. De-identification doesn't always work.
Organizations often use privacy-protection technology to "de-identify" data linked to a specific person or device. Unfortunately, re-identification techniques can often piece that link back together.

The report states: "…integrating diverse data can lead to what some analysts call the 'mosaic effect,' whereby personally identifiable information can be derived or inferred from datasets that do not even include personal identifiers, bringing into focus a picture of who an individual is and what he or she likes."

Figure 1: (Source: Philippe Teuwen)

As technologies to re-identify "anonymous" data grow more powerful, it's unclear how individuals will control their information and identities, or challenge decisions based on information culled from multiple datasets.
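To make the "mosaic effect" concrete, here is a minimal Python sketch of a linkage attack. Everything in it is hypothetical and invented for illustration, not taken from the report: the two datasets, the field names, and the helpers `key` and `link`. The idea is that a "de-identified" dataset that still carries quasi-identifiers such as ZIP code, birth date, and sex can be joined against a separate dataset that does include names.

```python
# Hypothetical illustration of a linkage ("mosaic") attack.
# Neither dataset alone pairs a name with a diagnosis.

# "De-identified" records: names removed, quasi-identifiers kept.
health_records = [
    {"zip": "90210", "birth_date": "1975-03-14", "sex": "F", "diagnosis": "asthma"},
    {"zip": "90210", "birth_date": "1982-11-02", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "10001", "birth_date": "1975-03-14", "sex": "F", "diagnosis": "migraine"},
]

# A separate, hypothetical public dataset (e.g., a voter roll) that has names.
public_records = [
    {"name": "Alice Example", "zip": "90210", "birth_date": "1975-03-14", "sex": "F"},
    {"name": "Bob Example", "zip": "90210", "birth_date": "1982-11-02", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def key(record):
    """Build a join key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(deidentified, identified):
    """Return (name, diagnosis) pairs for records whose quasi-identifier
    combination matches exactly one person in the identified dataset."""
    names_by_key = {}
    for person in identified:
        names_by_key.setdefault(key(person), []).append(person["name"])
    matches = []
    for record in deidentified:
        names = names_by_key.get(key(record), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["diagnosis"]))
    return matches


if __name__ == "__main__":
    for name, diagnosis in link(health_records, public_records):
        print(f"{name}: {diagnosis}")
```

Run as a script, this prints `Alice Example: asthma` and `Bob Example: diabetes`. The third health record stays anonymous only because no public record shares its combination of ZIP code, birth date, and sex. Adding one more dataset could change that, which is the report's point about integrating diverse data.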

2. "Perfect personalization" could aid discrimination.
The fusion of different types of unstructured data allows marketers to "deliver exactly the right message, product, or service to consumers before they even ask," the report says. "Unfortunately, 'perfect personalization' also leaves room for subtle and not-so-subtle forms of discrimination in pricing, services, and opportunities."  

3. "Small" data poses a bigger privacy threat.
Despite all the talk about big data's potential to infringe on personal privacy, the most common privacy risks today involve "small data," such as when hackers target personal banking information to commit financial fraud. "These risks do not involve especially large volumes, rapid velocities, or great varieties of information, nor do they implicate the kind of sophisticated analytics associated with big data," says the report.

Protection of small data is already addressed by US privacy laws, "robust" enforcement, and global privacy mechanisms, the report claims. That may be true, but the recent Target security breach and the Heartbleed bug show there's plenty of room for improvement in this area.

4. Predictive medicine could lead to privacy pandemonium.
One promising big data application is predictive medicine, which delves deeply into patients' health and genetic information to predict whether they'll develop a particular disease and how well they'll respond to specific therapies. The potential for abuse here is huge. For instance, health information collected via predictive medicine might be applied to decisions involving people with similar genes, such as a patient's children.

"The privacy frameworks that currently cover information now used in health may not be well suited to address these developments or facilitate the research that drives them," the report states.

5. Conversely, privacy laws hinder some important analytics.  
"Big data analytics enable data scientists to amass lots of data, including unstructured data, and find anomalies and patterns," the report says. "A key privacy challenge in this model of discovery is that in order to find the needle, you have to have a haystack. To obtain certain insights, you need a certain quantity of data."

Hence the problem: Researchers can benefit from access to larger data sets of sensitive genetic information, but privacy laws limit their access to this data. A genetic researcher at the Broad Institute, for instance, was not able to detect a genetic variant related to schizophrenia with 3,500 genetic datasets, but achieved "statistically significant" results with 35,000 cases, the report says.
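A rough way to see why the larger sample mattered: for many common estimators, the standard error shrinks with the square root of the sample size, so the smallest effect a study can reliably detect shrinks roughly in proportion. This is a general statistical rule of thumb offered as an illustration, assuming a simple estimator with fixed variance; it is not the Broad Institute's actual analysis, which the report does not detail here.

```latex
\mathrm{SE} \propto \frac{1}{\sqrt{n}}
\quad\Longrightarrow\quad
\frac{\mathrm{SE}_{n=3{,}500}}{\mathrm{SE}_{n=35{,}000}}
  = \sqrt{\frac{35{,}000}{3{,}500}} = \sqrt{10} \approx 3.16
```

Under that assumption, a tenfold increase in cases lets a study resolve an effect roughly one-third the size, which is why a variant invisible at 3,500 cases can become "statistically significant" at 35,000.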


About the Author(s)

Jeff Bertolucci


Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.
