White House Big Data Report: 5 Privacy Takeaways

Big data raises serious privacy concerns that need to be addressed, sooner rather than later, report says.

3. "Small" data poses a bigger privacy threat.
Despite all the talk about big data's potential to infringe on personal privacy, the most common privacy risks today involve "small data," such as when hackers target personal banking information to commit financial fraud. "These risks do not involve especially large volumes, rapid velocities, or great varieties of information, nor do they implicate the kind of sophisticated analytics associated with big data," says the report.

Protection of small data has already been addressed by US privacy laws, "robust" enforcement, and global privacy mechanisms, the report claims. Although that might be true, the recent Target security breach and Heartbleed bug show there's plenty of room for improvement in this area.

4. Predictive medicine could lead to privacy pandemonium.
One promising big data application is predictive medicine, which delves deeply into patients' health and genetic information to predict whether they'll develop a particular disease and how well they'll respond to specific therapies. The potential for abuse here is huge. For instance, health information collected via predictive medicine might be applied to decisions involving people with similar genes, such as a patient's children.

"The privacy frameworks that currently cover information now used in health may not be well suited to address these developments or facilitate the research that drives them," the report states.

5. Conversely, privacy laws hinder some important analytics.
"Big data analytics enable data scientists to amass lots of data, including unstructured data, and find anomalies and patterns," the report says. "A key privacy challenge in this model of discovery is that in order to find the needle, you have to have a haystack. To obtain certain insights, you need a certain quantity of data."

Hence the problem: Researchers can benefit from access to larger data sets of sensitive genetic information, but privacy laws limit their access to this data. A genetic researcher at the Broad Institute, for instance, was not able to detect a genetic variant related to schizophrenia with 3,500 genetic datasets, but achieved "statistically significant" results with 35,000 cases, the report says.
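The "haystack" point can be illustrated with a rough sketch of how sample size affects statistical significance. The variant frequencies below (11% in cases, 10% in controls) are hypothetical, the group sizes are assumed to be equal, and the simple two-proportion z-test is a stand-in chosen for illustration; none of this is taken from the report or from the Broad Institute study.

```python
import math

def two_proportion_z(p_cases, p_controls, n_per_group):
    """z-statistic for a difference in variant frequency between two equal-size groups."""
    pooled = (p_cases + p_controls) / 2  # pooled frequency when group sizes are equal
    standard_error = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    return (p_cases - p_controls) / standard_error

def two_sided_p(z):
    """Two-sided p-value under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical frequencies, for illustration only.
P_CASES, P_CONTROLS = 0.11, 0.10

for n in (3_500, 35_000):
    z = two_proportion_z(P_CASES, P_CONTROLS, n)
    print(f"n per group = {n:>6}: z = {z:.2f}, p = {two_sided_p(z):.2g}")
```

With the same underlying difference, a tenfold increase in sample size raises the z-statistic by a factor of the square root of 10 (about 3.2). In this made-up example, that is enough to move the result from not significant to significant at the conventional 0.05 level.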
