Big Data Project Analyzes Veterans' Suicide Risk

In opt-in project, predictive analytics firm mines social media posts to predict suicidal risks of military veterans.

Jeff Bertolucci, Contributor

July 9, 2013

The suicide rate among U.S. veterans is approximately twice that of the general population, a troubling phenomenon that the U.S. Department of Veterans Affairs (VA) is struggling to fight. Mental health experts have yet to find a lasting solution, but one predictive analytics company has an innovative approach that might help: Use big data to analyze veterans' Facebook posts to spot suicide risk factors.

Phase 2 of the initiative, called "The Durkheim Project," was announced today by Patterns and Predictions, a predictive analytics company that uses machine learning technology built on big data software from Attivio and Cloudera.

The suicide risk prediction project includes a database of more than 100,000 U.S. veterans, all of whom are volunteers. Patterns and Predictions provides a Facebook app for iPhone and Android devices, and it worked with Cloudera to co-develop real-time prediction software running on CDH, Cloudera's open-source Apache Hadoop distribution.

"With Cloudera's unique software expertise, we can focus on making The Durkheim Project's risk assessments faster across larger data sets," said Patterns and Predictions' founder Chris Poulin in a statement. Attivio's Active Intelligence Engine (AIE) technology helps integrate the many data sources required for the project.

"The Patterns and Predictions predictive model uses our engine to access and analyze all of the data and produce these predictions, or scores, related to the need for immediate mental health prioritization," said Attivio CTO Sid Probstein in a phone interview with InformationWeek. "It's entirely an opt-in system," he added. "When (veterans) opt in through Facebook, their social media posting history becomes available to the system."

The Durkheim Project complies with the U.S. Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule, which regulates the disclosure of personal health information by health insurers, medical care providers, and other entities.

The information collected by the project "is retained in a HIPAA-compliant framework, so there's absolute security and privacy around this data," said Probstein. "It's available to the experts at the VA, and only in an observational model," he added. "It's not being used right now for real-time triage or for what I would call interventional actions. Intervention is still driven by the VA."

Probstein called The Durkheim Project "a good example of big data done right," one where data-collection efforts are out in the open. "We've heard a lot in the last couple of months about data-gathering on U.S. citizens," he said. "It's important to mention that this system works by analyzing data only when service people opt in."

Poulin has been working with Dartmouth researchers since 2010 to address the problem of high suicide rates among veterans. His company, in conjunction with Attivio and Cloudera, won a Defense Advanced Research Projects Agency (DARPA) contract in 2011 to test its machine learning technology for the purpose of analyzing suicide risk.

In February 2013, an investigation conducted by Patterns and Predictions, Dartmouth, and the VA determined that the accuracy of this risk-prediction data model was statistically significant, with "consistent accuracies" of 65 percent or higher in predicting suicide risk in a veteran control group.

The Durkheim Project's second phase, which is now underway, is studying suicide prediction at scale.

"The promise of Durkheim lies in its ability to collect and monitor a diverse repository of complex data, with the hope of eventually providing a real-time triage of interventional actions upon detection of a critical event," said Poulin. "Part of analysis across big data sets is to get better clinical outcomes," he told InformationWeek. "It may be that the system learns from one person who opts in, and is able to produce a better outcome for someone else."

About the Author(s)

Jeff Bertolucci


Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.
