Michael J. Fox Foundation Points Big Data At Parkinson's
Kaggle Challenge winner Lionsolver proves that passive data collection can improve patient care and medical research.
Health care providers and medical researchers typically rely on laborious, hands-on data collection and analysis approaches. But in a Kaggle competition recently set up to study Parkinson's disease, the Michael J. Fox Foundation learned that it's possible to rely on passive approaches to data collection and analysis.
Passive data collection is ideal for many Parkinson's patients because they often struggle with a roller coaster of day-to-day symptoms, such as debilitating tremors, that can make it difficult to keep accurate diaries about their condition. Yet accurate information about symptoms is vital for doctors and caregivers who advise patients on drug dosage, diet and sleeping habits.
Hoping to overcome the diary problem, entrepreneur Daniel Vannoni of Gecko Ventures had the idea of creating a mobile app that would take advantage of smartphone sensors including voice, accelerometer and GPS to passively monitor patients without their having to input data or interact with the application. With the trial mobile app completed, Vannoni set up a test in which nine Parkinson's patients and seven control patients wore smartphones running the app for four to five hours per day for eight weeks.
Vannoni ultimately put his project on hold for lack of a supporting business model, but he donated the data from the trial to the Michael J. Fox Foundation For Parkinson's Research. The Fox Foundation typically facilitates research with supporting funding and by connecting patients with drug trials through its Fox Trial Finder site.
Lacking in-house researchers and data-analysis expertise, the Foundation decided to work with Kaggle, which organizes crowd-sourced big data analysis competitions. Kaggle has attracted a community of more than 80,000 data analysts of every description, and contestants have solved more than 200 challenges -- many with stunning success.
The Fox Foundation put up a $10,000 prize for the Parkinson's Data Challenge. The requirements were to, one, use the data to separate Parkinson's patients from the control group and, two, track increases and decreases in Parkinson's symptoms suffered by each patient. Another objective for the Fox Foundation was simply to gauge the success of the crowd-sourcing model, as it's used to promoting grants and issuing requests for applications (RFAs) for grants as a way to drive research.
"There were more than 600 downloads of the data and 29 submissions, so we were blown away by the response," says Laxmi Wordham, Fox Foundation's chief digital officer. "We really didn't know what to expect, but the number of submissions, the diversity of the teams and the depth of the analysis was very impressive."
The winning team, machine learning specialist Lionsolver, had no prior experience with Parkinson's disease, so it consulted with neuromuscular specialists from Cedars-Sinai Medical Center in Los Angeles to learn more about the disease.
"Cedars helped us understand what Parkinson's patients encounter and what kind of symptoms might manifest themselves in the data so we could use Lionsolver software to build training sets," Lionsolver co-founder and CEO Drake Pruitt told InformationWeek. "With that insight we could challenge the computer to identify relevant versus nonrelevant data based on how that data maps to symptoms."
Mobile apps used by thousands of patients certainly have the potential to generate big data, but Vannoni's test data set was tiny -- so small, in fact, that the voice and GPS data had limited value. Research has shown that voice patterns fluctuate along with Parkinson's symptoms because speech is a neuromuscular activity. Movement indicated by GPS is also potentially valuable because it shows when patients are homebound versus when they are able to walk and travel, the latter being a good indication that they are mobile and less likely to be suffering debilitating symptoms.
The one data set that was useful to Lionsolver was the accelerometer information. Using its machine learning software, Lionsolver was able to detect Parkinson's tremors and the relative severity of those tremors from that data.
"We built a training set that enabled us to distinguish frequencies of movement and that compared what's normal to what's extraordinary," Pruitt explained. "We essentially built a cluster, and that cluster was extremely valuable once we figured out the frequencies of movement for Parkinson's patients."
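To make the idea of "frequencies of movement" concrete, here is a minimal sketch of how a window of accelerometer readings could be summarized by frequency. It is not Lionsolver's method, which the company did not detail. It assumes a single-axis signal sampled at a fixed rate, uses synthetic data rather than the trial data, and relies on the commonly cited 4 to 6 Hz range for Parkinson's rest tremor. The function names, the 50 Hz sampling rate and the band limits are all illustrative assumptions.

```python
import numpy as np


def _power_spectrum(signal, fs):
    """Return (freqs, power) for a 1-D signal sampled at fs Hz."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # drop the constant offset (e.g. gravity)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return freqs, power


def dominant_frequency(signal, fs):
    """Frequency (Hz) carrying the most power, ignoring the DC bin."""
    freqs, power = _power_spectrum(signal, fs)
    return float(freqs[1:][np.argmax(power[1:])])


def tremor_band_ratio(signal, fs, band=(4.0, 6.0)):
    """Share of non-DC spectral power that falls inside `band` (Hz).

    The 4-6 Hz default is an assumption based on the typical
    rest-tremor range, not a value taken from the competition.
    """
    freqs, power = _power_spectrum(signal, fs)
    total = power[1:].sum()
    if total == 0:
        return 0.0
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(power[in_band].sum() / total)


# Synthetic 10-second windows at a hypothetical 50 Hz sampling rate.
fs = 50.0
t = np.arange(500) / fs
rng = np.random.default_rng(0)
tremor_like = np.sin(2 * np.pi * 5.0 * t) + 0.1 * rng.standard_normal(t.size)
walking_like = np.sin(2 * np.pi * 1.5 * t) + 0.1 * rng.standard_normal(t.size)

for name, window in [("tremor-like", tremor_like), ("walking-like", walking_like)]:
    print(name, dominant_frequency(window, fs), tremor_band_ratio(window, fs))
```

Features like these, computed per window, are the sort of input a clustering or classification step could work from. Real phone data would also need handling for three axes, orientation changes and gaps in recording.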
With more data, Pruitt suspects passively collected voice and GPS information might also be helpful in monitoring Parkinson's patients and advising them on drug dosage and eating and sleeping habits. The key takeaway, though, is that passive data collection has clear value, and it's a lesson that has been learned in connection with other diseases.
Asthmapolis, a Madison, Wisconsin-based company, has developed a GPS- and Bluetooth-enabled inhaler and companion smartphone app that precisely logs when and where asthma patients administer their medication. The patient simply uses the inhaler as needed. The app makes it easy for patients and doctors to track the use of medication while healthcare providers and municipalities see the bigger picture: precisely mapped data on breathing incidents.
Asthmapolis data can be used in patient-doctor consultations, and it is also anonymously aggregated and mashed up with weather data, crop data and pollution data, for example, to spot geospatial patterns and asthma-triggering sources and events impacting thousands of patients. In one recent test, Asthmapolis distributed 500 inhalers throughout Louisville, Kentucky, to learn more about why the city has an unusually high number of asthma sufferers.
With mobile sensing devices becoming ever more affordable, there's huge promise for passive data collection. From a data-analysis perspective, the other "hands off" lesson learned is the value of machine learning techniques, wherein the data itself determines what data is significant rather than testing analyst theories with data mining techniques.
"We took on the challenge so we could showcase how you can learn from the data," said Pruitt. "In this case a very small set of data was able to predict with 100% accuracy to the standards of the competition," meaning the software showed clusters of normal patients, clusters of Parkinson's patients and the trend in each Parkinson's patient's symptoms over time.
Human analysts bring biases and theories to problems. It's a proven approach and it works fine so long as the analysts use the data to prove their theories. Machine learning techniques are valuable in that they use the data itself to spot significant patterns and relationships that analysts might never have suspected and that would otherwise have been missed.
As electronic medical records become the norm, researchers across many areas of medical research are hoping that the availability of vast data sets and the use of big data techniques like machine learning will unlock new discoveries that might lead to cures.