Can Big Data Help Cure Cancer?
Genomic research and precision medicine are on the leading edge of cancer research and treatment, but the huge size of the data sets and the complexity of sharing those data sets among researchers thwart progress. Here's one project that is aiming to remove those obstacles.
Curing cancer seems like something that would happen at hospitals and not in computer rooms. But applying analytics to human DNA and the DNA of cancer cells is a promising frontier of cancer research that can help patients get the best treatment for the type of cancer they have, minimize the negative impact of that treatment on them, and ultimately save lives.
For Intel's Bryce Olson, it's a personal mission.
Olson is a prostate cancer patient and global marketing director of the Health and Life Sciences Group at chipmaker Intel. Together with the Knight Cancer Institute at Oregon Health & Science University, Intel has been driving the Collaborative Cancer Cloud project.
First announced in 2013, the project is based on the technology in Intel's Trusted Analytics Platform, or TAP. TAP is a stack of open source big data technologies, and in the case of the Collaborative Cancer Cloud, it also includes an open source genetics database.
It's Personal
Olson was diagnosed with stage 4 metastatic prostate cancer in spring 2014 when he was 45 years old.
"Cancer is a disease of the genome," Olson told InformationWeek in an interview, citing data from a recent British study. "If you were born after 1960, you have a 50-50 chance of being diagnosed with cancer some time in your lifetime. It's the No. 2 killer after cardiovascular disease. And because cancer is a disease of the genome, it's a perfect target for going after with a more data-driven approach. Precision medicine is being used to go after it."
Olson says he believes that in the future all cancers will be genetically sequenced. The mutations that cause cancer will be better investigated and understood, and treatment will be prescribed based on the specific mutations. But today, hardly anyone is getting sequenced.
Unsatisfied with the treatment doctors had prescribed for his cancer -- hormonal drugs or chemotherapy -- Olson pursued personal genetic sequencing. He had his own genome sequenced, and he had his tumor sequenced, too. The results changed the course of his treatment.
Figure 1 (Image: bernie_moto/iStockphoto)
"The mutations that were driving my cancer growth were being completely ignored" by the originally prescribed treatment, Olson said. Instead, armed with the knowledge of his genome, doctors prescribed a cocktail of other drugs in March 2015 that shut down his cancer's growth.
Intel's Collaborative Cancer Cloud project is making it easier for cancer researchers to pursue this approach, collaborate with each other, share data sets, and get to real treatments and maybe a cure for cancer sooner.
Paul Boutros, who holds a PhD in medical biophysics, is one of the scientists in the trenches of this effort. He leads a team of computational biologists at the Ontario Institute for Cancer Research, which is leading one of the largest prostate cancer genomics research projects in the world.
His organization is working to figure out which men with prostate cancer need treatment, and what treatment would be best suited to each individual. The project enjoys a lot of international cooperation and works with enormous data sets -- trillions of data points.
That's where the promise is, and also one of the biggest challenges.
Cost of Data Generation Dropping
How do you work with such large data sets?
"The cost of data generation in genomics is dropping faster than Moore's Law," Boutros told InformationWeek in an interview. He said that while some of the organizations working with genomic data are able to stay on top of it or "ride the wave front," most groups are really struggling to keep up with it.
The other big challenge is collaborating with other organizations. If you could somehow combine your data set with the data sets of other organizations and query all the data for your research, that would reveal better, more accurate insights. But in order to include that other data, organizations need to navigate technology challenges and legal challenges.
Some organizations may view the data as part of their intellectual property. It's a big deal to share genomic data, Boutros said.
Each jurisdiction has a different set of legal concerns. Patients may have consented to share certain types of information, but not others. Or the patients may decide to withdraw consent at a later point in time, so that data that's been shared before cannot be shared later.
"There's a lot of care around these things, because it's incredibly intimate information for an individual," Boutros said. "It's your genome. It's information about who you are."
To address these issues, the Ontario Institute for Cancer Research was looking at solutions beyond its existing infrastructure of a couple thousand processors, 10,000 cores, and several petabytes of storage.
"The vast majority of the work was being done on local compute," he said. "We were quickly hitting the point where that was inviable."
So Boutros's team was looking at a couple of different options. For one, it was investigating the possibility of creating an academic-only cloud system. It was also looking at the available commercial clouds. But there are many challenges with moving personal health information to commercial clouds, he said.
It was at this point in investigating potential solutions that a colleague at another institution told Boutros about the Collaborative Cancer Cloud.
The Collaborative Cancer Cloud enables researchers to get access to more data sets from other participating institutions by shipping the compute to the data. Olson said that organizations "Dockerize" or containerize the algorithms or applications and send them to the participating institutions to query their data. Then the results are shipped back to the original institution. The data itself is never moved from the institution where it resides.
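The "ship the compute to the data" pattern described above can be illustrated with a minimal Python sketch. It is not Intel's implementation: the `Site` class, the `count_mutation` job, and the sample records are all hypothetical, and a plain function call stands in for sending a container to a remote institution. The only point is that each site returns a summary, never its raw records.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical record shape: one patient, with the set of genes found mutated.
Record = Dict[str, Set[str]]
Job = Callable[[List[Record]], Dict[str, int]]


@dataclass
class Site:
    """A participating institution (hypothetical). Records stay inside it."""
    name: str
    records: List[Record]

    def run(self, job: Job) -> Dict[str, int]:
        # The job executes where the data lives; only its summary is returned.
        return job(self.records)


def count_mutation(gene: str) -> Job:
    """Build a job that counts patients carrying a mutation in `gene`."""
    def job(records: List[Record]) -> Dict[str, int]:
        carriers = sum(1 for r in records if gene in r["mutated_genes"])
        return {"carriers": carriers, "patients": len(records)}
    return job


def federated_query(sites: List[Site], job: Job) -> Dict[str, int]:
    """Send the same job to every site and add up the per-site summaries."""
    totals: Dict[str, int] = {}
    for site in sites:
        for key, value in site.run(job).items():
            totals[key] = totals.get(key, 0) + value
    return totals


# Made-up example data for two sites.
sites = [
    Site("site_a", [{"mutated_genes": {"TP53", "PTEN"}},
                    {"mutated_genes": {"BRCA2"}}]),
    Site("site_b", [{"mutated_genes": {"PTEN"}},
                    {"mutated_genes": set()},
                    {"mutated_genes": {"PTEN", "BRCA2"}}]),
]
result = federated_query(sites, count_mutation("PTEN"))
print(result)  # {'carriers': 3, 'patients': 5}
```

In a real deployment the job would presumably be packaged as a container and run under each institution's own access and consent controls; the sketch leaves all of that out.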
Boutros's institution, the Ontario Institute for Cancer Research, and the Dana-Farber Cancer Institute officially became part of the Collaborative Cancer Cloud in March 2016.
What's in the Box
For Boutros's team, that meant that Intel shipped some hardware -- Boutros said it was not much different from a rack with the appropriate central function, servers, and a router -- so that the Collaborative Cancer Cloud could run as a separate node, isolated from the organization's original HPC.
Boutros's team members wanted to run it separately, because they planned to put it through serious testing and they didn't want that to impact the organization's own local compute.
"We can really stress-test it and do things we wouldn't do on our own nodes," Boutros said. However, he noted, the system looks as if it will be easy to integrate with his organization's existing HPC systems if his team ever decides to go down that road.
Participating in the Collaborative Cancer Cloud has facilitated two new collaborations for the Ontario Institute for Cancer Research, one with Dana-Farber and the other with the Knight Cancer Institute at Oregon Health & Science University.
It has also led to research in an emerging area of machine learning: How do you create new machine learning techniques for a federated data model?
The project has enabled the Ontario Institute for Cancer Research to engage in a large-scale meta-analysis. Boutros said his organization is looking at a critical question in prostate cancer research: If a patient is diagnosed, should he be treated or not? Clinicians will make mistakes about a third of the time, Boutros said. Some men get therapy that doesn't give them any benefit, and some men don't get the therapy that they need.
Intel's Olson has gotten the treatment to stem his cancer's growth because of genetic sequencing and research. Collaborating takes that to the next level.
"Even if a single hospital sequenced every cancer patient who is on file with them, they'd only have 1% of the cancer population data, because cancer population data is so dispersed across the United States," Olson said.
"No single hospital is going to solve this. The only way you are going to advance the science is if you can figure out a way to enable these cancer institutions all over the United States and all over the world to collaborate with each other and share. Then you can get access to this big data pool."