Genomic research and precision medicine are on the leading edge of cancer research and treatment, but the huge size of the data sets and the complexity of sharing those data sets among researchers thwart progress. Here's one project that is aiming to remove those obstacles.
Curing cancer seems like something that would happen at hospitals and not in computer rooms. But applying analytics to human DNA and the DNA of cancer cells is a promising frontier of cancer research that can help patients get the best treatment for the type of cancer they have, minimize the negative impact of that treatment on them, and ultimately save lives.
For Intel's Bryce Olson, it's a personal mission.
Olson is a prostate cancer patient and global marketing director of the Health and Life Sciences Group at chipmaker Intel. Together with the Knight Cancer Institute at Oregon Health & Science University, Intel has been driving the Collaborative Cancer Cloud project.
First announced in 2013, the project is based on the technology in Intel's Trusted Analytics Platform, or TAP. TAP is a stack of open source big data technologies, and in the case of the Collaborative Cancer Cloud, it also includes an open source genetics database.
Olson was diagnosed with stage 4 metastatic prostate cancer in spring 2014 when he was 45 years old.
"Cancer is a disease of the genome," Olson told InformationWeek in an interview, citing data from a recent British study. "If you were born after 1960, you have a 50-50 chance of being diagnosed with cancer some time in your lifetime. It's the No. 2 killer after cardiovascular disease. And because cancer is a disease of the genome, it's a perfect target for going after with a more data-driven approach. Precision medicine is being used to go after it."
Olson says he believes that in the future all cancer will be genetically sequenced. The mutations that cause cancer will be better investigated and understood, and treatment will be prescribed based on the specific mutations. But today, few cancer patients are getting sequenced.
Unsatisfied with the treatment doctors had prescribed for his cancer -- hormonal drugs or chemotherapy -- Olson pursued personal genetic sequencing. He had his own genome sequenced, and he had his tumor sequenced, too. The results changed the course of his treatment.
"The mutations that were driving my cancer growth were being completely ignored" by the originally prescribed treatment, Olson said. Instead, armed with the knowledge of his genome, doctors prescribed a cocktail of other drugs in March 2015 that shut down his cancer's growth.
Intel's Collaborative Cancer Cloud project is making it easier for cancer researchers to pursue this approach, collaborate with each other, share data sets, and get to real treatments and maybe a cure for cancer sooner.
Paul Boutros, who holds a PhD in medical biophysics, is one of the scientists in the trenches of this effort. He leads a team of computational biologists at the Ontario Institute for Cancer Research, which runs one of the largest prostate cancer genomics research projects in the world.
His organization is working to figure out which men with prostate cancer need treatment, and what treatment would be best suited to each individual. The project enjoys a lot of international cooperation and works with enormous data sets -- trillions of data points.
That's where the promise is, and also one of the biggest challenges.
Cost of Data Generation Dropping
How do you work with such large data sets?
"The cost of data generation in genomics is dropping faster than Moore's Law," Boutros told InformationWeek in an interview. He said that while many of the organizations working with genomic data are able to stay on top of it or "ride the wave front," most groups are really struggling to keep up with it.
The other big challenge is collaborating with other organizations. If you could somehow combine your data set with the data sets of other organizations and query all the data for your research, that would reveal better, more accurate insights. But in order to include that other data, organizations need to navigate technology challenges and legal challenges.
Each jurisdiction has a different set of legal concerns. Patients may have consented to share certain types of information, but not others. Or the patients may decide to withdraw consent at a later point in time, so that data that's been shared before cannot be shared later.
"There's a lot of care around these things, because it's incredibly intimate information for an individual," Boutros said. "It's your genome. It's information about who you are."
To address these issues, the Ontario Institute for Cancer Research was looking at solutions beyond its existing infrastructure of a couple thousand processors, 10,000 cores, and several petabytes of storage.
"The vast majority of the work was being done on local compute," he said. "We were quickly hitting the point where that was no longer viable."
So, Boutros's team was looking at a couple of different options. For one, it was investigating the possibility of creating an academic-only cloud system. It was also looking at the available commercial clouds. But there are many challenges with moving personal health information to commercial clouds, he said.
It was at this point in investigating potential solutions that a colleague at another institution told Boutros about the Collaborative Cancer Cloud.
The Collaborative Cancer Cloud enables researchers to get access to more data sets from other participating institutions by shipping the compute to the data. Olson said that organizations "Dockerize" or containerize the algorithms or applications and send them to the participating institutions to query their data. Then the results are shipped back to the original institution. The data itself is never moved from the institution where it resides.
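The federated model described above can be sketched in a few lines. This is an illustrative simulation only, not Intel's actual API: the institution names, record format, and function names are hypothetical. The key property it demonstrates is that the analysis code travels to each institution's node and runs there, while the raw patient data never leaves the object that holds it; only aggregate results cross institutional boundaries.

```python
# Hypothetical sketch of the "ship the compute to the data" model.
# All names and data below are illustrative, not the actual
# Collaborative Cancer Cloud interfaces.

def count_mutation(local_records, gene):
    """The 'shipped' analysis: executes inside each institution's node."""
    return sum(1 for record in local_records if gene in record["mutations"])

class Institution:
    def __init__(self, name, records):
        self.name = name
        self._records = records  # raw data stays here; it is never shipped out

    def run(self, analysis, *args):
        # Run the visiting analysis locally and return only its result.
        return analysis(self._records, *args)

def federated_query(institutions, analysis, *args):
    # Dispatch the same analysis to every site; collect aggregate results.
    return {inst.name: inst.run(analysis, *args) for inst in institutions}

sites = [
    Institution("OICR", [{"mutations": ["BRCA2", "TP53"]},
                         {"mutations": ["PTEN"]}]),
    Institution("Dana-Farber", [{"mutations": ["BRCA2"]}]),
]

results = federated_query(sites, count_mutation, "BRCA2")
print(results)  # {'OICR': 1, 'Dana-Farber': 1}
```

In the real system the analysis would be a Docker container rather than a Python function, but the flow is the same: code moves, data stays put, and only results return to the querying institution.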
Boutros's institution, the Ontario Institute for Cancer Research, and the Dana-Farber Cancer Institute officially became part of the Collaborative Cancer Cloud in March 2016.
What's in the Box
For Boutros's team, that meant that Intel shipped some hardware -- Boutros said it was not much different from a rack with the appropriate compute servers and a router -- so that the Collaborative Cancer Cloud could run as a separate node, isolated from the organization's existing HPC environment.
Boutros's team members wanted to run it separately, because they planned to put it through serious testing and they didn't want that to impact the organization's own local compute.
"We can really stress-test it and do things we wouldn't do on our own nodes," Boutros said. However, he noted, the system looks as if it will be easy to integrate with his organization's existing infrastructure.
Jessica Davis has spent a career covering the intersection of business and technology at titles including IDG's Infoworld, Ziff Davis Enterprise's eWeek and Channel Insider, and Penton Technology's MSPmentor.