Courtagen taps AWS to handle the terabytes of data produced in analyzing individuals' genes to pinpoint disease-causing abnormalities and best treatments.

Charles Babcock, Editor at Large, Cloud

November 20, 2013

Human genome analysis can create 1 TB of data. (Source: Flickr user JohnJobby, http://www.flickr.com/photos/johnjobby/2253777148/)

Brendan McKernan has atrial fibrillation, a condition that causes arrhythmia in the heart. He remembers being asked by his doctor which of 12 drugs they had tried he liked the best.

"I told him I felt terrible on all of them," he recalled, wondering why the doctor hadn't been able to prescribe a drug that was right for him. When it comes to finding just the right drug, often the only tool in the doctor's medicine bag is trial and error.

One drug eventually proved more effective than others, but McKernan, president of Courtagen Life Sciences, still wanted a tool that could advise a doctor regarding which treatment is best for the patient, out of several candidates with varying degrees of side effects.

For a spectrum of serious ailments known as mitochondrial diseases, the firm he co-founded is putting such a tool in the hands of doctors. Mitochondrial diseases are believed to result frequently from a genetic mutation. While some things are known about such mutations, it's only recently that individual genomic analysis has been available at what many deem a reasonable cost. Using genetic analysis in individual cases holds the promise of more precisely identifying both the mitochondrial disorder that is afflicting a patient and a possible treatment for it.

Mitochondrial diseases include Lou Gehrig's disease, Huntington's disease, cerebral palsy, Parkinson's disease, muscular dystrophy, possibly epilepsy, and probably Alzheimer's. They all have a variety of potential treatments, and part of the doctor's task is matching up the one that's best for an individual patient.

There are few things more individual than DNA. Courtagen is one of a growing number of independent labs that, for less than $1,000, will analyze a DNA sample that a client gives them.

Courtagen runs the tests in its labs. It's currently averaging 200 new customers a month, each one yielding a terabyte of DNA data. The data are shipped to the Amazon Web Services cloud, where they are referenced by the Courtagen Ziphyr bioinformatics pipeline, an aggregation and analysis engine. Courtagen doesn't have an IT staff. It has a department of bioinformatics. It developed Ziphyr and runs it on Amazon's EC2 to analyze and learn from incoming sequencing information.
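To put those figures in perspective, here is a rough back-of-the-envelope sketch of the data volume they imply. The customer count and per-customer size come from the article; the constant monthly rate and the decimal conversion (1 PB = 1,000 TB) are assumptions made for illustration, and the function names are hypothetical.

```python
# Figures quoted in the article.
CUSTOMERS_PER_MONTH = 200
TB_PER_CUSTOMER = 1

# Assumption: decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1000


def monthly_volume_tb(customers=CUSTOMERS_PER_MONTH, tb_each=TB_PER_CUSTOMER):
    """New sequencing data generated per month, in terabytes."""
    return customers * tb_each


def yearly_volume_pb(customers=CUSTOMERS_PER_MONTH, tb_each=TB_PER_CUSTOMER):
    """New sequencing data per year, in petabytes.

    Assumes the monthly customer rate stays constant, which the
    article does not claim.
    """
    return monthly_volume_tb(customers, tb_each) * 12 / TB_PER_PB


if __name__ == "__main__":
    print(f"{monthly_volume_tb()} TB per month")
    print(f"{yearly_volume_pb():.1f} PB per year")
```

At the quoted rate, that works out to about 200 TB of new data a month, or roughly 2.4 PB a year if the pace holds.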

With 1 TB of sequencing data in hand for one or several patients, Ziphyr identifies, as precisely as possible, the location and nature of the genetic irregularity or mutation that may be associated with a patient's disease. It then compares that individual data to that of similar patients.

Ziphyr has access to a database of 14,000 genetic hotspots or "coding non-synonymous variants," as McKernan put it. Quite a bit is known about some human genome variants, little about others.

Ziphyr works with two flows of information: a structured flow of additional individual genetic data, and an unstructured flow, in which it searches the Internet for information on cases, published research, and known treatments. "A lot of variants are not known" in the mitochondrial set of diseases, and different variants carry different levels of pathogenicity, McKernan said.

After analyzing each client's DNA and examining what's known about its variants, Ziphyr draws up a report for the patient's doctor that offers "a strong list of (mutation) suspects" that are causing the patient's condition, what's known about the identified mutations, what treatments exist (and which are most promising for the individual patient), outcomes in related cases, and what similar patients report on their treatments.

Such a report contains much better information than was available before DNA sequencing and Ziphyr analytics, McKernan claimed. Some of his colleagues believe they are in the advanced laboratory business. McKernan believes he's in the digital medical information business. "We're at the leading edge of what can be done. We are looking for the subtle inferences in the [patient's] data set that are missing in today's medical systems," he said.

Courtagen is a three-year-old company with strong analytics tools in a rapidly emerging field. It wants to maintain its lead and go global fast. It plans to do so even though it has yet to build a datacenter or establish something that looks like a traditional IT department. It uses NetSuite online business applications, while its bioinformatics staff develops the ways in which Courtagen uses genetic information through the Ziphyr platform.

The more clients whose genetic code it analyzes, the more valuable its database of decoded information becomes.

Instead of building datacenters for labs, "we believe we need to focus on our core competencies," said McKernan. "Even if we wanted to be number one at running a datacenter, we're not going to compete with Amazon," he noted.

"As our business expands, we will open labs around the world. We wanted a partner with a global presence," he said, and one that could provide HIPAA-compliant infrastructure.

Courtagen has developed its own algorithms to decide when to run its analytics jobs inside Amazon. It also uses off-peak spot pricing to get the best return on its computing costs. It delivers encrypted genetic data to its clusters in the Amazon Virtual Private Cloud over a private line, not the Internet. By using clusters in the Amazon Virtual Private Cloud, it's keeping its data handling HIPAA-compliant.
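The article does not describe how Courtagen's scheduling algorithms work. As a purely illustrative sketch of the general idea behind off-peak spot pricing, the snippet below picks the cheapest contiguous block of hours from a list of hourly prices and checks whether it beats a fixed on-demand rate. The price list, the rates, and the function names are all hypothetical; no AWS API is called.

```python
from typing import Sequence, Tuple


def cheapest_window(prices: Sequence[float], hours_needed: int) -> Tuple[int, float]:
    """Return (start_hour, total_cost) of the cheapest contiguous window.

    `prices` holds one hypothetical spot price per hour; the job is
    assumed to need `hours_needed` uninterrupted hours.
    """
    if hours_needed <= 0 or hours_needed > len(prices):
        raise ValueError("hours_needed must be between 1 and len(prices)")

    # Cost of the first window, then slide it one hour at a time.
    window_cost = sum(prices[:hours_needed])
    best_start, best_cost = 0, window_cost
    for start in range(1, len(prices) - hours_needed + 1):
        window_cost += prices[start + hours_needed - 1] - prices[start - 1]
        if window_cost < best_cost:
            best_start, best_cost = start, window_cost
    return best_start, best_cost


def spot_is_cheaper(spot_cost: float, on_demand_rate: float, hours_needed: int) -> bool:
    """True when the spot window costs less than paying on-demand throughout."""
    return spot_cost < on_demand_rate * hours_needed


if __name__ == "__main__":
    # Made-up hourly prices in dollars, cheaper in the middle of the night.
    hourly_prices = [0.30, 0.28, 0.12, 0.10, 0.11, 0.25]
    start, cost = cheapest_window(hourly_prices, hours_needed=3)
    print(f"Start at hour {start}, estimated cost ${cost:.2f}")
    print("Use spot:", spot_is_cheaper(cost, on_demand_rate=0.30, hours_needed=3))
```

A real scheduler would also have to account for spot instances being reclaimed mid-job, which this sketch ignores.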

By building its business out on Amazon, Courtagen believes the company can build a large bioinformatics pipeline. Information will stream in from more and more clients and information sources on the Web, and Ziphyr will send out better reports to doctors. If Courtagen can build its practice fast enough, it may become the premier adviser on which drug to use in the first place, rather than on which 12 might work.

About the Author(s)

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive Week. He is a graduate of Syracuse University where he obtained a bachelor's degree in journalism. He joined the publication in 2003.
