One of the best-selling prescription drugs in the world is Clopidogrel, sold mainly under the trade name Plavix. Clopidogrel is prescribed to most patients at risk of blood clots, and to all patients receiving a stent. The problem with Clopidogrel, as with many other drugs, is that its effect varies between patients: it works well for some but has almost no effect in others. The usual approach of physicians is to increase the dose until they conclude that the drug is not working at all, and then try another one.
But we now know that a specific gene, CYP2C19, is responsible for metabolizing Clopidogrel into its active form, and that patients with certain variants of the gene get little benefit from the drug. The impact is enormous. Most patients who are candidates to receive a stent are now tested for variants of the CYP2C19 gene, with physicians seeking alternative treatments for those who can't metabolize Clopidogrel.
To tell apart the variants of such genes, it is necessary to perform Genome-Wide Association Studies (GWAS) on a large number of affected individuals, with full genome sequencing performed to identify the responsible genes. Mapping the results and cross-referencing all the genomic data, however, requires massive computational power. While the speed of genome sequencing has increased 1,000-fold in the past 10 years and the cost is approaching the $1,000 mark (the first genome took 13 years and cost $2.7 billion), cross-referencing all that information is still a huge challenge. A typical individual genome sequence yields around 3 million sequence variants when compared with the reference genome.
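To give a feel for why the cross-referencing is so demanding, here is a minimal Python sketch of the kind of comparison a GWAS repeats for every variant: a standard 2x2 chi-square test on allele counts in cases versus controls. The function names, the cohort size and the allele counts are hypothetical illustrations, not data from any study, and the chi-square test is only one common choice of association test.

```python
import math


def genotype_calls(variants_per_genome, cohort_size):
    """Rough count of variant calls to cross-reference across a cohort."""
    return variants_per_genome * cohort_size


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square statistic (1 degree of freedom) for a 2x2 allele-count table.

    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [case_alt + case_ref, control_alt + control_ref]
    col_sums = [case_alt + control_alt, case_ref + control_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# About 3 million variants per genome (figure from the text),
# in a hypothetical cohort of 1,000 people.
print(genotype_calls(3_000_000, 1_000))

# Hypothetical allele counts for one variant: 30 of 100 alleles in cases
# carry the variant, against 10 of 100 alleles in controls.
chi2 = allelic_chi_square(30, 70, 10, 90)
print(chi2, p_value_1df(chi2))
```

A single test like this is trivial. The cost comes from repeating it across millions of variants and thousands of genomes, which is the kind of workload that spreads naturally over many machines.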
This is where big data and cloud computing become indispensable tools for researchers. The combined power of many virtual machines and cloud storage capacity can do the work much faster today than in the mid-2000s, when the first GWAS were performed.