Illumina is a firm that provides advanced, next-generation DNA sequencing machines to the labs, clinics, and research institutes that need them. Because of that, Alex Dickinson, its senior vice president of strategic initiatives, was an Amazon Web Services customer invited to the stage during the opening keynote at Amazon Summit 2016 in Santa Clara on July 13.
DNA sequencing is being used to do many things, from reconfiguring rice crops to meet changing climate conditions to producing plants with hard-to-obtain characteristics. But in few areas is it having as much impact as it has on human cancer treatment.
Illumina sells its customers machines that cost from $10 million to $20 million.
From those customers, the company gets back the genetic data that the sequencers produce. But the sequencers have to break up the physical DNA to sequence it. Illumina reassembles the DNA segments from that data in its applications running on Amazon Web Services.
A process that used to take weeks or months and cost hundreds of thousands of dollars is done in a next-gen sequencer in a massively parallel fashion. The time to decipher an individual's genome has shrunk from a former best time of 48 hours to 24 hours, Dickinson said in an interview with InformationWeek after his on-stage talk.
That speed of execution is part of the reason the cost of sequencing an individual genome has shrunk to about $1,000.
Increasingly, the fast sequencing is being used to break down the genome of an individual's cancer tumor. That information is then being compared to that of healthy cells. The variations or mutations on the tumor's genes provide vital clues about what kind of approach will be most effective in treating the cancer.
"Most cancers exhibit a mutation in gene P53," Dickinson notes, referring to the human gene for error-checking on a cell's reproductive process. But which mutation is it, out of many possible? The human genome consists of about 3 billion bases, each one of four nucleotides, strung together in chains. It's the task of Illumina software to detect which nucleotides have occurred out of order.
The sequencer machines produce data on 200 bases at a time and send it to the cloud, until a gene has been fully broken down into all its bases. The cloud application can use a human template, a correct order for a healthy gene, to serve as a guide and reconstruct the chain.
The problem is more complicated than linking together sets of 200, however. Each set must have an overlap of 30 nucleotides with some other set on each end, so getting 3 billion bases lined up correctly remains an arduous, processing-intensive task.
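The overlap idea can be illustrated with a toy sketch. It is not Illumina's software: the function name merge_reads, the 8-base reads, and the 3-base overlap are all hypothetical stand-ins for the 200-base sets and 30-nucleotide overlaps described above, and it assumes error-free reads whose overlaps are unique. Production pipelines align reads against a reference template and tolerate mismatches.

```python
def merge_reads(reads, overlap):
    """Chain reads whose last `overlap` bases match another read's first bases.

    Toy assumption: reads are error-free and every overlap is unique.
    """
    remaining = list(reads)
    # The first read is the one whose prefix is not the suffix of any read.
    suffixes = {r[-overlap:] for r in remaining}
    start = next(r for r in remaining if r[:overlap] not in suffixes)
    remaining.remove(start)
    chain = start
    while remaining:
        tail = chain[-overlap:]
        nxt = next((r for r in remaining if r[:overlap] == tail), None)
        if nxt is None:
            raise ValueError("no read overlaps the current end of the chain")
        chain += nxt[overlap:]  # append only the bases past the overlap
        remaining.remove(nxt)
    return chain


# Hypothetical 18-base "genome" cut into 8-base reads that overlap by 3.
genome = "ATGGCGTACCTGAAGTTC"
reads = [genome[10:18], genome[0:8], genome[5:13]]  # deliberately out of order
rebuilt = merge_reads(reads, overlap=3)
```

Here rebuilt comes back equal to genome. At the scale of 3 billion bases, the search for matching overlaps is what makes the job processing-intensive.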
Once completed, the mutations can be spotted in the chain and analyzed by other software.
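As a rough illustration of that last step, the sketch below compares a reconstructed chain against a healthy template and lists the positions where a nucleotide differs. The function name find_substitutions and the short sequences are made up for the example, and it only handles single-base substitutions in sequences of equal length; real variant-calling software also deals with insertions, deletions, and sequencing errors.

```python
def find_substitutions(template, sample):
    """Return (position, template_base, sample_base) for every mismatch."""
    if len(template) != len(sample):
        raise ValueError("this sketch only compares sequences of equal length")
    return [
        (i, t, s)
        for i, (t, s) in enumerate(zip(template, sample))
        if t != s
    ]


# Hypothetical healthy template and tumor sample, differing at one base.
healthy = "ATGGCGTACC"
tumor = "ATGGAGTACC"
differences = find_substitutions(healthy, tumor)
```

In this example differences holds a single entry, position 4, where the template's C has become an A.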
Without the EC2 cloud, Illumina would have had to leave the data in the customers' hands to process as best they could on their own, Dickinson said. But that would have represented an incalculable fragmentation of the knowledge base derived from the sequencing.
"Ninety per cent of the DNA sequenced has been sequenced on Illumina machines," he said in the interview, with the results aggregated on Illumina BaseSpace Sequence Hub on EC2. BaseSpace has accumulated 10PB of genetic data and is growing fast, Dickinson said.
Illumina has so much data that it has periodically considered moving some of it out of Amazon S3 storage and into the newer, less frequently accessed form of storage, Amazon Glacier. Strong customer demand, however, has kept Illumina executives focused on producing and shipping better sequencing machines, not tinkering with the cloud. The firm had revenues of $2.2 billion in 2015, up 19%.
"We keep talking about it but we haven't got around to it," Dickinson said. As the company devoted attention to the migration several months ago, AWS lowered the prices on S3 storage enough to take away its motivation to execute the move.
AWS came up with an EC2 instance that was a good fit for tackling one genome sequencing problem at a time. Dickinson said that the instance, which delivers 16 CPUs and 100GB of memory, was right-sized to take on an individual sequencing problem, because one individual's DNA will produce about 100GB of data. This sounds suspiciously like the memory-intensive instance, R3 4X large, which comes with 16 virtual CPUs and 122 gibibytes (GiB) of memory, or roughly 131GB. Amazon insists on offering its memory measures in GiB.
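The difference between the two units is simple arithmetic: a gibibyte is 2^30 bytes, while a gigabyte is 10^9 bytes. A quick sketch of the conversion, with the helper name gib_to_gb chosen only for illustration:

```python
GIB = 2**30   # bytes in one gibibyte
GB = 10**9    # bytes in one gigabyte


def gib_to_gb(gib):
    """Convert gibibytes to decimal gigabytes."""
    return gib * GIB / GB


memory_gb = gib_to_gb(122)          # memory of the 122 GiB instance, in GB
fits_one_genome = memory_gb >= 100  # about 100GB of data per individual
```

The result is just under 131GB, comfortably above the roughly 100GB that Dickinson says one individual's DNA produces.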
In addition to the genetic data in BaseSpace, Illumina is constantly collecting data on how well the lasers in its sequencers are functioning, as well as the conditions under which the chemicals the sequencers use have been handled en route to the sequencer customer's site.
"We need to know the temperatures the chemical cartridges experienced on that flight. Did they fall out of spec? We monitor the supply chain. We need to know we're getting good genetic data," said Dickinson in the interview.
Each year, Illumina collects 270 billion data points on its equipment and on the supply chain that feeds its operation, and stores them in S3.
Illumina has few direct dealings with the doctors, clinicians, and researchers making use of Illumina-produced data. Dickinson adds, however, "From the articles I've seen, in a majority of the cases where a tumor has been sequenced, the doctors will say it impacted or altered the treatment."
Cancers have previously been described by the organ or location in the body where they are first detected. But research is showing that the gene defects that give rise to a cancer may cause it in multiple locations. By studying which cancers are associated with which genes in many different individuals, then relating that information to what is known about effective treatments, doctors increase the likelihood of successfully battling the disease.
"They can pick a treatment based on the mutation, not the location," said Dickinson.
As BaseSpace keeps accumulating genetic information, including that of cancer patients, researchers can comb through it looking for suspected patterns or hitherto unsuspected associations. Because the data comes from many individuals, "it will have a huge impact on both how cancer is detected and how it's treated in years to come," Dickinson told the 8,000 attendees at the Santa Clara Convention Center on July 13.
It's good news for AWS that a gene sequencing firm is gobbling up processing power in its cloud and becoming a giant storage customer. But Matt Wood, AWS general manager for products, didn't talk about that when asked about Illumina as a customer. He said instead: "Watching genomic data move closer to clinical data" on the Amazon cloud, and affect outcomes "is incredibly humbling."