In a gamble against time, Amazon Web Services is making the genomes of 3,024 varieties of rice publicly available in the cloud so that researchers can use the data to come up with the strains most likely to prove resistant to heat, drought, and disease.
Rice supplies 20% of the calories consumed worldwide, but yields are falling in many rice-growing regions where temperatures run above average during the crop's growing season.
Researcher Jan Leach says warmer temperatures also make the diseases that affect rice more virulent, at the same time that the plants' inherent resistance to disease appears to be compromised. If warmer temperatures are here to stay, then the nature of the world's rice crop must evolve quickly to meet changing conditions and avoid widespread hunger, she said in an interview.
"You get more diseases at high temperatures and the plant's inherent resistance is not as effective," she said from her office at Colorado State University in Fort Collins, Colo., where she is a distinguished professor in plant research.
A coalition of researchers, including the International Rice Research Institute based in the Philippines and the Chinese Academy of Agricultural Sciences in Shenzhen, is behind an effort to keep the world's rice fields productive. Out of 100,000 varieties of rice, they selected the 3,024 with distinctive characteristics believed to be tied to the crop's future. Researchers in China then unraveled the genome of each variety, a monumental effort that captured 30 million variations in the varieties' nucleotides.
That amounts to 120 TB of data, now stored for free on Amazon Simple Storage Service (S3) and available to researchers around the world. "It's a huge amount of data. It's not like you're going to download it. You can't just email it around," said Leach.
But with the data available, research that has been extremely difficult in the past may become possible. By comparing the genomes and looking for common characteristics among the different varieties, researchers can narrow the gene sequences they need to examine to pursue certain crop qualities.
"If we know this gene functions to give disease resistance in this variety, we can ask, why isn't it giving resistance in these other varieties?" asked Leach. Instead of needing to examine 1,500 nucleotides in each variety, researchers might be able to narrow the field to 5, based on the 3,024 reference genomes.
By examining the context and surrounding nucleotide sequences, researchers may find a way to turn on resistance in additional varieties. Likewise, they can examine "snips" (SNPs, or single-nucleotide polymorphisms, which are single-letter variations in the DNA sequence) for such characteristics as heat resistance, drought resistance, taste, and nutritional value.
Much of the world's rice crop is grown underwater or on heavily irrigated land. India, which produces 20% of the world's rice, was predicted to suffer a decline of 10 million tons of output during the 2008-2009 drought year. If droughts become more common, rice that can survive dry conditions may become the mainstay for significant segments of the world's population.
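The narrowing-down that Leach describes, going from some 1,500 nucleotides to a handful, can be illustrated with a toy comparison. The sketch below is only an illustration under stated assumptions: the sequences are invented, the function name `candidate_positions` is hypothetical, and the all-or-nothing rule (every resistant variety shares a nucleotide that no susceptible variety carries) is far simpler than what real variant-analysis pipelines do.

```python
def candidate_positions(resistant, susceptible):
    """Return 0-based positions where every resistant sequence shares one
    nucleotide and no susceptible sequence carries that nucleotide.

    Assumes (hypothetically) that all sequences are already aligned to the
    same length; real data would need alignment and variant calling first.
    """
    length = len(resistant[0])
    if any(len(seq) != length for seq in resistant + susceptible):
        raise ValueError("sequences must be aligned to the same length")

    hits = []
    for i in range(length):
        resistant_bases = {seq[i] for seq in resistant}
        if len(resistant_bases) != 1:
            continue  # resistant varieties disagree here, so skip
        (base,) = resistant_bases
        if all(seq[i] != base for seq in susceptible):
            hits.append(i)
    return hits


# Invented 10-nucleotide stretches of one gene in five made-up varieties.
resistant = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]
susceptible = ["ACCTACGTAC", "ACCTACGAAC"]

print(candidate_positions(resistant, susceptible))  # → [2]
```

Only position 2 separates the two groups cleanly in this made-up data, so that is where a researcher would look first. With 3,024 reference genomes rather than five, the same idea rules out far more positions.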
Marco van den Berg, CIO of the International Rice Research Institute, said in an email interview that no other food crop has been subject to this level of genomic analysis before. "Public availability (of the results) speeds up research and stimulates the availability of additional sequences. The availability of the data opens venues to do research which we probably haven't even imagined yet," he wrote.
"AWS has made it easier to compare the different genomes," said Leach. An initial analysis of the 3,024 genomes has been conducted using the DNAnexus analysis platform on Amazon to study and compare the varieties. Amazon used 37,000 CPUs over two days to conduct the initial analysis. Having that data available to researchers worldwide has Leach, who has worked in plant research for 30 years, excited.
In addition, Amazon has made open source analytical tools, such as the command-line tool SamTools and the user interfaces Iobio and Galaxy, available as part of the data set, she said.
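For a sense of scale, the figures reported here (120 TB of data, 3,024 genomes, 37,000 CPUs over two days) can be turned into rough per-variety numbers. This is a back-of-the-envelope sketch, not an official accounting: it assumes decimal terabytes (1 TB = 1,000 GB) and assumes every CPU ran for the full 48 hours, which was not confirmed, so the CPU-hour figures are an upper bound.

```python
GENOMES = 3_024      # rice varieties sequenced
DATA_TB = 120        # total data set size, in terabytes
CPUS = 37_000        # CPUs used for the initial analysis
HOURS = 2 * 24       # assumption: all CPUs busy for the full two days

gb_per_genome = DATA_TB * 1_000 / GENOMES   # assumption: 1 TB = 1,000 GB
cpu_hours = CPUS * HOURS
cpu_hours_per_genome = cpu_hours / GENOMES

print(f"{gb_per_genome:.1f} GB of data per variety")
print(f"{cpu_hours:,} CPU-hours in total (upper bound)")
print(f"{cpu_hours_per_genome:.0f} CPU-hours per variety (upper bound)")
```

Under those assumptions, that works out to roughly 40 GB of data per variety and just under 1.8 million CPU-hours for the initial analysis.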
"You don't need to be a high-powered compute center to inspect the data," she said. Introductory use of the data is free. At some level, Amazon will charge users (like other cloud customers) for the use of compute, memory, and CPUs. Researchers at cost-conscious institutions will probably learn how to use Amazon Spot Instances, which are available on idle processors to low bidders during periods of low demand, such as the middle of the night.
In fact, the placement of the data and the availability of both the sophisticated DNAnexus platform and open source tools amount to a giant experiment being conducted for the first time in the cloud.
No other food crop has had so many sequences published together, and no one is sure what to expect in the way of results.
But Leach and van den Berg know the depth of interest among agronomists and crop researchers. Leach said her co-researchers in grain crops such as wheat, oats, barley, and maize will be following the outcomes as a gauge of what could be done in their own fields.
"Having this genome information is a very important step … We will help other fields by making this data available to everybody," she said.