IT managers wondering how they'll ever fit enough hardware into overcrowded data centers to store all the video, social networking content, Web server logs, and other big data elements got a bit of good news last week: Within a few years, data centers may have access to storage media that rely on technology millions of years old to store an unprecedented volume of data in an astonishingly small space.
In a paper published in the Aug. 16 issue of Science, a team of Harvard geneticists demonstrated a technique to store the entire 5.27 megabits of content from a genetics textbook in less than a thousandth of a gram of DNA layered on microchips instead of the usual metallic circuitry. Storing data in DNA is standard laboratory stuff at this point; this example stored and retrieved more than 600 times as much data as the largest previous attempt, according to the Wall Street Journal.
Storing data on strands of DNA using microchips is chancy right now, but it has the potential to store every byte of digital data in the world in just a handful of chips. A DNA-storage module the size of an ordinary flash drive could store all the data currently available on the Internet, according to the researchers. Retrieval is tricky, requiring DNA sequencers rather than simple magnetic sensors to decode the data. It is also sequential, so finding the right data means scrolling through the whole pool of data as if using a tape drive, according to bioengineer Sriram Kosuri, who was lead researcher on the data-storage portion of the broad-based DNA research project.
Even though storing data this way is currently slow and difficult, the density of DNA as a storage medium is "off the charts," according to a Science editorial accompanying the results.
DNA microchips, more accurately called DNA microarrays, are manufactured almost like regular microchips and contain no DNA from living cells, which tend to mutate or die, damaging or destroying much of the data they contain. Instead they use thousands of strands of synthetic DNA stretched across a tiny glass plate that is encased in plastic.
The Harvard team's version ended up with 55,000 strands of DNA--less than a thousandth of a gram--chemically synthesized using an inkjet printer that laid them onto the chip. The data to be stored was first converted from a digital file into tiny blocks written in the four-letter chemical alphabet of DNA. A DNA sequencer was then used to put the strands in the right order, encode the data, and add a tiny indicator on each strand showing where in the book that data appears and how to find the other pieces on the microarray, according to Science.
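The idea of tagging each small block with its position, so the pieces can be put back in order no matter how they are read, can be sketched in a few lines of Python. This is only an illustration: the function names, the 12-byte block size, and the (index, payload) layout are assumptions made for the example, not details taken from the Science paper.

```python
def split_into_blocks(data: bytes, block_size: int = 12) -> list[tuple[int, bytes]]:
    """Cut data into fixed-size payloads, each tagged with its position.

    block_size is an arbitrary value chosen for illustration.
    """
    return [
        (index, data[offset:offset + block_size])
        for index, offset in enumerate(range(0, len(data), block_size))
    ]


def reassemble(blocks: list[tuple[int, bytes]]) -> bytes:
    """Sort blocks by their position tag and join the payloads."""
    # Tuples sort on their first element, the index, which is unique per block.
    return b"".join(payload for _, payload in sorted(blocks))
```

Because every block carries its own position, the blocks can be read back in any order and still be reassembled; the cost, as with the sequential retrieval described above, is that all of them have to be read before the original file is complete.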
DNA microarrays aren't uncommon, but they are used most often as a diagnostic tool: Doctors identify specific genetic mutations by inserting a patient's problematic DNA into the microarray and watching to see what errors come out, according to a fact sheet from the NIH's National Human Genome Research Institute.
Reading the data back requires DNA sequencers, but DNA allows data to be stored using four possible values--A, G, C, and T, for adenine, guanine, cytosine, and thymine. Magnetic data storage media, like all current digital technology, are limited to just two characters--0 and 1.
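To see what a four-letter alphabet buys, consider a minimal Python sketch in which every base stands for two bits. The mapping used here (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen purely for illustration; the scheme the Harvard team actually used may differ.

```python
# Illustrative only: a hypothetical two-bits-per-base mapping,
# not necessarily the encoding described in the Science paper.
BASES = "ACGT"  # positions 0..3 stand for the bit pairs 00, 01, 10, 11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # BASES.index raises ValueError for any letter other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)
```

Under this assumed mapping, one byte needs four bases instead of eight binary digits, which is the sense in which a four-character alphabet is denser than a two-character one.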
Hard drives storing data in clusters of magnetized grains on a spinning platter are able to store about 25 gigabits per square inch, according to a July paper on high-density storage in the journal Solid State Technology. For easier comparison, that's between five and six gigabits per cubic millimeter.
The DNA microchips produced by the Harvard team, on the other hand, can hold as much as 5.5 petabits, or 5.5 million gigabits, per cubic millimeter.
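The gap between the two figures is easier to appreciate as a ratio. The short Python calculation below uses only the densities quoted above (five to six gigabits per cubic millimeter for hard drives, 5.5 petabits for DNA); the variable and function names are made up for the example.

```python
GBIT_PER_PBIT = 1_000_000  # 1 petabit = 10**15 bits = one million gigabits

# Figures as quoted in the article.
HARD_DRIVE_GBIT_PER_MM3 = (5.0, 6.0)  # "between five and six gigabits"
DNA_PBIT_PER_MM3 = 5.5


def density_ratio_range() -> tuple[float, float]:
    """Return (lowest, highest) ratio of DNA density to hard-drive density."""
    dna_gbit_per_mm3 = DNA_PBIT_PER_MM3 * GBIT_PER_PBIT
    low_disk, high_disk = HARD_DRIVE_GBIT_PER_MM3
    # The densest hard-drive figure gives the smallest ratio, and vice versa.
    return dna_gbit_per_mm3 / high_disk, dna_gbit_per_mm3 / low_disk


if __name__ == "__main__":
    low, high = density_ratio_range()
    print(f"DNA is roughly {low:,.0f} to {high:,.0f} times denser")
```

On those numbers, DNA comes out roughly 0.9 million to 1.1 million times denser than a hard-drive platter, or about a million-fold.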