August 20, 2012
IT managers wondering how they'll ever fit enough hardware into overcrowded data centers to store all the video, social networking content, Web server logs, and other big data elements got a bit of good news last week: Storage media may be available to data centers in a few years that relies on technology millions of years old to store an unprecedented volume of data in an astonishingly small space.
In a paper published in the Aug. 16 issue of Science, a team of Harvard geneticists demonstrated a technique for storing the entire 5.27 MB of content from a genetics textbook in less than a thousandth of a gram of DNA layered on microchips in place of the usual metallic circuitry. Storing data in DNA is standard laboratory stuff at this point; this experiment, however, stored and retrieved more than 600 times the volume of data of the largest previous attempt, according to the Wall Street Journal.
Storing data on strands of DNA using microchips is chancy right now, but it has the potential to hold every byte of digital data in the world in just a handful of chips. A DNA-storage module the size of an ordinary flash drive could store all the data currently available on the Internet, according to the researchers. Retrieval is tricky, requiring DNA sequencers rather than simple magnetic sensors to decode the data. It is also sequential, so finding the right data means scanning through the whole pool as if using a tape drive, according to bioengineer Sriram Kosuri, lead researcher on the data-storage portion of the broad-based DNA research project.
Even though storing data this way is currently slow and difficult, the density of DNA as a storage medium is "off the charts," according to a Science editorial accompanying the results.
DNA microchips, more accurately called DNA microarrays, are manufactured much like regular microchips but contain no natural DNA, which tends to mutate or degrade, damaging or destroying the data it carries. Instead they use thousands of strands of synthetic DNA stretched across a tiny glass plate encased in plastic.
The Harvard team's version ended up with 55,000 strands of DNA--less than a thousandth of a gram--chemically synthesized and laid onto the chip by an inkjet printer. The data was first converted from a digital file into tiny blocks written in the four-letter chemical alphabet of DNA; each strand also carried a tiny indicator of where in the book its data appears and how to find the other pieces on the microarray, according to Science.
DNA microarrays aren't uncommon, but they are used most often as a diagnostic tool: a sample of a patient's DNA is applied to the microarray, and the pattern of strands to which it binds identifies specific genetic mutations, according to a fact sheet from the NIH's National Human Genome Research Institute.
Reading the data back requires DNA sequencers, but DNA allows each position to store one of four possible values--A, G, C, and T, for adenine, guanine, cytosine, and thymine. Magnetic data storage media, like all current digital technology, are limited to just two characters: 0 and 1.
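To make the binary-to-DNA mapping concrete, here is a minimal sketch in Python. It packs two bits into each base; the Harvard team's actual scheme used just one bit per base (A or C for 0, G or T for 1) to avoid error-prone repeated letters, so this is an illustration of the idea rather than their method.

```python
# Illustrative only: maps 2 bits per base (A=00, C=01, G=10, T=11).
# The Harvard scheme used 1 bit per base (A/C = 0, G/T = 1) to avoid
# error-prone repeated-letter runs; this sketch just shows the principle.

BASES = "ACGT"

def encode(data: bytes) -> str:
    """Turn raw bytes into a strand of DNA letters, 4 bases per byte."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):            # four 2-bit chunks, MSB first
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def decode(strand: str) -> bytes:
    """Reverse the mapping: every 4 bases back into one byte."""
    data = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return bytes(data)

strand = encode(b"DNA")
print(strand)                                  # CACACATGCAAC
assert decode(strand) == b"DNA"
```

In this sketch, the address blocks the researchers attached to each strand would simply be a few extra bases prepended to every fragment, so the pieces can be put back in order after sequencing.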
Hard drives storing data in clusters of magnetized grains on a spinning platter are able to store about 25 gigabits per square inch, according to a July paper on high-density storage in the journal Solid State Technology. For easier comparison, that's between five and six gigabits per cubic millimeter.
The DNA microchips produced by the Harvard team, on the other hand, can hold as much as 5.5 petabits, or 5.5 million Gbits per cubic millimeter.
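Taking the article's own figures as given, the density gap works out to roughly a million-fold:

```python
# Back-of-the-envelope comparison using the figures quoted above.
hdd_density = 5.5e9     # hard drive: ~5.5 Gbits per cubic mm (midpoint of 5-6)
dna_density = 5.5e15    # DNA microchip: 5.5 petabits per cubic mm
ratio = dna_density / hdd_density
print(f"DNA packs roughly {ratio:,.0f} times more data per cubic millimeter")
```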
Hard drive densities are also approaching their theoretical limit just as big data--video and audio content, social networking content, machine-to-machine data such as server logs, and the vast catalog of virtual machines, virtual applications, and other files that make up cloud or virtualized IT infrastructures--causes corporate data storage volumes to explode.
Big data is a major culprit in the out-of-control growth of demand for data storage online and in major corporations, according to analyst and vendor reports. End-user companies using scale-out network-attached storage systems reported storage requirements growing at 52% per year--enough to double their on-site hardware needs every year and a half--according to a March study from Aberdeen Group.
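The doubling claim follows from compound growth: at 52% a year, capacity needs double roughly every 1.65 years (log 2 / log 1.52). A quick check, treating the Aberdeen figure as given:

```python
import math

annual_growth = 0.52    # 52% per year, the Aberdeen figure quoted above
doubling_time = math.log(2) / math.log(1 + annual_growth)
print(f"Storage demand doubles every {doubling_time:.2f} years")   # ~1.65
```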
A separate study, published this month by networking giant Cisco Systems, predicted the volume of data flowing over IP networks--all of which has to be stored somewhere--will increase four-fold by 2016, after increasing eight-fold during the past five years. Data from mobile devices will grow three times as fast as overall IP data, increasing 18-fold between 2011 and 2016; by 2016 the number of mobile devices connected to global networks will total three times the number of people available to use them, Cisco predicted.
Dramatic as the resulting technology could be, data storage wasn't the main goal of Harvard geneticist George M. Church and the team of researchers from Johns Hopkins University and the Wyss Institute for Biologically Inspired Engineering who did the work and wrote the report.
Church's goal, according to Science, is to reinvent the human genetic code using synthetic DNA that could replace or correct genes that produce congenital diseases, curing patients through the body's own programming codes and control mechanisms.
As with most scientific breakthroughs, actual products with practical versions of the new technology may lag years behind publication of the initial results.
Church's and Kosuri's demonstration shows the medium can be used for high-volume storage, though recording the data took several days and reading it back took even longer.
The cost is dropping fast, too. In 2001, generating a million base pairs of DNA that could be used for data storage cost about $10,000, according to the WSJ. Today it costs about 10 cents.
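Taking those two WSJ numbers at face value, that's a hundred-thousand-fold drop in about a decade:

```python
# Cost drop implied by the WSJ figures cited above (per million base pairs).
cost_2001 = 10_000.0    # dollars, 2001
cost_2012 = 0.10        # dollars, today
factor = cost_2001 / cost_2012
print(f"Synthesis cost has fallen roughly {factor:,.0f}-fold since 2001")
```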
If DNA microchips do turn out to be practical for the storage of data more workaday than instructions on whether a child's eyes should be blue or brown, however, the result could have a dramatic impact on the cost, physical layout, and amount of storage hardware required in large corporate data centers.
DNA microchips would consume a fraction of the power of either flash drives or hard drives while packing many times as much data into a much smaller space, according to Church's conclusions.
Together, those two advantages could tame the demand for power, space, and hardware in enterprise data centers--demand driven largely by the need for more centralized storage to accommodate nearly every major trend in IT right now.