Big Data Storage Cost Cutter: Cassette Tapes
IBM adapting '70s-era technology for world's largest radio telescope, corporate big data operations.
A hot new technology is on its way to the rescue of storage-deprived big-data managers, as soon as it can finish working its way out of 1970s-era dashboard cassette decks and Sony Walkmans.
The tape cassettes, being produced as prototypes for the world's largest radio-telescope installation, are able to store as much as 35 terabytes on a single cartridge, according to a report in New Scientist this week.
The cassettes are an adaptation of technology developed in 2010 as an advance on Linear Tape-Open (LTO) magnetic storage tape--the base medium for IBM's System Storage Tape Library storage systems. Latest-generation versions of that system can store only 4 TB per cartridge, or 12 TB at the highest level of data compression.
The secret is in the tape itself and the coating designed to protect it, produced by Fujifilm as part of a joint development project with IBM. The high-capacity tape is coated with nanoparticles of barium ferrite, which stabilizes magnetic storage media by keeping moisture and oxidation (rust) from damaging the surface of the tape.
The two companies first announced the tape in 2010, along with the claim it could store 44 times as much data per square inch as standard third-generation LTO magnetic-storage tapes. IBM is adapting the tape and cartridges for the Square Kilometre Array (SKA), which will be the world's largest radio telescope when it goes online in 2024. SKA directors expect the array will collect enough cosmic-radiation data to equal a full petabyte per day after the data is compressed. Uncompressed, the data could take up as much as 10 exabytes of storage per day.
Stored on the most commonly shipping 3 TB fifth-generation LTO tapes, a petabyte of compressed data per day would fill about 330 cartridges per day, or roughly 120,000 per year, according to New Scientist. By the time SKA actually needs the tape, IBM expects to have raised the capacity of each cartridge to 100 TB by narrowing the strips in which its systems lay the data down on tape and building super-accurate controls for the heads writing the data to tape.
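The cartridge counts above come down to simple division, and a short calculation makes it easy to compare capacities. The sketch below is illustrative only: it assumes decimal units (1 PB = 1,000 TB), takes the one-petabyte-per-day figure from the article, and uses a hypothetical helper named cartridges_needed. Because it rounds up to whole cartridges, its results come out slightly higher than New Scientist's rounded figures.

```python
import math

TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1,000 TB


def cartridges_needed(data_tb: float, cartridge_tb: float) -> int:
    """Whole cartridges required to hold data_tb terabytes (hypothetical helper)."""
    return math.ceil(data_tb / cartridge_tb)


daily_tb = 1 * TB_PER_PB  # one petabyte of compressed data per day, per the article

# Capacities mentioned in the article, in terabytes per cartridge.
capacities = [
    ("fifth-generation LTO", 3),
    ("prototype cartridge", 35),
    ("planned cartridge", 100),
]

for label, capacity_tb in capacities:
    per_day = cartridges_needed(daily_tb, capacity_tb)
    per_year = cartridges_needed(daily_tb * 365, capacity_tb)
    print(f"{label} ({capacity_tb} TB): {per_day} per day, {per_year} per year")
```

Under those assumptions, the 3 TB case works out to 334 cartridges per day, in line with the article's rounded 330.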
The ultra-high capacity cartridges are being developed in research labs at IBM-Zurich, which is also working on less retro-sounding storage technologies, such as three-dimensional memory chips that stack components on top of one another to reduce the time it takes data to travel across them.
It is much easier--and cheaper--to adapt existing tape technology to radically increase its capacity than it is to invent something completely new, according to IBM Fellow Evangelos Eleftheriou, manager of the labs' storage development, who leads the SKA tape project. The biggest drawback to the tape-cassette approach is that tape is linear: a storage-control unit may have to scan the entire length of a tape to find a specific piece of data, Eleftheriou said in a statement. That latency can be avoided by systems that aggressively predict which data will be needed and move it from tape to disk or system memory before it is requested.
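The idea of staging data from tape before it is requested can be sketched with a simple read-ahead cache. This is not IBM's method, which the article does not describe. It is a minimal illustration that assumes access is mostly sequential, models the tape as an in-memory list, and uses hypothetical names throughout (PrefetchingTapeReader, read_ahead, tape_reads).

```python
from collections import OrderedDict


class PrefetchingTapeReader:
    """Toy model: a slow linear 'tape' fronted by a small LRU 'disk' cache."""

    def __init__(self, tape_blocks, cache_size=8, read_ahead=3):
        self.tape = tape_blocks          # stand-in for the tape contents
        self.cache = OrderedDict()       # block index -> data, oldest first
        self.cache_size = cache_size
        self.read_ahead = read_ahead     # extra blocks staged on each miss
        self.tape_reads = 0              # number of slow trips to tape

    def _stage(self, index):
        """Copy one block from tape into the cache, evicting the oldest if full."""
        if index in self.cache or not 0 <= index < len(self.tape):
            return
        self.cache[index] = self.tape[index]
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)

    def read(self, index):
        if not 0 <= index < len(self.tape):
            raise IndexError(f"block {index} is not on this tape")
        if index in self.cache:
            self.cache.move_to_end(index)  # mark as recently used
            return self.cache[index]
        # Cache miss: one slow pass over the tape fetches the requested block
        # and stages the blocks that follow it, predicting sequential access.
        self.tape_reads += 1
        data = self.tape[index]
        for i in range(index, index + 1 + self.read_ahead):
            self._stage(i)
        return data


tape = [f"block-{i}" for i in range(10)]
reader = PrefetchingTapeReader(tape)
for i in range(8):
    reader.read(i)
print("blocks read: 8, trips to tape:", reader.tape_reads)
```

A real system would predict access from query patterns rather than assume the next blocks in sequence, but the trade-off is the same: each slow pass over the tape is made to serve several later requests.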
Still, at the capacity range expected of SKA--or, to a lesser extent, corporate big-data analytical projects--ultra-high data density alone won't be enough to keep an organization from drowning in its own data. To keep SKA from having to store the 10 petabytes of data it could generate every second it operates, powerful analytics and processors will have to filter out junk data, eliminate data whose quality is good but extraneous to the project, and compress the result to the point that the tapes contain somewhere between half and one sixth the volume of data SKA takes in, Eleftheriou told ComputerWeekly.
It is unlikely even the largest, most enthusiastic big-data users will approach the kind of capacity SKA's IT staff will have to deal with every day. But it is likely that every organization that stores big data on the premises will be looking for low-cost, high-capacity alternatives, said Dan Woods, former CTO of TheStreet.com and CapitalThinking, and an analyst at CITOResearch.com. "A mature solution for big data storage will allow data to reside in tiers and gracefully migrate from one tier to the next and back again, as required," Woods wrote for Forbes.
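The tiering Woods describes can be pictured as a periodic planning pass that decides where each dataset should live. The sketch below is a hypothetical illustration, not a description of any product: the three tier names, the age thresholds, and the functions choose_tier and plan_migrations are all assumptions made for the example.

```python
def choose_tier(days_since_access, hot_days=1, warm_days=30):
    """Pick a tier from how recently data was used (thresholds are assumptions)."""
    if days_since_access <= hot_days:
        return "memory"
    if days_since_access <= warm_days:
        return "disk"
    return "tape"


def plan_migrations(datasets):
    """Return (name, current_tier, target_tier) for each dataset that should move.

    datasets maps a dataset name to (current_tier, days_since_access).
    """
    moves = []
    for name, (current_tier, days_since_access) in datasets.items():
        target_tier = choose_tier(days_since_access)
        if target_tier != current_tier:
            moves.append((name, current_tier, target_tier))
    return moves


# Hypothetical inventory: one archive that became hot again, one cooling off.
inventory = {
    "survey-2012": ("tape", 0.5),   # just requested, should move up
    "logs-q1": ("disk", 90),        # untouched for months, should move down
    "dashboards": ("disk", 10),     # already in the right tier
}
for name, src, dst in plan_migrations(inventory):
    print(f"{name}: {src} -> {dst}")
```

Data moves in both directions here, which is the "and back again" part of Woods' description: an archive on tape is promoted when it is requested, and idle data on disk is demoted.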
Tape itself is so well known, so inexpensive for the capacity it delivers, and so stable for long-term storage that it is still an attractive option for some types of storage, according to Eleftheriou. But those interested in IBM's newest approach to cassette-enclosed high-capacity magnetic-tape storage will have to wait to try it. It will be a long time before it's a product for SKA, let alone the rest of the industry.