Commentary

Plot An Effective Data Archive Strategy

George Crump

Good archiving systems should do three things: Save money, let you find data fairly quickly and last a long, long time.

An effective archiving or data retention solution should do three things. It should let you store less frequently accessed data at a lower cost than if you kept it on primary storage. It should allow you to find and access data relatively quickly -- but access does not have to be instantaneous, as it is on primary storage. Finally, stored data should be durable. Data you archive today should be readable 10 years from now and beyond.

We used to say that all data has a decaying value: the further it gets from its creation date, the less valuable it becomes. Compliance and regulatory requirements, as well as big data analytics and archiving, have changed that. We now have to assume that all data will become valuable again -- we just don't know which data or when. If decades from now your grandchildren check into a hospital, the doctors might want to access your medical records. They will need them quickly, and the records had better be readable.


In theory, these archiving needs strengthen the position of many disk-based object-storage vendors. Compared with primary storage, their systems can provide data durability, quick access and cost effectiveness. The problem is that object storage is neither as inexpensive as tape storage nor as power efficient.

Because we are talking about potentially storing all data for decades, we need to do everything we can, without putting data at risk, to reduce the overall storage cost of the system. After all, those records won't do you any good if the hospital can't afford to keep the system that stores them powered on and up to date.

However, before we turn over all archive data to the object storage vendors, there is a part of that "all data has a decaying value" theory that is still applicable. It's this: All data has a decaying speed at which it needs to be accessed. Using our medical example above, the doctors might need to access your medical records 50 years from now, but they probably don't need to have them in seconds. They can probably wait a minute or two.

As I noted in my article "Comparing LTO-6 to Scale-Out Storage for Long-Term Retention," in these situations tape is an ideal storage type. Data on tape can still be automatically scanned for durability, and it certainly meets the cost-effectiveness requirements. What surprises most people who are either new to tape or have forgotten about it is how quickly a modern tape library can deliver data. In most cases access takes less than a minute; in the worst case it is two to three minutes.

Understanding The Data Access Decay Rate

The speed at which you need to have data returned to primary storage will depend on the needs of the business. Because the predictable response to, "How long can you wait?" is, "I need it now," it is important to make sure that business line managers understand the value of waiting. If they understand that waiting two minutes could save the organization $2 million a year in storage expenses, waiting sounds much more attractive. In almost every case the durability of the data is far more important than the speed at which it can be recovered.
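To make that conversation with business line managers concrete, the trade-off can be put into a small calculation. The following is only a rough sketch: the function name, its parameters and the per-terabyte prices in the usage example are hypothetical placeholders, not figures from any vendor, and should be replaced with your own numbers.

```python
def annual_savings(capacity_tb, disk_cost_per_tb_year,
                   tape_cost_per_tb_year, fraction_moved_to_tape):
    """Estimate yearly savings from moving part of an archive from disk to tape.

    All inputs are assumptions supplied by the caller; nothing here is
    vendor pricing.
    """
    if not 0.0 <= fraction_moved_to_tape <= 1.0:
        raise ValueError("fraction_moved_to_tape must be between 0 and 1")
    # Capacity that leaves disk and lands on tape.
    moved_tb = capacity_tb * fraction_moved_to_tape
    # Savings is the per-terabyte price gap applied to the moved capacity.
    return moved_tb * (disk_cost_per_tb_year - tape_cost_per_tb_year)


if __name__ == "__main__":
    # Hypothetical inputs: 1,000 TB archive, made-up yearly costs per TB.
    savings = annual_savings(1000, 300.0, 50.0, 0.8)
    print(f"Estimated annual savings: ${savings:,.0f}")
```

A number like this, set next to a wait of a minute or two, is usually what makes the slower tier acceptable.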

I typically suggest a blended strategy: As little primary storage as possible, a reasonable amount of object/archive storage, and a hefty amount of tape. The amount of object/archive disk storage will be driven by your data access decay rate. For many organizations that might mean keeping all data on object storage for three to five years. For almost all organizations, longer-term retention should be on tape. This blended strategy gives the right balance between access, affordability and durability.
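The blended strategy can be expressed as a simple age-based placement rule. This is a minimal sketch under stated assumptions: the function name, the tier labels and the 90-day primary window are hypothetical, and the five-year object-storage window is just the upper end of the "three to five years" range mentioned above. Your own data access decay rate should set both thresholds.

```python
from datetime import date


def choose_tier(created: date, today: date,
                primary_days: int = 90, object_years: int = 5) -> str:
    """Pick a storage tier from the age of the data.

    primary_days is an assumed window, not a recommendation;
    object_years follows the upper end of the three-to-five-year range.
    """
    age_days = (today - created).days
    if age_days < 0:
        raise ValueError("created date is in the future")
    if age_days <= primary_days:
        return "primary"   # recent data stays on primary storage
    if age_days <= object_years * 365:
        return "object"    # mid-life data sits on object/archive disk
    return "tape"          # long-term retention goes to tape


if __name__ == "__main__":
    today = date(2024, 1, 1)
    for created in (date(2023, 12, 1), date(2020, 1, 1), date(2010, 1, 1)):
        print(created, "->", choose_tier(created, today))
```

Keeping the thresholds as parameters lets each organization tune the rule to how quickly its need for fast access falls off.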
