Data Dedupe: Doing More -- A Lot More! -- With Less

You're already familiar with file compression technology. Now meet its big brother -- data deduplication -- and learn how it can save your company a ton of money.
Businesses with growing data-storage requirements must also deal with the challenges of backing up this data. In the past, this meant either investing in additional backup capacity, cutting backup data-retention times, or simply choosing not to back up certain data sources.

The first choice is expensive, even as disk-based storage costs continue to fall. The second choice can quickly poke holes in your company's backup strategy.

And the third choice? Think of it as playing Russian Roulette with one of your company's most important assets -- its data.

Deduplication technology has been around for a number of years. Recently, however, it has exploded into the IT mainstream; whether a vendor provides enterprise-class backup solutions or caters to the smallest businesses, "dedupe" probably figures prominently in its backup products.

Yet there is clearly a disconnect here. According to storage expert George Crump, most IT professionals still don't pay much attention to dedupe technology. At the same time, some of the world's biggest storage vendors are paying megabucks to acquire the latest and greatest dedupe innovations.

The basic idea behind deduplication is simple. Think of it as a backup solution that is intelligent enough to know when it encounters the same data twice. An obvious example would be an email archive backup that includes lots of attachments. If a backup system recognizes that a number of messages contain the same attachment, it can keep a single copy and replace the others with a virtual pointer.
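The keep-one-copy idea can be sketched in a few lines of Python. This is a toy illustration, not how any particular backup product works: the `DedupeStore` class and the message names are hypothetical, and using a SHA-256 digest to recognize identical data is an assumption made for the example.

```python
import hashlib


class DedupeStore:
    """Toy content-addressed store: identical payloads are kept only once."""

    def __init__(self):
        self.blobs = {}     # digest -> bytes (the single stored copy)
        self.pointers = {}  # item name -> digest (the "virtual pointer")

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # store only if not seen before
        self.pointers[name] = digest
        return digest

    def get(self, name):
        return self.blobs[self.pointers[name]]

    def logical_size(self):
        # Size the backup would need if every item were stored in full.
        return sum(len(self.blobs[d]) for d in self.pointers.values())

    def stored_size(self):
        # Size actually kept after duplicates collapse to one copy.
        return sum(len(b) for b in self.blobs.values())


store = DedupeStore()
attachment = b"quarterly-report contents " * 1000  # hypothetical attachment
for i in range(5):
    store.put(f"message-{i}/attachment", attachment)
store.put("message-5/attachment", b"a different file")

print(store.logical_size(), "bytes backed up,", store.stored_size(), "bytes stored")
```

Five messages carry the same attachment, yet the store holds it once; each message keeps only a short digest that points at the shared copy.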

It's a far more powerful approach than traditional data compression. Consider another example: a backup archive full of JPEG images. Those JPEGs are already compressed; a data compression tool will have little or no impact on them.

A dedupe solution, however, can pick through the same content -- document archives, Web site content repositories, or other sources -- and drastically reduce the space required to back it up.
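A rough way to see the difference is to run both approaches over the same data. In the sketch below, random bytes stand in for an already-compressed JPEG (an assumption for illustration; real image files are not purely random), and the same "image" appears ten times in the archive. zlib is used as the general-purpose compressor; its DEFLATE format only looks back 32 KB, so it cannot spot whole copies that sit 100 KB apart.

```python
import hashlib
import os
import zlib

# Stand-in for an already-compressed JPEG: random bytes leave a
# compressor with no patterns to exploit.
image = os.urandom(100_000)
archive = [image] * 10  # the same "image" backed up ten times

raw_size = sum(len(item) for item in archive)

# Traditional compression over the whole archive.
compressed_size = len(zlib.compress(b"".join(archive), 9))

# File-level dedupe: keep one copy per distinct SHA-256 digest.
unique = {hashlib.sha256(item).hexdigest(): item for item in archive}
deduped_size = sum(len(item) for item in unique.values())

print("raw:", raw_size)
print("compressed:", compressed_size)
print("deduped:", deduped_size)
```

Compression leaves the archive at roughly its original size, while dedupe cuts it to a single copy of the image.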

The gory details here can get very complicated, very quickly. Different dedupe solutions operate at different points in a company's IT infrastructure and apply different techniques to get the job done. In some cases, however, they can cut the space required for backups by as much as 90 percent.
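To see how a figure near 90 percent can arise, consider some back-of-the-envelope arithmetic. The numbers below are hypothetical (a 1,000 GB data set, ten retained full backups, 2 percent of the data changing between backups), and the model is deliberately simple: the first backup is stored in full and each later one adds only its changed data.

```python
def dedupe_savings(full_size_gb, change_rate, backups_retained):
    """Return (logical GB, stored GB, fraction saved) under a simple model.

    Assumes the first backup is stored in full and every later backup
    contributes only its changed data.
    """
    logical = full_size_gb * backups_retained
    stored = full_size_gb + full_size_gb * change_rate * (backups_retained - 1)
    return logical, stored, 1 - stored / logical


# Hypothetical inputs: 1,000 GB, 2 percent change, ten backups retained.
logical, stored, saved = dedupe_savings(1000, 0.02, 10)
print(f"{logical:.0f} GB logical, {stored:.0f} GB stored, {saved:.1%} saved")
```

Under these assumptions the ten backups occupy 1,180 GB instead of 10,000 GB, a reduction of about 88 percent. Data that changes faster, or shorter retention, would yield smaller savings.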

Dedupe technology truly allows you to do more with less. Many companies can actually cut their investments in backup storage hardware while still increasing their data-retention periods or cutting the amount of time between backups. And while it is always a smart idea to prioritize business data as part of a backup strategy, dedupe means never having to skimp on backups that are necessary to protect essential data.

Finally, dedupe offers another huge benefit: It makes cloud-based solutions practical for even very large backup jobs. In the past, even companies with relatively fast Internet connections could take days to upload a multi-gigabyte backup to a cloud provider. And for smaller firms that rely on DSL connections, the same jobs could take weeks -- an obvious deal-killer for comprehensive cloud-based backups.

That's why many cloud-based backup services are touting dedupe technology. In many cases, it can turn a multi-day backup into a much faster process, especially once a company has its first full online backup in place and can start uploading smaller, incremental backups.
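The upload math is easy to check. The sketch below uses hypothetical figures (a 500 GB first full backup, 10 GB of new data per incremental, and a DSL uplink that sustains 1 Mbps) and ignores protocol overhead, so treat the results as rough estimates only.

```python
def upload_hours(size_gb, uplink_mbps):
    """Hours needed to upload size_gb over a link sustaining uplink_mbps."""
    megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    return megabits / uplink_mbps / 3600


first_full = upload_hours(500, 1)   # hypothetical 500 GB full backup
incremental = upload_hours(10, 1)   # hypothetical 10 GB of new data

print(f"first full backup: {first_full / 24:.1f} days")
print(f"later incremental: {incremental:.1f} hours")
```

With these inputs the first upload runs for several weeks, while each later incremental finishes in under a day.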

And, of course, smaller uploads also mean lower storage charges when a provider bills per GB.

Where should you go to get a quick education on dedupe technology? Here are a few suggestions (note that I include vendor-specific links here because they offer good background information, not because I endorse their products or services):

- Start with this InformationWeek article that explains dedupe clearly and lays out the technology's pros and cons.
- Storage solution provider and IBM partner EMC also has a great FAQ that covers both the basics and some more advanced aspects of the topic for IT professionals. An EMC-authored white paper on the subject is also good enough for me to recommend it as essential reading.
- Another vendor, Quantum, offers an overview of the technology that focuses on its economic and operational benefits.
- Finally, an article offers a solid introduction to dedupe technology along with a wealth of pointers to other articles dealing with the topic and with how storage solution providers are implementing it.
