Savings Through Storage Management

You can cut the amount of energy used in storage, but how depends on the type of system.
Considerable energy savings can be found in storage systems, but the means to achieve those savings vary with how the system is used. For our purposes, we'll classify storage systems by use: online transaction processing (OLTP); file storage; and data warehousing, archiving, and backup.

OLTP systems might seem to be the last place you'd look to save a few bucks on power. In OLTP, time is money, so the faster the storage system, the better, period. We wouldn't propose to change that thinking; what needs to change is how storage use is architected.

For most systems, database administrators carefully figure their expected usage and then double it a couple of times for safety. The last thing anyone wants is a disk-full error on an OLTP system. The result is a lot of unused space, but determining just how much is all but impossible.
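To see how quickly that padding erodes utilization, take "double it a couple of times" to mean two doublings (an assumption; the text gives no exact factor). Even if the administrator's original estimate E turns out to be exactly right, utilization U tops out at a quarter of what was provisioned:

$$ U = \frac{E}{2^{2} E} = \frac{1}{4} = 25\% $$

Any overestimate baked into E itself pushes utilization lower still.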

Storage vendor 3Par says that in one case it found a large customer's existing OLTP systems to be only 8% utilized; the other 92% was just adding entropy to the universe. While this is an extreme example, it illustrates the point: good storage management can save a lot of money, space, and power.

For OLTP, that means thin provisioning, a technique that allocates disk blocks to an application only when those blocks are actually written, rather than all at once at initial provisioning. With thin provisioning and good storage management software, the database administrator no longer has to guess exactly how much storage will be needed; the storage management system tracks actual consumption instead. A year ago, only a few vendors offered thin provisioning for OLTP systems; now it is much more widespread and should be strongly considered as a way to right-size OLTP storage.
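The allocate-on-write idea behind thin provisioning can be illustrated with a toy model. The `ThinVolume` class below is hypothetical, a minimal sketch rather than how 3Par or any other vendor implements it: the volume advertises a large logical size, but a physical block is consumed only the first time a logical block is written. Real arrays typically allocate in larger chunks, with granularity that varies by vendor; the 4 KiB block size here is an assumption.

```python
class ThinVolume:
    """Toy thin-provisioned volume (hypothetical, for illustration only).

    The volume advertises `logical_blocks` blocks, but physical space is
    taken from the backing pool only when a logical block is first written.
    """

    def __init__(self, logical_blocks: int, block_size: int = 4096) -> None:
        self.logical_blocks = logical_blocks
        self.block_size = block_size
        self.block_map: dict[int, int] = {}  # logical block -> physical index
        self.physical: list[bytes] = []      # physically allocated blocks

    def _check(self, lba: int) -> None:
        if not 0 <= lba < self.logical_blocks:
            raise IndexError(f"logical block {lba} out of range")

    def write(self, lba: int, data: bytes) -> None:
        self._check(lba)
        if len(data) > self.block_size:
            raise ValueError("data exceeds one block")
        data = data.ljust(self.block_size, b"\0")  # pad to a full block
        if lba in self.block_map:
            self.physical[self.block_map[lba]] = data  # overwrite in place
        else:
            # First write to this block: allocate physical space only now.
            self.block_map[lba] = len(self.physical)
            self.physical.append(data)

    def read(self, lba: int) -> bytes:
        self._check(lba)
        index = self.block_map.get(lba)
        if index is None:
            return bytes(self.block_size)  # never written: reads as zeros
        return self.physical[index]

    def utilization(self) -> float:
        """Fraction of the advertised capacity actually backed by disk."""
        return len(self.block_map) / self.logical_blocks


# A volume advertised at 1,000 blocks that has had only 80 blocks written.
vol = ThinVolume(logical_blocks=1000)
for lba in range(80):
    vol.write(lba, b"row data")
print(f"{vol.utilization():.0%} of advertised capacity is backed by disk")
```

The application still sees the full 1,000-block volume, so nobody has to guess its final size up front; the storage system simply reports how much has really been consumed.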

For general-purpose file storage, the problem is quite different. Thin provisioning is useful here, too, but more often the challenge is building a global view of all the data stored across disparate systems. After all, just how many copies of the latest Paris Hilton video does an enterprise need (or of the annual report, or of the video of the CEO's address to shareholders, and so on)? Global storage resource management (SRM) systems are the best bet for understanding and managing an enterprise's complex storage needs.

So just how much file storage space is typically wasted? According to Sun Microsystems, only about 30% of existing space is used well. The rest contains redundant or seldom-accessed data or is unallocated. SRM systems track down these wasted files, letting you do something rarely done in the enterprise: delete them.
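Commercial SRM products do far more than this, but the core of "tracking down wasted files" can be sketched in a few lines. The hypothetical `scan()` function below is an illustration, not any vendor's method: it walks a directory tree, groups regular files by the SHA-256 digest of their contents to find exact duplicates, and flags files whose last-access time is older than a cutoff. It assumes access times are maintained; many file systems are mounted with `relatime` or `noatime`, in which case `st_atime` is only a rough signal.

```python
import hashlib
import os
import stat
import time
from collections import defaultdict


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root: str, stale_days: int = 365):
    """Hypothetical SRM-style scan: return (duplicates, stale).

    duplicates maps a content digest to the sorted paths sharing it;
    stale lists files not accessed within stale_days.
    """
    by_digest = defaultdict(list)
    stale = []
    cutoff = time.time() - stale_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Stat before hashing: reading the file may update its atime.
                st = os.lstat(path)
                if not stat.S_ISREG(st.st_mode):
                    continue  # skip symlinks, sockets, device files, etc.
                by_digest[file_digest(path)].append(path)
            except OSError:
                continue  # unreadable or vanished; a real tool would log it
            if st.st_atime < cutoff:
                stale.append(path)
    duplicates = {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
    return duplicates, sorted(stale)
```

Deciding what to delete from these lists is still a policy question; the scan only makes the candidates visible.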

For archiving and backup systems, the problem is almost always redundant stored data, and data deduplication is the technology to use. It computes a hash of each data block (or, in some implementations, of each file) it sees. If the hash matches that of a block already stored, the system records a reference to the existing copy instead of storing the new block again.
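The hash-and-reference mechanism can be sketched with a hypothetical `DedupStore` class. This is a minimal illustration under stated assumptions (fixed 4 KiB blocks, SHA-256 as the hash), not how any of the vendors named in this article implement it; real products often use variable-size, content-defined chunks, and some compare the actual bytes before discarding a block whose hash matches.

```python
import hashlib


class DedupStore:
    """Minimal block-level deduplicating store (a sketch, not a product design).

    Each unique block is kept once, keyed by its SHA-256 digest; every stored
    object is just a list of digests pointing at those blocks.
    """

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}       # digest -> block contents
        self.objects: dict[str, list[str]] = {}  # object name -> digests
        self.logical_bytes = 0                   # bytes written by clients

    def put(self, name: str, data: bytes) -> None:
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block  # new content: store it once
            digests.append(digest)           # either way, record a reference
        self.objects[name] = digests
        self.logical_bytes += len(data)

    def get(self, name: str) -> bytes:
        """Reassemble an object from its referenced blocks."""
        return b"".join(self.blocks[d] for d in self.objects[name])

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


# Twenty nightly backups of an unchanged 16 KiB data set (four distinct blocks).
data_set = b"".join(bytes([i]) * 4096 for i in range(4))
store = DedupStore()
for night in range(20):
    store.put(f"backup-{night:02d}", data_set)
print(store.logical_bytes, store.physical_bytes())
```

Twenty identical backups are a constructed best case, chosen only to make the effect easy to see: clients write 20 times as much data as is physically stored. Real backup sets change between runs, so actual ratios depend on the data.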

Vendors such as Data Domain, Diligent Technologies, FalconStor, Quantum, and Sepaton offer data deduplication, usually with virtual tape library systems. Deduplication can reduce storage requirements to one-twentieth of current levels. Turning off 95% of your spinning archives is a pretty darn green thing to do.
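The one-twentieth and 95% figures are the same claim stated two ways. Writing R for the deduplication ratio (logical data written divided by physical data stored), the fraction of capacity freed is

$$ 1 - \frac{1}{R} = 1 - \frac{1}{20} = 0.95 $$

Translating freed capacity into disks you can switch off assumes the number of spinning drives scales with stored capacity, which is a simplification.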

Data deduplication is making its way into primary file storage, too. In May, NetApp announced A-SIS (advanced single-instance storage). The software can be added to any of the vendor's FAS filers; however, it's currently limited to file systems and can't be used on block LUNs. Still, it's a good start and a step in the right direction for the storage industry, and we expect data deduplication technology to keep finding its way into primary storage systems.

One last point: Data deduplication will be very useful for those who depend heavily on virtualization. Each virtual machine carries its own copy of the same operating system files. Data deduplication not only greatly reduces the storage space virtualization requires but also increases the likelihood that data is already in the filer's cache when it's needed. Both go a long way toward improving the utility and performance of virtualization.
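An idealized count shows why virtualization benefits so much. Suppose N virtual machines each hold an identical set of operating system files of size S plus unique data of size U_i (symbols introduced here for illustration; in practice images diverge as they are patched and configured). With ideal block-level deduplication, the shared OS blocks are stored once instead of N times:

$$ \underbrace{N S + \sum_{i=1}^{N} U_i}_{\text{without dedup}} \;\longrightarrow\; \underbrace{S + \sum_{i=1}^{N} U_i}_{\text{with dedup}}, \qquad \text{saving } (N-1)\,S $$

The same idealization suggests the cache effect: the shared OS blocks occupy cache space once rather than once per virtual machine.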
