New Tools For Finding Data And Documents Quickly - InformationWeek

New Tools For Finding Data And Documents Quickly

Content-addressed storage technology can help businesses preserve documents and find them easily

There's been a lot of buzz in legal circles recently about United States v. KPMG. The feds accused the accounting firm of cooking up illegal tax shelters for rich clients from 1996 to 2003. What caught our eye isn't the $456 million the firm will pay or even the $2.5 billion in evaded taxes. We noticed that the case thus far has generated, in electronic or paper form, 5 million to 6 million pages of discoverable documents, of all shapes, sizes, and types. That's a prime example of why data-retention and digital-discovery requirements have lit a fire under the normally staid archival market.

Vendors are touting content-addressed storage, or CAS, as a way to make discovery requests more manageable. In a nutshell, a CAS system locates data by an address that the array assigns based on the content itself, rather than by physical address or directory path. Because the CAS device completely abstracts data from the hardware on which it resides, documents can be found based on content rather than by storage location.
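To make the idea of a content-derived address concrete, here is a minimal sketch in Python. It assumes the address is simply the SHA-256 digest of the stored bytes; the article does not say which scheme any vendor uses, and the class and method names (ContentStore, put, get) are hypothetical, not any product's API.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (illustrative only)."""

    def __init__(self):
        # Maps address (hex digest of the content) -> stored bytes.
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Assumption: the address is the SHA-256 digest of the content.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hash on read: if the bytes changed, they no longer match the address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content no longer matches its address")
        return data


store = ContentStore()
addr = store.put(b"Engagement letter, 1999 tax year")
retrieved = store.get(addr)
```

The caller never names a disk, volume, or directory. It hands over content, gets back an address, and later presents that address to retrieve the same bytes.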

The earliest entry into this market, EMC's Centera, first released in 2002, is still the clear leader in terms of CAS-capable units, mainly because EMC was first with a strong play. Today, competitors big and small, including Caringo, Hewlett-Packard, Hitachi, IBM, Nexsan, and Sun Microsystems, are bullish on CAS. We expect every major storage vendor to provide some iteration of CAS, albeit under the guise of a "complete archive management system." Some have entries already, and we expect others to follow suit in the next 24 months.

Digital Fingerprints

A CAS system comprises storage nodes, where data is physically kept, and access nodes, where metadata and information on the data's location on the storage nodes are kept. CAS can cut down on duplication, and thus storage space requirements. Because an object's address acts as a digital fingerprint of its content, a document with even a small change is saved separately from the original copy, which provides versioned storage. Some vendors use the same capability to keep only one copy of a given data set, removing the duplicates usually found on standard location-addressed storage.
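The split between storage nodes and access nodes, and the way fingerprints yield both deduplication and versioning, can be sketched as follows. This is a simplified model under stated assumptions: SHA-256 stands in for whatever fingerprint a real system computes, and StorageNode, AccessNode, and their methods are hypothetical names invented for illustration.

```python
import hashlib


class StorageNode:
    """Holds the actual bytes, keyed by content fingerprint."""

    def __init__(self):
        self.blobs = {}

    def write(self, data: bytes) -> str:
        # Assumption: the fingerprint is a SHA-256 digest of the content.
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so it is stored once.
        self.blobs.setdefault(address, data)
        return address


class AccessNode:
    """Holds metadata: which fingerprints belong to which document name."""

    def __init__(self, storage: StorageNode):
        self.storage = storage
        self.versions = {}  # document name -> list of addresses, oldest first

    def save(self, name: str, data: bytes) -> str:
        address = self.storage.write(data)
        history = self.versions.setdefault(name, [])
        # Record a new version only when the content actually changed.
        if not history or history[-1] != address:
            history.append(address)
        return address


storage = StorageNode()
access = AccessNode(storage)

first = access.save("memo.txt", b"Draft tax memo")
duplicate = access.save("copy-of-memo.txt", b"Draft tax memo")
revised = access.save("memo.txt", b"Draft tax memo.")  # one character added
```

Here the duplicate file adds a metadata entry but no new stored bytes, while the one-character revision gets its own address and becomes a second version of memo.txt alongside the original.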

The story isn't all positive: Many CAS devices have significant shortcomings. For example, metadata standardization is nonexistent. The Storage Networking Industry Association is creating a standard that will allow for the migration of XML-based metadata between different CAS systems, but those efforts are incomplete. Keep an eye on SNIA and ask your vendors about plans to implement eventual CAS standards.
