Backup tapes can kill an e-discovery effort with costs and complications. Fight back with smart policies and the right tools.
Many organizations are sitting on stockpiles of dangerous materials. No, we're not talking about hazardous chemicals or unstable explosives. We mean backup tapes, which are routinely included in requests to produce electronically stored information (ESI) as part of potential or ongoing litigation.
The e-discovery realm is rife with cautionary tales of organizations tripped up by backup tapes. For instance, in 2009 a judge fined a defendant more than $1 million for failing to retrieve information stored on backup tapes. In the same year, the federal Office of Federal Housing Enterprise Oversight (OFHEO) was compelled by a court to search its off-site disaster recovery backups for ESI. The search ended up costing the agency $6 million, a jaw-dropping amount for a single discovery exercise. Even more alarming, the agency wasn't a party to the lawsuit; it had simply been subpoenaed for documents in litigation involving Fannie Mae.
This article examines the challenges that backup tapes pose. It also discusses strategies organizations can use to reduce the number of tapes that get stockpiled, and it outlines technologies and services that help reduce the cost and time it takes to retrieve ESI from tape.
Backup tapes have been used for decades to store data. Today, Linear Tape-Open (LTO) Ultrium, across its successive generations, is the primary format. Large storage capacities (1.5 TB raw and 3 TB compressed per LTO-5 cartridge), high sustained transfer rates (140 MB/s native for LTO-5), and low power consumption make tape an ideal backup medium, even with the emergence of disk as an alternative.
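High capacity also means that simply reading a tape takes a while. The sketch below is a back-of-the-envelope estimate, not a benchmark. It assumes LTO-5's native rate of 140 MB/s, decimal units, and a drive that streams without ever stopping. It gives a best-case floor on how long it takes to read one full cartridge, before any restore, indexing, or search begins.

```python
# Lower-bound read time for a full LTO-5 cartridge.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes),
# a 140 MB/s native rate, and a drive that streams without stopping.

LTO5_NATIVE_CAPACITY_BYTES = 1.5e12   # 1.5 TB raw per cartridge
LTO5_NATIVE_RATE_BYTES_PER_S = 140e6  # 140 MB/s native transfer rate


def full_read_hours(capacity_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to stream an entire cartridge at a constant rate."""
    return capacity_bytes / rate_bytes_per_s / 3600


hours = full_read_hours(LTO5_NATIVE_CAPACITY_BYTES, LTO5_NATIVE_RATE_BYTES_PER_S)
print(f"One full LTO-5 cartridge: about {hours:.1f} hours")               # about 3.0 hours
print(f"100 cartridges on one drive: about {100 * hours / 24:.0f} days")  # about 12 days
```

Real throughput is usually lower. Drive stops and restarts, mounts, and seeks all add time, so treat these figures as a floor.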
While tape may be suitable for backup and disaster recovery, it's rarely a good choice for archiving. Unfortunately, many companies have unintentionally adopted tape for that purpose. Often they lack proper media management policies, or they have inherited large stockpiles of backup tapes through mergers and acquisitions.
Several factors make tape a particularly difficult medium for e-discovery. First, most organizations have only a vague idea of what's on their backup tapes. To find out, they must run the tapes through the backup application that created them, or even restore the data to the application that generated it. Backup tape contents may also be in proprietary formats that require older versions of backup and business applications, which are typically difficult to locate.
An organization may have to set up an entire application environment, such as Microsoft Exchange, to restore data from backup tapes. This process can be costly and time-consuming. Recovery costs, meaning the amount an organization has to spend to get usable information from backup tapes, are estimated at $500 to $1,000 per tape.
Tale Of The Tape
Jeffery Fehrman, VP of forensics and consulting at Integreon, which provides e-discovery and legal services to corporations, recounts that on one job, his team restored 4 TB of e-mail from almost 500 backup tapes. Those 4 TB then had to be indexed, deduplicated, and searched for relevant ESI. The bill for the entire job, which took nearly two years: $8 million.
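Some simple arithmetic puts the per-tape recovery estimate in perspective against Fehrman's job. The sketch uses only figures from this article and rounds "almost 500" tapes to 500. Suppose the $500 to $1,000 estimate applied to this job. Restoration alone would then account for no more than about 6 percent of the $8 million bill.

```python
# Compare the $500-$1,000 per-tape recovery estimate with the all-in cost
# of the Integreon job. "Almost 500" tapes is rounded to 500 here.

TAPES = 500
EMAIL_TB = 4
TOTAL_BILL_USD = 8_000_000
RECOVERY_PER_TAPE_USD = (500, 1_000)


def recovery_estimate(tapes, per_tape_range):
    """Low and high recovery-only cost for a given number of tapes."""
    low, high = per_tape_range
    return tapes * low, tapes * high


low, high = recovery_estimate(TAPES, RECOVERY_PER_TAPE_USD)
print(f"Recovery-only estimate: ${low:,} to ${high:,}")            # $250,000 to $500,000
print(f"Share of the bill, at most: {high / TOTAL_BILL_USD:.2%}")  # 6.25%
print(f"All-in cost per tape: ${TOTAL_BILL_USD // TAPES:,}")       # $16,000
print(f"All-in cost per TB: ${TOTAL_BILL_USD // EMAIL_TB:,}")      # $2,000,000
print(f"E-mail restored per tape: {EMAIL_TB * 1000 / TAPES:.0f} GB")  # 8 GB
```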
Other factors can complicate tape discovery. Many tape formats are now obsolete, making compatible drives difficult to obtain. Another problem is the high level of duplication across tapes. Successive full backups capture many of the same files and messages again and again, forcing organizations to sift through large volumes of redundant and irrelevant data to find relevant ESI.
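Deduplication is commonly done by hashing content, so that identical copies share a key. The sketch below assumes messages have already been restored into simple dictionaries. The fields `tape`, `from`, `to`, `subject`, and `body` are hypothetical, and so are the normalization rules. The code keys each message on a SHA-256 hash of its normalized fields and keeps one copy per key. It also records every tape that held a copy, which can matter when documenting where evidence came from.

```python
import hashlib
from typing import Iterable


def content_key(sender: str, recipients: Iterable[str], subject: str, body: str) -> str:
    """Hash normalized message fields so identical copies get the same key.

    The fields and normalization (trimmed, lowercased addresses; sorted
    recipients) are illustrative assumptions, not any product's rules.
    """
    normalized = "\n".join([
        sender.strip().lower(),
        ",".join(sorted(r.strip().lower() for r in recipients)),
        subject.strip(),
        body.strip(),
    ])
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(messages):
    """Keep the first copy of each message; remember which tapes held copies."""
    unique = {}   # key -> first message seen with that key
    sources = {}  # key -> list of tape labels that contained a copy
    for msg in messages:
        key = content_key(msg["from"], msg["to"], msg["subject"], msg["body"])
        if key not in unique:
            unique[key] = msg
        sources.setdefault(key, []).append(msg["tape"])
    return list(unique.values()), sources


# Hypothetical restored messages; the first two are copies of one e-mail.
restored = [
    {"tape": "T001", "from": "a@example.com", "to": ["b@example.com"],
     "subject": "Q3", "body": "Draft attached."},
    {"tape": "T002", "from": "A@example.com", "to": ["b@example.com"],
     "subject": "Q3", "body": "Draft attached."},
    {"tape": "T002", "from": "c@example.com", "to": ["b@example.com"],
     "subject": "Lunch", "body": "Noon?"},
]
kept, where = deduplicate(restored)
print(len(kept))  # 2
```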
Organizations should take a two-pronged approach to addressing e-discovery on backup tapes: One is based on company policies and practices around retention and disposition; the other is technological.
On the policy front, organizations must understand that there are significant differences between backing up and archiving information.
Backup systems, including tape, should be used only for disaster recovery and business continuity. That means backups should be kept only long enough to meet the company's recovery requirements, that is, how far back in time it needs to be able to restore data. Some companies may require as little as 14 days of retention, while others may need six months or longer.
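In code, a backup retention window is just a cutoff date. The sketch below uses hypothetical tape labels and the 14-day window at the short end of the range above. It lists tapes that have aged out and could be recycled. It also includes a hypothetical `legal_hold` flag, because data under a preservation obligation shouldn't be recycled on schedule.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class BackupTape:
    label: str
    written: date              # date the backup set was written
    legal_hold: bool = False   # hypothetical flag: tape must be preserved


def tapes_past_retention(tapes, retention_days, today):
    """Return tapes older than the retention window that are not on hold.

    A hold always wins: a held tape is kept no matter how old it is.
    """
    cutoff = today - timedelta(days=retention_days)
    return [t for t in tapes if t.written < cutoff and not t.legal_hold]


inventory = [
    BackupTape("BK0101", date(2014, 1, 6)),
    BackupTape("BK0102", date(2014, 1, 20), legal_hold=True),
    BackupTape("BK0150", date(2014, 12, 8)),
]
expired = tapes_past_retention(inventory, retention_days=14, today=date(2014, 12, 15))
print([t.label for t in expired])  # ['BK0101']
```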
By contrast, an archive is designed for the retention of critical information--business records, contracts, e-mail--that companies may need to preserve for significantly longer periods. Archives tend to live on disk and often have built-in features such as classification and search that make it easy to identify and produce information that may be relevant to discovery or other investigations. Archives also provide IT with the tools to retain information long enough to meet rules and regulations and then dispose of that information once it reaches the end of its retention period.
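Unlike a single backup window, archive retention usually depends on what kind of record something is. The sketch below illustrates classification-driven disposition. The record classes and retention periods are hypothetical placeholders; real periods come from counsel and the applicable regulations. A production system would also check for preservation holds before disposing of anything.

```python
from datetime import date

# Hypothetical retention schedule, in years, keyed by record class.
RETENTION_YEARS = {
    "email": 3,
    "contract": 10,
    "business_record": 7,
}


def disposition_date(record_class: str, created: date) -> date:
    """Date on which a record of this class reaches the end of retention."""
    years = RETENTION_YEARS[record_class]
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # created is Feb 29 and the target year is not a leap year
        return created.replace(year=created.year + years, day=28)


def eligible_for_disposition(record_class: str, created: date, today: date) -> bool:
    """True once the record's retention period has run out."""
    return today >= disposition_date(record_class, created)


print(disposition_date("contract", date(2004, 3, 1)))  # 2014-03-01
print(eligible_for_disposition("email", date(2013, 6, 1), date(2014, 12, 15)))  # False
```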
The second prong to addressing e-discovery on backup tapes is technological. Tools are available that are designed specifically to help organizations get ESI from backup tapes. Products and services from vendors such as Index Engines, RenewData, and eMag Solutions let organizations scan, catalog, and restore data from tapes without many of the burdens of traditional restoration methods, such as re-creating the application environment. There are also products for specialized environments, such as Bus-Tech's VTL line, which targets large financial institutions and other organizations that run mainframes and operate in highly regulated markets.
Index Engines sells appliances that can access a variety of tape formats without requiring access to the legacy backup application or even a recovery environment for databases and e-mail servers. The appliance indexes data on the tape, making all the data searchable. When relevant information is found, the appliance can also copy it to another medium. "It can get a single message and pull out that message without us having to build an Exchange environment," says Integreon's Fehrman, who uses Index Engines and other tools. A single 6-TB Index Engines appliance can scan and index about 50 to 70 TB of tape data. But at a cost of about $150,000, the appliance isn't cheap.
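Making tape contents searchable comes down to building an index. The article doesn't describe how Index Engines works internally. The sketch below is a generic toy inverted index that maps each term to the messages containing it. It supports an AND search and returns a single message by ID, which mirrors the targeted extraction Fehrman describes. The message IDs are hypothetical. A real tool would also have to parse backup formats and keep its index on disk.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MessageIndex:
    """Toy inverted index: term -> set of message ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.messages = {}

    def add(self, msg_id, text):
        self.messages[msg_id] = text
        for term in tokenize(text):
            self.postings[term].add(msg_id)

    def search(self, query):
        """Ids of messages containing every query term (AND search)."""
        terms = tokenize(query)
        if not terms:
            return set()
        hits = set(self.postings.get(terms[0], set()))  # copy, don't mutate postings
        for term in terms[1:]:
            hits &= self.postings.get(term, set())
        return hits

    def extract(self, msg_id):
        """Return one message without restoring anything else."""
        return self.messages[msg_id]


idx = MessageIndex()
idx.add("tape07/msg0001", "Revised merger terms attached")
idx.add("tape07/msg0002", "Lunch on Friday?")
idx.add("tape12/msg0410", "Merger call moved to Friday")
print(sorted(idx.search("merger friday")))  # ['tape12/msg0410']
```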
RenewData's eDiscovery Acceleration Platform is provided as a service, so customers don't have to make any supporting capital investments. Customers can use it to restore one tape or thousands. RenewData takes precautions to avoid inadvertent data spoliation, that is, destroying or altering data in ways that can draw court sanctions or make it unusable as evidence. The vendor can also testify on behalf of clients as to the defensibility of how it handled their data.
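A widely used safeguard against undetected alteration is to hash each item when it's collected and re-verify the hash before production. The article doesn't describe RenewData's actual procedures. The sketch below is a generic fixity check built on SHA-256 from Python's standard library, and the restored file name is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large restores needn't fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def record_fixity(paths):
    """Build a manifest of path -> digest at collection time."""
    return {str(p): sha256_file(p) for p in paths}


def verify_fixity(manifest):
    """Return paths whose current digest no longer matches the manifest."""
    return [p for p, digest in manifest.items() if sha256_file(p) != digest]


with tempfile.TemporaryDirectory() as workdir:
    restored = Path(workdir) / "msg0001.eml"  # hypothetical restored item
    restored.write_bytes(b"Subject: Q3\r\n\r\nDraft attached.\r\n")
    manifest = record_fixity([restored])
    before = verify_fixity(manifest)          # unchanged, so no mismatches
    restored.write_bytes(b"Subject: Q3\r\n\r\nDraft changed.\r\n")
    after = verify_fixity(manifest)           # altered after hashing
print(before, len(after))  # [] 1
```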
Technologies and services such as these make data restoration from backup tapes less painful and costly than traditional methods. However, the cost and burden of restoration are usually the grounds on which a party argues that backup tapes are not reasonably accessible. By lowering both, these improvements also make it harder for parties to a lawsuit to argue that they shouldn't have to produce backup tapes. That's why it's essential for organizations to have retention and disposition policies on the books, along with the mechanisms to enforce them.
Behzad Behtash is an independent IT consultant who previously served as CIO of Tetra Tech EM and VP of systems for AIG Financial Products.