Commentary

George Crump
 

Data Keepage

Your servers are probably bloated with data that is years old, and yet despite your retention policy, if you have one, you keep it all. The relatively low price of disk capacity has made it easy to keep everything on primary disk storage. When you think of primary storage, you think of active data: databases, current documents, e-mail, and so on. But because storage is so affordable, primary storage has effectively become the archive as well. Data is kept on disk "just in case." It seems easier to simply add more disk space to primary storage than to force users to manage it; as a result, "data keepage" begins.

The effects of data keepage are widespread, but two of those effects require immediate concern.

Impact On Backup

Data keepage's largest impact is on the backup process. Most data centers continue to run weekly full backups. These backups are bogged down backing up millions of files that haven't been touched in years, let alone since the last backup. Hardware acquisition costs are driven up because the backup target, be it disk or tape, needs to be larger. Backup data deduplication systems can help with storage capacity costs, but they do little to thin the backup across the network. Additionally, most backup applications have difficulty handling millions of small files because they have to walk these file systems. Block-level incremental backup (BLIB) applications, like those from Network Appliance and Syncsort, thin the backup across the network by sending only changed blocks. They also avoid the millions-of-small-files problem because they do their changed-state analysis at a much lower level, which is more efficient than working at the file-system level.
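To make the "only sending changed blocks" idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how any vendor's product works: commercial BLIB tools may track changed blocks quite differently (for example, through snapshots rather than hashing). The 4 KB block size and the function names `block_hashes` and `changed_blocks` are hypothetical choices made for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Map block index -> new block contents for every block that differs.

    Blocks appended beyond the end of the old image count as changed.
    Truncation (new image shorter than old) is not handled in this sketch.
    """
    old_hashes = block_hashes(old, block_size)
    changes = {}
    for index, digest in enumerate(block_hashes(new, block_size)):
        if index >= len(old_hashes) or old_hashes[index] != digest:
            start = index * block_size
            changes[index] = new[start:start + block_size]
    return changes
```

In this sketch, if a single 4 KB block changes inside a multi-gigabyte image, only that one block ends up in the result. That is the sense in which a block-level approach thins the backup across the network, and it involves no per-file walk of the file system.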


Another solution, which can be deployed in conjunction with BLIB or independently, is disk-based archiving: moving the data off primary storage to a less expensive but more secure device. Cleaning off primary storage has been less than desirable in the past, but with features like data deduplication, easy access via a network mount point, massive scalability, and power management, these solutions make the process viable and the effort worthwhile.

Violation Of Retention Policies

If you have a strict retention policy, whether for e-mail, files, or other data types, data keepage probably puts you in violation of that policy. For example, if your policy defines retention of one year or three years, which is not uncommon, this typically means that you are only going to maintain backups for that period of time. It essentially is a restore policy to protect you from legal action. The problem is that if it can be proven that the data being sought is still on a server somewhere, you will be forced to deliver that information. Saying you don't have backups of that data doesn't apply unless that data has been deleted. Obviously, deleting data after notification of legal action is directly against the law.
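One way to gauge this exposure is to look for files on primary storage that are older than the retention period. The Python sketch below is a rough illustration only: the function name `files_past_retention` is hypothetical, and it treats a file's last-modified time as a stand-in for its age, which a real retention policy may define differently (by creation date, message date, or record type, for instance).

```python
import os
import time

SECONDS_PER_DAY = 86400


def files_past_retention(root, retention_days, now=None):
    """Yield (path, age_in_days) for files last modified before the cutoff.

    Assumption: last-modified time approximates the age that the
    retention policy cares about.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                yield path, (now - mtime) / SECONDS_PER_DAY
```

For a three-year policy, calling `files_past_retention("/some/share", 3 * 365)` (the path is a placeholder) would list candidates for review. Note that this is exactly the kind of file-system walk that becomes slow across millions of small files, so it suits a periodic audit better than a nightly job.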

The solution here is a disk-based archive with added functionality to make it enterprise class. Capabilities such as Write Once Read Many (WORM, used to prevent changes to data), encryption, and content indexing all become critical in managing retained data.

These are just two of the critical problems that giving in to the temptation of data keepage causes. Solutions exist to clear off this old data and reserve the investment in primary storage for only the most active set of data, resulting in better use of storage expenditures, improved backup windows, and better litigation preparedness.

George Crump is founder of Storage Switzerland, an analyst firm focused on the virtualization and storage marketplaces. It provides strategic consulting and analysis to storage users, suppliers, and integrators. An industry veteran of more than 25 years, Crump has held engineering and sales positions at various IT industry manufacturers and integrators. Prior to Storage Switzerland, he was CTO at one of the nation's largest integrators.

