In the past two entries in this series, we discussed how cloud backup and primary cloud storage will change the role of the storage administrator. In this entry we cover what is potentially one of the most popular initial uses of cloud storage, its use as an archive, along with how that affects the storage manager's role and what the main concerns should be.
If you work in IT for a business of almost any size, you have to retain data to meet either a legal requirement or a business need. As we have discussed in the past, an archive is not a backup, and the two should be maintained separately. This creates a need to develop an archive storage infrastructure, and one of the options to consider is the cloud as an archive area. The value of archiving data to a cloud storage provider is that you don't have the physical aspects of a separate storage area to manage. You don't have to power or cool it, and you don't have to make sure there is enough capacity on it. All of this is handled by the provider once you get the archive out of your building.
The storage administrator still needs to identify which data should be moved to the cloud archive. This can be done by manual inspection or with tools that help you determine which files or folders have not been accessed for a period of time. There have been issues with these software applications, both in the time it takes an agent-less architecture to scan the environment and in the resources an agent-based architecture may consume. But as we discuss in our recent article, there are ways to scale agent architectures more effectively than in the past, and agent-based architectures are increasingly reliable. The important thing is to have a software application that can deliver the results quickly so you can make data movement decisions.
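As a rough illustration of the "not accessed for a period of time" test, here is a minimal Python sketch that walks a directory tree and lists files idle for longer than a threshold. The function name `find_archive_candidates` and its parameters are hypothetical, chosen for this example, and the sketch is not a substitute for a commercial classification tool. It assumes file system timestamps are trustworthy; because many systems update access times lazily or not at all, it uses the later of the access and modification times.

```python
import os
import time
from pathlib import Path


def find_archive_candidates(root, min_idle_days, now=None):
    """Return (path, idle_days, size_bytes) for files idle longer than min_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Access times can be stale (noatime/relatime mounts), so take
            # the later of the access and modification timestamps.
            last_used = max(info.st_atime, info.st_mtime)
            if last_used < cutoff:
                idle_days = int((now - last_used) // 86400)
                candidates.append((str(path), idle_days, info.st_size))
    # Largest files first: they free the most primary capacity when moved.
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates
```

Calling `find_archive_candidates("/some/share", 365)` would return the files under that (hypothetical) path that have been idle for more than a year. A single-threaded walk like this is exactly the kind of scan that becomes slow on large environments, which is why scan speed matters when choosing a tool.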
With the ability to selectively identify candidate data for archive, and a tool that can scale to meet the never-ending demands of data growth, the next step is to select the cloud architecture. Once again a hybrid architecture makes sense here: something that can cache the data locally on premises and then copy it to the cloud storage provider as needed. A hybrid approach may be less critical in an archive situation than in a primary storage or even a backup situation, since copy jobs can easily be queued and recovery speed is not quite as important. A cloud archive appliance does make things easier, though, since data can be copied to it as if it were a NAS target. A hybrid approach may also help with cloud migration issues if that need comes up.
In dealing with the cloud storage provider, the administrator has to make sure that the provider can retain information for as long as the company is legally required to keep it, and that the provider can transmit and store that data in an encrypted manner. The administrator also has to make sure that the provider can retain that information in the specific way required. For example, if data needs to be kept in a non-modifiable format, you need to make sure that this can be accomplished. Or if data needs to be deleted after a certain number of years, you need to make sure that the provider supports this.
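The checks described above can be written down as a small policy model. The following Python sketch is only an illustration: `RetentionPolicy`, `provider_gaps`, and `retention_action` are hypothetical names invented here, and the provider capabilities are passed in as plain values rather than queried from any real provider API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    retain_years: int                  # minimum time the data must be kept
    delete_after_years: Optional[int]  # mandatory disposal age, None if no deadline
    immutable: bool                    # True if data must be stored non-modifiable


def provider_gaps(policy, max_retention_years, supports_immutability,
                  supports_scheduled_delete, supports_encryption):
    """List the policy requirements a (hypothetical) provider cannot meet."""
    gaps = []
    if max_retention_years < policy.retain_years:
        gaps.append("retention period too short")
    if policy.immutable and not supports_immutability:
        gaps.append("no non-modifiable storage")
    if policy.delete_after_years is not None and not supports_scheduled_delete:
        gaps.append("no scheduled deletion")
    if not supports_encryption:
        gaps.append("no encryption in transit and at rest")
    return gaps


def add_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retention_action(archived_on, policy, today):
    """Return 'hold', 'eligible' (may be deleted), or 'delete' (must be deleted)."""
    if today < add_years(archived_on, policy.retain_years):
        return "hold"
    if policy.delete_after_years is not None:
        if today >= add_years(archived_on, policy.delete_after_years):
            return "delete"
    return "eligible"
```

An empty list from `provider_gaps` means the provider covers the stated requirements on paper; it does not replace reading the provider's contract and service terms.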
The deletion of data from the cloud may be the most overlooked consideration. Most cloud providers make at least two and in some cases three copies of your data to make sure that it can be delivered back to you in case of a disaster. When you issue a delete order, does that data really get deleted or is it left spread out all over the cloud? The simplest way to make sure that data is deleted is to be able to destroy the encryption keys for those files, but you have to make sure that your provider gives you that level of granularity.
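To make the key-destruction idea concrete, here is a small Python sketch of per-file keys. Everything in it is an assumption made for illustration: the class `ShreddableArchive` is hypothetical, the in-memory dictionary stands in for the provider's replicated copies, and the cipher is a toy SHA-256 keystream used only so the example runs with the standard library. A real deployment would use a vetted cipher such as AES-GCM and a proper key management system.

```python
import hashlib
import secrets


def _keystream_xor(key, nonce, data):
    """Toy SHA-256 counter-mode stream cipher. Illustration only, NOT for production."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class ShreddableArchive:
    """Hypothetical archive with one encryption key per file."""

    def __init__(self):
        self._keys = {}     # file id -> key, held by the customer
        self._objects = {}  # file id -> (nonce, ciphertext); stands in for provider copies

    def put(self, file_id, plaintext):
        key = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        self._keys[file_id] = key
        self._objects[file_id] = (nonce, _keystream_xor(key, nonce, plaintext))

    def get(self, file_id):
        key = self._keys[file_id]  # raises KeyError once the key is destroyed
        nonce, ciphertext = self._objects[file_id]
        return _keystream_xor(key, nonce, ciphertext)

    def shred(self, file_id):
        # Destroy only this file's key. The ciphertext copies may linger at
        # the provider, but they can no longer be decrypted.
        del self._keys[file_id]
```

The point of the per-file key map is granularity: shredding one file leaves every other file readable. With a single key for the whole archive, destroying the key would destroy everything at once.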
The storage administrator's job is largely unchanged in the cloud archive world. Administrators have fewer physical assets to be responsible for in their data center, which could save time, but the process part of archiving is still there. Data needs to be mined, classified, and then stored in a way that lets it be retained and/or deleted against various schedules. In fact, data management skills are more critical than ever in the pay-by-the-GB-per-month world of cloud storage.
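A quick calculation shows why deletion schedules matter when you pay by the GB per month. The Python sketch below is a simplified model with hypothetical names and a flat per-GB price; real provider pricing usually adds tiers, retrieval fees, and request charges that are not modeled here.

```python
def projected_archive_cost(start_gb, monthly_growth_gb, monthly_delete_gb,
                           price_per_gb_month, months):
    """Total storage charges over a period, billed on each month's opening balance."""
    total = 0.0
    stored = float(start_gb)
    for _ in range(months):
        total += stored * price_per_gb_month
        # New archive data arrives; expired data is deleted on schedule.
        stored = max(0.0, stored + monthly_growth_gb - monthly_delete_gb)
    return total
```

For example, with an assumed price of 0.01 per GB per month, an archive that starts at 1,000 GB and grows by 100 GB a month costs about 33 over three months if nothing is ever deleted, and about 30 if 100 GB of expired data is removed each month. The gap widens every month that expired data is left in place.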