Cloud storage may end up being the great storage repository in the sky: the destination that holds all our data and gets it off our local storage. Whether you use it as a fourth tier of storage that your internal archive spills over to or as your sole archive, someday you are going to need to find data in it. Should we be indexing cloud storage to find the needle in the haystack?

As we discuss in our article "The Importance of a Cloud Storage API", some cloud storage providers can solve the cloud storage indexing problem up front by offering API sets that allow information to be tagged as it is moved into the cloud. This metadata lets you record information about the information, so that when it comes time to retrieve the data you can supply keywords to help find it. The challenge with an API set is that an Independent Software Vendor (ISV) needs to integrate it into its solution. These providers have either introduced ISV programs to help popularize this concept or are working on them.
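To make the tagging idea concrete, here is a minimal sketch of keywords being recorded as data is written and then used to find it again. It does not use any real provider's API: the `TaggedArchive` class, its `put` and `find` methods, and the object names are all hypothetical, and an in-memory dictionary stands in for the cloud.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TaggedArchive:
    """Hypothetical in-memory stand-in for a cloud store that accepts tags on write."""

    objects: dict = field(default_factory=dict)  # object key -> stored bytes
    index: dict = field(default_factory=lambda: defaultdict(set))  # keyword -> object keys

    def put(self, key, data, keywords):
        """Store an object and record its keywords at the moment of archive."""
        self.objects[key] = data
        for word in keywords:
            self.index[word.strip().lower()].add(key)

    def find(self, *keywords):
        """Return the keys of objects tagged with every one of the given keywords."""
        matches = [self.index.get(w.strip().lower(), set()) for w in keywords]
        return sorted(set.intersection(*matches)) if matches else []


archive = TaggedArchive()
archive.put("contracts/acme-2009.pdf", b"placeholder bytes", ["Acme", "contract", "2009"])
archive.put("contracts/globex-2009.pdf", b"placeholder bytes", ["Globex", "contract", "2009"])
print(archive.find("contract", "acme"))  # ['contracts/acme-2009.pdf']
```

The point of the sketch is that the keyword index is built at write time, so a later search never has to open the archived objects themselves.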
The value of tight integration with ISVs is that you can archive and fill in the keywords at the moment of archive: check a box, fill in the keywords and click archive. This is when the keyword information is likely to be most accurate, because it is fresh in the mind of the person doing the archiving. In fact, the application itself may be able to supply the keyword data. Recall, when needed, could then be done right through the application or via a standalone interface from the cloud storage provider. The value of the latter is that the search should work across all applications that archive to the provider.
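As a rough illustration of an application supplying keyword data on its own, the sketch below merges keywords typed by the user with ones derived from fields the application already knows. The function name `suggest_keywords`, the `title` and `author` fields, and the stop-word list are assumptions made for this example, not part of any ISV product.

```python
import re

# Small illustrative stop-word list; a real application would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "for", "to", "in"}


def suggest_keywords(title, author, user_keywords=()):
    """Merge keywords the application can derive with those the user typed."""
    derived = [w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOP_WORDS]
    derived.append(author.lower())
    typed = [k.strip().lower() for k in user_keywords if k.strip()]
    # dict.fromkeys drops duplicates while keeping first-seen order,
    # so the user's own keywords stay at the front of the list.
    return list(dict.fromkeys(typed + derived))


print(suggest_keywords("Q3 Budget for the Denver Office", "jsmith", ["Finance", "budget"]))
# ['finance', 'budget', 'q3', 'denver', 'office', 'jsmith']
```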
To ease adoption, many suppliers have added a NAS gateway or have used standard Internet-friendly protocols like WebDAV or REST to allow access to cloud-based storage. The NAS gateway approach should allow indexing solutions from Index Engines, Kazeon (now EMC) and others to index the cloud storage as if it were a normal file system. Test this type of solution first to understand what the ramifications might be.
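To show what "index this as if it were a normal file system" can look like in its simplest form, here is a sketch that walks a directory tree and records basic metadata for each file. It assumes the gateway is mounted as an ordinary local path. The function name `index_tree` is hypothetical, a temporary directory stands in for the mount point, and commercial indexing products do far more than this.

```python
import os
import tempfile


def index_tree(root):
    """Walk a mounted file system and return one metadata record per file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)  # reads file metadata only, not file contents
            records.append({
                "path": os.path.relpath(path, root),
                "size": info.st_size,
                "modified": info.st_mtime,
            })
    return sorted(records, key=lambda r: r["path"])


# Demo: a temporary directory plays the role of the gateway mount point.
with tempfile.TemporaryDirectory() as mount_point:
    os.makedirs(os.path.join(mount_point, "projects"))
    with open(os.path.join(mount_point, "projects", "plan.txt"), "w") as fh:
        fh.write("archive me")
    for record in index_tree(mount_point):
        print(record["path"], record["size"])
```

This walk only asks for names, sizes and timestamps. Indexing file contents would mean reading every file through the gateway, which is likely to be one of the ramifications worth measuring in a test before relying on this approach.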
Indexing cloud storage should be an early consideration as you start to select a cloud storage platform. Successfully finding the needle in the haystack requires proper planning up front. Waiting until you already have a few terabytes of information stored is going to be problematic.
George Crump is lead analyst of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. Find Storage Switzerland's disclosure statement here.