Information lifecycle management (ILM) has long eluded most companies. For one thing, no single application can manage all of a company's information throughout its lifetime, interacting with storage software along the way to make sure everything is where it should be at all times. For another, it's difficult for software to identify which files are highly important and must be archived, or which documents are of low value and can be discarded. EMC has filled in a piece of the puzzle that could start to make ILM a reality.
The new software, InfoScape, uses classification technology adapted from EMC Documentum software to crawl file systems and automatically tag files with basic metadata (such as creation date, author and headline) and with clues to business value, sensitivity and content. For instance, InfoScape might flag files that contain data with the format of a Social Security number or the words "confidential" or "contract."
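To make the idea of content-based flagging concrete, here is a minimal Python sketch of a format-and-keyword check. It is not InfoScape's implementation, which the article does not describe; the names `SSN_PATTERN`, `KEYWORDS` and `classify_text`, and the tag strings, are assumptions made up for illustration.

```python
import re

# Assumed pattern: digits grouped 3-2-4, the usual written form of a
# U.S. Social Security number. It checks format only, not validity.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Keywords taken from the examples in the article.
KEYWORDS = ("confidential", "contract")


def classify_text(text):
    """Return a set of illustrative tags for a file's text content."""
    tags = set()
    if SSN_PATTERN.search(text):
        tags.add("ssn-format")
    lowered = text.lower()
    for word in KEYWORDS:
        # Whole-word, case-insensitive match on each keyword.
        if re.search(r"\b" + re.escape(word) + r"\b", lowered):
            tags.add(word)
    return tags
```

A check like this looks only at surface form, so any text that happens to contain a matching string is tagged whether or not it is actually sensitive.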
InfoScape ships with 25 industry taxonomies (hierarchies of categories under which files can be grouped). It's up to customers to create policies and rules for what should be done with each file--for instance, encrypt files containing personally identifiable data, restrict access to confidential documents or archive contracts in a secure, top-tier storage array. Alongside InfoScape, EMC has also introduced an Information Management Strategy Service for those who want EMC consultants to help them set and enact these policies.
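The kind of customer-defined policy described above can be pictured as a table that maps a classification tag to an action. The Python sketch below is only an illustration under that assumption; the tag names, action names, `POLICIES` table and `actions_for` function are hypothetical and do not reflect InfoScape's actual rule format.

```python
# Hypothetical tag -> action rules, mirroring the article's three examples.
POLICIES = [
    ("pii", "encrypt"),
    ("confidential", "restrict-access"),
    ("contract", "archive-top-tier"),
]


def actions_for(tags):
    """Return the actions whose trigger tag is present, in policy order."""
    return [action for tag, action in POLICIES if tag in tags]
```

For example, a file tagged with both `pii` and `contract` would yield the actions `encrypt` and `archive-top-tier`, while an untagged file would yield none.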
This ability to automatically classify large volumes of files is not brand new; Abrevity, Index Engines, Kazeon Systems, njini, Scentric and StoredIQ all offer some form of automated classification software or appliance--in fact, Kazeon and Abrevity announced enhanced versions of their products shortly after EMC's announcement. But InfoScape is the first offering from a company with the size and clout of EMC. The need for automated classification technology is growing, says Arun Taneja of the Taneja Group, a technology consulting firm. "Enterprises need to apply one or more information classification engines that will work with all unstructured data and give you genuine ILM capability."
A core component of ILM is managing the movement and storage of files. So far, the only file migration InfoScape offers is within EMC's Celerra storage boxes, though the company plans to add support for more EMC storage products over the next 12 months. Critics might look at InfoScape as a way of locking storage customers into the EMC fold. But in truth, many EMC storage customers are locked in already and lack the staff needed to integrate incompatible products. For those customers, InfoScape adds a useful management layer to help protect sensitive files, move old or unimportant data to inexpensive storage or find documents in response to a legal discovery request.
The first incarnation of InfoScape may lack sophistication; one beta tester describes the autoclassification feature as "using a sledgehammer to drive a finishing nail." For instance, a rule that spots every file containing a Social Security number would also flag an internal e-mail message about a new key based on a Social Security number. "Is that a step forward? Yes, but it's still a sledgehammer," the customer says. EMC says it will improve and add to its autoclassification capabilities, but subtle rules and controls may take years to develop. --Penny Crosman