The new software, called InfoScape, uses classification technology repurposed from EMC Documentum software to crawl file systems and automatically tag files with basic metadata (such as date created, author and headline) and with certain clues to business value, sensitivity and content. For instance, InfoScape might flag files that contain the data format of a Social Security number (XXX-XX-XXXX) as well as the words "confidential" or "contract" in the subject line. InfoScape also ships with 25 industry taxonomies (hierarchies of categories under which files can be grouped). It is then up to customers to create policies and rules for what should be done with each file and to take action: for instance, encrypting a file containing personally identifiable data, restricting access to a confidential document, or archiving a contract in a secure, top-tier storage array. Alongside InfoScape, EMC has also introduced Information Management Strategy Service for those who want consultants to help them set and enact these policies.
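To make the tag-then-apply-policy idea concrete, here is a minimal Python sketch of that kind of rule. It is not InfoScape's actual engine or API, which the article does not describe. The pattern, the tag names, the `POLICY` table and the `FileRecord` type are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumed pattern for the XXX-XX-XXXX layout mentioned above (illustrative only).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Assumed keywords to look for in a file's subject line.
KEYWORDS = ("confidential", "contract")

# Hypothetical customer-defined policy: tag -> action to take.
POLICY = {
    "pii": "encrypt",
    "confidential": "restrict_access",
    "contract": "archive_top_tier",
}


@dataclass
class FileRecord:
    """Hypothetical stand-in for one crawled file."""
    path: str
    subject: str
    body: str
    tags: set = field(default_factory=set)


def classify(record):
    """Tag a record based on an SSN-shaped string in the body and keywords in the subject."""
    if SSN_PATTERN.search(record.body):
        record.tags.add("pii")
    subject = record.subject.lower()
    for word in KEYWORDS:
        if word in subject:
            record.tags.add(word)
    return record.tags


def actions_for(tags):
    """Look up the policy action for each tag that has one, in sorted order."""
    return sorted(POLICY[tag] for tag in tags if tag in POLICY)
```

In this sketch, classification and policy are kept separate on purpose: the classifier only attaches tags, and the customer-owned table decides what each tag means in practice, which mirrors the division of labor described above.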
This ability to automatically classify files no matter what application they were created in or where they're stored is not brand new; Kazeon, StoredIQ, Scentric, njini and Index Engines all offer some form of horizontal automated classification software or appliance. But InfoScape is the first offering from a company with the history, size and clout of EMC. The need for such technology will grow increasingly clear, says Arun Taneja of the Taneja Group, a technology consulting firm. "Enterprises will need to apply one or more information classification engines that will work with all unstructured data and give you genuine, true ILM capability."
If automated classification becomes a hot trend, there will also be an increasing need to do something with the organized files. So far, the only file migration InfoScape offers is within EMC's Celerra storage boxes, although the company plans to support more EMC storage products over the next 12 months. In fact, a cynic might look at InfoScape as a way of locking storage customers into the EMC fold. But it could help customers with three important tasks: protecting sensitive files, such as those containing personally identifiable information, company secrets or intellectual property; moving less important data to inexpensive storage; and possibly supporting legal discovery requests. (InfoScape should be able to produce all files created by a particular author within a set timeframe. However, more complicated queries would be tougher because InfoScape doesn't support universal full-text indexing.)
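The discovery example in the parenthetical is essentially a filter over file metadata, which is why it does not need a full-text index. A rough Python sketch of such a query follows. The `FileMeta` type and the `files_by_author` function are hypothetical names for illustration, not part of any EMC product.

```python
from datetime import date
from typing import NamedTuple


class FileMeta(NamedTuple):
    """Hypothetical metadata record of the kind a crawler might collect."""
    path: str
    author: str
    created: date


def files_by_author(catalog, author, start, end):
    """Return paths of files by `author` created between start and end, inclusive."""
    return [
        meta.path
        for meta in catalog
        if meta.author == author and start <= meta.created <= end
    ]
```

A query such as "every file that mentions a given project name" could not be answered this way, since it depends on file contents rather than on the collected metadata.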
Over the next year, EMC plans to add APIs that let InfoScape integrate with other storage software products, along with new features such as automatic encryption of confidential files, bringing it closer to full-fledged ILM.
Pricing starts at $125,000.