Digital Reef swings for the data management fences by indexing and classifying all unstructured data in the enterprise. Top applications include e-discovery and storage management.
Digital Reef, which is having its official company launch today, has no shortage of ambition. The company aims to index and auto-classify all the unstructured data floating around on file servers, backup systems, archives, e-mail, collaboration tools, and content management systems.
The goal is to make unstructured data easier to manage when it comes to e-discovery, storage management, and compliance.
To do that, the company deploys software that crawls network storage systems and builds a full-content index of everything it finds, including metadata. The software supports NFS and CIFS, so it can mount most file stores. Prebuilt connectors pull information from applications such as SharePoint, Exchange, and Lotus Notes.
The index is stored on a grid computing cluster. Customers can use commodity hardware for the grid. Content on the grid is stored in a flat-file format rather than in a database.
As the software indexes content, it also analyzes it with a similarity engine. This engine, which is the company's primary intellectual property, performs two major functions. First, it looks for duplicates or near-duplicates of files. With duplicates identified, the index can return fewer files in an e-discovery exercise, saving time and money on document review.
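The article does not say how Digital Reef's similarity engine works internally. As a rough illustration of the general idea of near-duplicate detection, here is a minimal sketch that uses word shingles and Jaccard similarity, a common textbook technique. The function names and the 0.7 threshold are assumptions made for this example, not anything the company has described.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-word windows ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.7) -> bool:
    """Treat two documents as near-duplicates above an assumed threshold."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold


if __name__ == "__main__":
    a = "the contract is signed by both parties on monday"
    b = "the contract is signed by both parties on tuesday"
    print(jaccard(shingles(a), shingles(b)), is_near_duplicate(a, b))
```

In a review workflow, documents flagged this way could be grouped so that a reviewer reads one representative instead of every copy.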
The second function is auto-classification. The software looks at every piece of data in a file and suggests a classification based on the most relevant semantic ideas being expressed in the file. According to the company, the software doesn't need to be trained before it classifies and categorizes content. "We label all our folders with the top terms that placed a document into the folder, so you can understand why a document is in that folder," says Brian Giuffrida, VP of marketing and development at Digital Reef.
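Giuffrida's point about labeling folders with their top terms can be illustrated with a simple TF-IDF ranking. This is only a sketch of how "top terms" might be chosen without any training step. The source does not reveal Digital Reef's actual scoring method, and the `top_terms` function is a hypothetical name introduced here.

```python
import math
import re
from collections import Counter


def top_terms(docs, n=3):
    """Return the n highest-scoring TF-IDF terms for each document."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    total = len(docs)
    labels = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Terms that occur in every document get a score of zero.
        scores = {t: c * math.log(total / df[t]) for t, c in tf.items()}
        # Sort by score (descending), breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        labels.append(ranked[:n])
    return labels


if __name__ == "__main__":
    docs = [
        "merger merger contract the",
        "invoice invoice payment the",
        "the the the",
    ]
    print(top_terms(docs, n=2))
```

Terms that are frequent in one document but rare across the collection rise to the top, which is one plausible way to explain why a document landed in a given folder.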
The index is fully searchable. While the company says general users can search the index, it's aimed more at legal and compliance managers who need to search large volumes of information for efforts such as e-discovery. They can also use it to find sensitive data, such as Social Security or credit card numbers, that may need to be moved to a more secure location.
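To give a sense of what a search for sensitive data involves, the sketch below scans text for strings shaped like U.S. Social Security numbers and for candidate credit card numbers that pass the Luhn checksum. The patterns are deliberately simplified assumptions for illustration. The article does not describe how Digital Reef detects such data, and a production scanner would need stricter rules to limit false positives.

```python
import re

# Assumed, simplified patterns: SSNs written as 123-45-6789, and card
# numbers of 13 to 19 digits with optional spaces or hyphens between digits.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Check a digit string against the Luhn checksum used by payment cards."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> dict:
    """Return SSN-shaped strings and Luhn-valid card candidates found in text."""
    ssns = SSN_RE.findall(text)
    cards = [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
    return {"ssn": ssns, "card": cards}


if __name__ == "__main__":
    sample = "SSN 123-45-6789 and card 4111 1111 1111 1111."
    print(find_sensitive(sample))
```

The Luhn check filters out most random digit runs, so only plausible card numbers are reported for follow-up.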
The company claims it can index and classify up to 4 TB worth of files every 24 hours. Management software routes jobs among servers in the grid to balance loads. If an indexing job on a particular target fails, the software can restart the job from the failure point instead of having to re-index the entire file server.
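Restarting from the failure point is typically done with some form of checkpointing. The following is a minimal sketch of that idea, assuming a job is an ordered list of file paths and progress is recorded in a small JSON file. The `index_files` function, its parameters, and the checkpoint format are hypothetical and are not taken from Digital Reef's software.

```python
import json
from pathlib import Path


def index_files(paths, checkpoint_file, index_one):
    """Index paths in order, resuming after the last file recorded as done.

    index_one is a caller-supplied function that indexes a single path and
    may raise on failure. Returns the number of files indexed in this run.
    """
    ckpt = Path(checkpoint_file)
    # Load the count of files completed by an earlier, interrupted run.
    done = json.loads(ckpt.read_text())["done"] if ckpt.exists() else 0

    for i in range(done, len(paths)):
        index_one(paths[i])  # if this raises, the checkpoint keeps its old value
        ckpt.write_text(json.dumps({"done": i + 1}))

    return len(paths) - done
```

If a run fails partway through, calling `index_files` again with the same checkpoint file skips the files already indexed and picks up at the one that failed.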
E-discovery is the killer app for Digital Reef, but the company says it also has plans to add features that will let administrators move data from one storage tier to another, set retention policies, or delete files at the end of the retention life cycle.
Digital Reef faces a host of competitors, including Autonomy and Recommind. Vendors such as Guidance Software, Kazeon, StoredIQ, and ZyLAB are also competing to be the indexing software of choice for e-discovery efforts.