Commentary

Andrew Conry Murray
 

Ambitious Startup Wants To Manage All Your Unstructured Data

Digital Reef swings for the data management fences by indexing and classifying all unstructured data in the enterprise. Top applications include e-discovery and storage management.

Digital Reef, which officially launches today, has no shortage of ambition. The company aims to index and auto-classify all the unstructured data floating around on file servers, backup systems, archives, e-mail, collaboration tools, and content management systems.

The goal is to make unstructured data easier to manage when it comes to e-discovery, storage management, and compliance.



The company takes a comprehensive approach to indexing and classifying unstructured data across the enterprise. Its software crawls network storage systems and builds a full-content index of everything it finds, including metadata. The software supports NFS and CIFS, so it can mount most file stores. Digital Reef also offers prebuilt connectors for retrieving information stored in applications such as SharePoint, Exchange, and Lotus Notes.

The index is stored on a grid computing cluster. Customers can use commodity hardware for the grid. Content on the grid is stored in a flat-file format rather than in a database.

As the software indexes content, it also analyzes it with a similarity engine. This engine, which is the primary IP of the company, performs two major functions. First, it looks for duplicates or near-duplicates of files. By identifying duplicates, the index can return fewer files in an e-discovery exercise, saving time and money on document review.
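Digital Reef has not published how its similarity engine works, so the following is only a minimal sketch of one common way to flag near-duplicates: break each document into overlapping word windows ("shingles") and compare the sets with Jaccard similarity. The function names, the three-word window, and the 0.8 threshold are all assumptions chosen for illustration.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle (or none if the text is empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.8) -> bool:
    """Treat two documents as near-duplicates above a hypothetical threshold."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold
```

In a review workflow, documents that score above the threshold could be grouped so that a reviewer reads one representative rather than every copy. Comparing every pair of files does not scale to terabytes, so a production system would likely need a hashing or sketching step that this example omits.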

The second function is auto-classification. The software looks at every piece of data in a file and suggests a classification based on the most relevant semantic ideas being expressed in the file. According to the company, the software doesn't need to be trained before it classifies and categorizes content. "We label all our folders with the top terms that placed a document into the folder, so you can understand why a document is in that folder," says Brian Giuffrida, VP of marketing and development at Digital Reef.
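To give a feel for what labeling a folder with its "top terms" might look like, here is a rough sketch that ranks the words in each document by TF-IDF, a standard weighting that favors terms frequent in one document but rare across the collection. This is a stand-in technique, not Digital Reef's classifier, and `top_terms` is a hypothetical name.

```python
import math
from collections import Counter


def top_terms(docs: list[str], n: int = 3) -> list[list[str]]:
    """Return the n highest-scoring TF-IDF terms for each document."""
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    total = len(docs)
    labels = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {t: count * math.log(total / df[t]) for t, count in tf.items()}
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        labels.append(ranked[:n])
    return labels
```

The sketch splits on whitespace only and ignores punctuation, stemming, and stop words, all of which a real classifier would have to handle.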

The index is fully searchable. While the company says general users can search the index, it's aimed more at legal and compliance managers who need to search large volumes of information for efforts such as e-discovery. They can also use it to find sensitive data, such as Social Security or credit card numbers, that may need to be moved to a more secure location.
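As an illustration of the kind of sensitive-data search described here, the sketch below scans text for strings shaped like U.S. Social Security numbers and for candidate card numbers that pass the Luhn checksum. The patterns and the `find_sensitive` helper are assumptions for this example and say nothing about how Digital Reef's search is implemented.

```python
import re

# Assumed patterns: SSNs written as 3-2-4 digits with hyphens; card numbers
# of 13 to 19 digits, optionally separated by spaces or hyphens.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Check the Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> dict:
    """Return SSN-shaped strings and Luhn-valid card candidates found in text."""
    ssns = SSN_RE.findall(text)
    cards = [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
    return {"ssn": ssns, "card": cards}
```

The Luhn check filters out many random digit runs, but pattern matching of this sort still produces false positives, so the hits are best treated as candidates for review.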

The company claims it can index and classify up to 4 TB worth of files every 24 hours. Management software routes jobs among servers in the grid to balance loads. If an indexing job on a particular target fails, the software can restart the job from the failure point instead of having to re-index the entire file server.
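For scale, 4 TB in 24 hours works out to roughly 46 MB per second of sustained throughput across the grid (4 x 10^12 bytes divided by 86,400 seconds). The restart behavior can be sketched with a simple checkpoint file that records how many files have been indexed. This is a generic pattern under stated assumptions, not the vendor's implementation, and `index_with_checkpoint`, `index_file`, and the JSON checkpoint format are hypothetical.

```python
import json
from pathlib import Path


def index_with_checkpoint(files, index_file, checkpoint_path):
    """Index files in order, saving progress so a rerun resumes after a failure.

    Assumes `files` is returned in the same order on every run.
    Returns the number of files indexed during this call.
    """
    ckpt = Path(checkpoint_path)
    done = json.loads(ckpt.read_text())["done"] if ckpt.exists() else 0

    for position in range(done, len(files)):
        index_file(files[position])  # may raise; progress so far is kept
        ckpt.write_text(json.dumps({"done": position + 1}))

    return len(files) - done
```

If `index_file` raises partway through, the next call with the same checkpoint path picks up at the file that failed rather than at the start of the list.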

E-discovery is the killer app for Digital Reef, but the company says it also has plans to add features that will let administrators move data from one storage tier to another, set retention policies, or delete files at the end of the retention life cycle.

Digital Reef faces a host of competitors, including Autonomy and Recommind. Vendors such as Guidance Software, Kazeon, StoredIQ, and ZyLAB are also competing to be the indexing software of choice for e-discovery efforts.

