Indexing times vary widely. How quickly an engine can create an index depends on the content. A file share full of PowerPoint slides with 25 words per page will be indexed in the blink of an eye. Text-heavy documents take longer, as do Outlook PST files that have to be cracked open and files with multiple levels of nested attachments.
Note that most e-discovery search vendors prefer to index content themselves, whether or not the targeted repository has native search capability.
Customers also have to take the search infrastructure into account. Google and StoredIQ deliver their products as appliances, while the other search products are pure software deployed on servers. IT must provide enough processing capacity to handle the expected volume of queries. That may not be an issue for compliance search, which isn't intended to handle simultaneous search requests from a large audience of users.
Companies must also provide storage for the index (except for Google and StoredIQ). Vendors usually estimate index size as a percentage of the content being cataloged. For example, if the index is 10% of the content, a 100-TB body of data will yield a 10-TB index. The primary factor in that percentage is how detailed you want the index to be. For instance, the Fast search engine can produce an index that runs about 20% of the size of the content, but most organizations enrich the index with advanced linguistics to get more detailed search results. Microsoft's Spataro says customers opting for a rich index should expect it to run two or three times the size of the actual content store.
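The sizing rules of thumb above reduce to simple arithmetic. The sketch below uses only the figures quoted in the paragraph (a 10% index, Fast's roughly 20% baseline, and Spataro's two-to-three-times estimate for a rich index); the function names are hypothetical, and real vendor estimates will depend on the content mix.

```python
def estimate_index_tb(content_tb: float, index_pct: float) -> float:
    """Index size when a vendor quotes it as a percentage of the content."""
    return content_tb * index_pct / 100


def estimate_rich_index_tb(content_tb: float, multiplier: float) -> float:
    """Index size when linguistic enrichment makes it a multiple of the content."""
    return content_tb * multiplier


content = 100  # TB of data to be cataloged, as in the example above

print(f"10% index: {estimate_index_tb(content, 10):.0f} TB")  # 10 TB
print(f"20% index: {estimate_index_tb(content, 20):.0f} TB")  # 20 TB
for m in (2, 3):
    print(f"rich index at {m}x: {estimate_rich_index_tb(content, m):.0f} TB")
```

On a 100-TB store, that is the difference between roughly 10 TB of index storage for a lean index and as much as 300 TB for a rich one.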
Another issue is how the search engine links to content repositories. Most search products include out-of-the-box connectors for popular platforms, such as Exchange, Notes, SharePoint and Documentum, as well as general-purpose connectors for file and Web servers. However, IT may need to tweak connectors or build one-off integrations if a critical application or repository isn't supported.
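To make the connector idea concrete, here is a minimal sketch of what a one-off integration might look like. The article doesn't describe any product's connector API, so the `Connector` interface, `FileShareConnector`, and `index_all` below are hypothetical: a connector simply enumerates a repository's content and hands each item to the indexer, and anything it can't read is skipped.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Document:
    """Hypothetical record a connector hands to the indexer."""
    source_id: str
    text: str


class Connector(ABC):
    """Hypothetical connector contract: enumerate one repository's content."""

    @abstractmethod
    def documents(self) -> Iterator[Document]:
        """Yield every indexable item in the repository."""


class FileShareConnector(Connector):
    """General-purpose connector that walks a directory tree on a file share."""

    def __init__(self, root: str, extensions: tuple[str, ...] = (".txt",)):
        self.root = root
        self.extensions = extensions  # file types this connector knows how to read

    def documents(self) -> Iterator[Document]:
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in sorted(filenames):
                if not name.lower().endswith(self.extensions):
                    continue  # unsupported format: would need its own handling
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    yield Document(source_id=path, text=fh.read())


def index_all(connectors: list[Connector]) -> dict[str, str]:
    """Toy 'indexer': collects every document the connectors return."""
    index: dict[str, str] = {}
    for connector in connectors:
        for doc in connector.documents():
            index[doc.source_id] = doc.text
    return index


# Usage: crawl a throwaway "file share" holding one supported and one skipped file.
with tempfile.TemporaryDirectory() as share:
    with open(os.path.join(share, "memo.txt"), "w", encoding="utf-8") as fh:
        fh.write("retention policy")
    with open(os.path.join(share, "deck.pptx"), "wb") as fh:
        fh.write(b"binary")
    index = index_all([FileShareConnector(share)])

print(sorted(os.path.basename(p) for p in index))  # ['memo.txt']
```

Keeping the indexer behind a small interface like this means supporting a new repository only requires writing another class that yields `Document` records.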
CIOs also need to make sure search results don't expose content that users aren't entitled to see under corporate access controls. Most search engines can match user identities to the permissions associated with groups in the company's directory system.
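The group-matching step can be sketched as a filter applied to results at query time, sometimes called security trimming. This is a simplified illustration, not how any particular engine implements it: the `DIRECTORY` mapping stands in for a real directory lookup, and the `Hit` record and `trim_results` function are hypothetical. A hit survives only if the user belongs to at least one group on its access list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical search result carrying the groups allowed to read it."""
    doc_id: str
    allowed_groups: frozenset[str]


# Hypothetical snapshot of the directory system: user -> group memberships.
DIRECTORY: dict[str, set[str]] = {
    "alice": {"legal", "all-staff"},
    "bob": {"all-staff"},
}


def trim_results(user: str, hits: list[Hit], directory: dict[str, set[str]]) -> list[Hit]:
    """Keep only hits whose access list shares a group with the user.
    Users missing from the directory see nothing."""
    groups = directory.get(user, set())
    return [h for h in hits if h.allowed_groups & groups]


hits = [
    Hit("litigation-hold-memo", frozenset({"legal"})),
    Hit("holiday-schedule", frozenset({"all-staff"})),
]

print([h.doc_id for h in trim_results("alice", hits, DIRECTORY)])
# ['litigation-hold-memo', 'holiday-schedule']
print([h.doc_id for h in trim_results("bob", hits, DIRECTORY)])
# ['holiday-schedule']
```

Defaulting unknown users to an empty group set makes the filter fail closed, which is usually the safer choice when the directory and the index fall out of sync.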
Bottom line: enterprises should approach search as a strategic technology that helps solve specific business problems. To that end, companies must understand their own requirements when evaluating search platforms. IT should involve business units, legal, and HR, and start with the business case to see where search will provide value. Done right, this is one technology that will pay off not just in dollars, but in productivity and peace of mind.
Illustration by Ryan Etter
Regardless of the type of search you're interested in, there are technological issues that must be addressed, including indexing speed, index size, and security. In a discovery effort, time is of the essence: initial results may need to be available to counsel within weeks. That may sound like plenty of time, but it isn't when repositories hold multiple terabytes of information that must be indexed before anything else can happen.