Enterprise Search: Microsoft, Google, Specialized Players Vie For Supremacy

Enterprise search assists in litigation, securing sensitive data, managing information, and building smarter applications. Oh, yeah, and finding that PowerPoint slide from 2006. But which vendor is best to partner with, and what are the technical challenges?

Andrew Conry Murray, Director of Content & Community, Interop

September 26, 2008

KEY ISSUES
Regardless of the type of search you're interested in, there are technological issues that must be addressed, including indexing speed, index size, and security. In a discovery effort, time is of the essence. Initial results may need to be available to counsel within weeks. That may sound like a long time, but not when faced with repositories that hold multiple terabytes of information that have to be indexed before anything else can happen.

Indexing times are fluid. How quickly an engine can create an index depends on the content. A file share full of PowerPoint slides with 25 words per page will be indexed in a blink. Text-heavy documents take longer, as do PST files that have to be cracked open or files that may have multilevel attachments.

Some search products will federate with an index that has been created by the repository's native search feature, such as a Documentum repository or an e-mail archive. This speeds indexing time and saves on storage space. Through federation, the third-party search engine essentially brings the query to the application's native search field, and then incorporates the results into its own user interface.
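To make the federation pattern concrete, here is a minimal Python sketch. It assumes each repository's native search can be reached through a simple function that takes a query string and returns matching titles. The names `federated_search`, `documentum_search`, and `archive_search` are hypothetical stand-ins for illustration, not any vendor's actual API.

```python
from typing import Callable, Dict, List

# Hypothetical type: a repository's native search takes a query string
# and returns a list of matching document titles.
NativeSearch = Callable[[str], List[str]]


def federated_search(query: str, sources: Dict[str, NativeSearch]) -> List[dict]:
    """Pass the query to each repository's native search and merge the hits.

    No content is re-indexed here; the native indexes do the work, and the
    results are tagged with their source for display in a single interface.
    """
    merged = []
    for source_name, native_search in sources.items():
        for title in native_search(query):
            merged.append({"source": source_name, "title": title})
    return merged


# Stand-in native search functions (assumptions for illustration only).
def documentum_search(query: str) -> List[str]:
    documents = ["Merger contract draft", "Q3 sales forecast"]
    return [d for d in documents if query.lower() in d.lower()]


def archive_search(query: str) -> List[str]:
    messages = ["RE: merger timeline", "Lunch on Friday?"]
    return [m for m in messages if query.lower() in m.lower()]


results = federated_search(
    "merger",
    {"documentum": documentum_search, "email-archive": archive_search},
)
for hit in results:
    print(f"[{hit['source']}] {hit['title']}")
```

The trade-off in this sketch mirrors the one described above: nothing is indexed twice, but the quality of the results is only as good as each repository's native search.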

Note that most e-discovery search vendors prefer to index content themselves, whether or not the targeted repository has native search capability.

Customers also have to take the search infrastructure into account. Google and StoredIQ deliver via appliances, while the other search products are pure software deployed on servers. IT must provide sufficient processing capacity to handle the expected volume of queries. This may not be an issue with compliance search, which isn't intended to address simultaneous search requests from a large audience of users.

Companies must also provide storage for the index (except for Google and StoredIQ). Vendors usually estimate the index size as a percentage of the content being cataloged. For example, if the index is 10% of the content, a 100-TB body of data will yield a 10-TB index. The primary factor is how detailed you want the index to be. For instance, the Fast search engine can produce an index that runs about 20% of the size of the content, but most organizations will enrich the index through advanced linguistics to provide more detailed search results. Microsoft's Spataro says customers opting for a rich index should expect it to run two or three times the size of the actual content store.
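The sizing arithmetic is easy to run for your own content store. The short Python sketch below applies the ratios mentioned above to the 100-TB example. The function name `estimate_index_size_tb` and the scenario labels are illustrative assumptions, and actual ratios will vary by product and configuration.

```python
def estimate_index_size_tb(content_tb: float, index_ratio: float) -> float:
    """Estimate index size in TB as a fraction (or multiple) of content size."""
    if content_tb < 0 or index_ratio < 0:
        raise ValueError("content size and index ratio must be non-negative")
    return content_tb * index_ratio


CONTENT_TB = 100.0  # the 100-TB body of data from the example above

# Ratios taken from the figures quoted in the article.
scenarios = {
    "index at 10% of content": 0.10,
    "Fast, basic index (about 20%)": 0.20,
    "rich index, low end (2x content)": 2.0,
    "rich index, high end (3x content)": 3.0,
}

for label, ratio in scenarios.items():
    size_tb = estimate_index_size_tb(CONTENT_TB, ratio)
    print(f"{label}: {size_tb:.0f} TB")
```

The spread between the lean and rich scenarios is the point: the same 100 TB of content can call for anywhere from 10 TB to 300 TB of index storage, depending on how detailed the index is.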

Another issue is how the search engine links to content repositories. Most search products include out-of-the-box connectors for popular platforms, such as Exchange, Notes, SharePoint and Documentum, as well as general-purpose connectors for file and Web servers. However, IT may need to tweak connectors or build one-off integrations if a critical application or repository isn't supported.

CIOs also need to make sure users don't get access to search results that violate corporate access controls. Most search engines can match user identities to permissions associated with groups in the company's directory system.
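As a rough illustration of how results can be trimmed to match access controls, the Python sketch below filters hits by comparing each document's permitted groups with the groups a user belongs to. The `SearchHit` structure and `trim_results` function are hypothetical; a real deployment would pull group membership from the company's directory system rather than a hard-coded set.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class SearchHit:
    """A search result plus the directory groups allowed to see it (hypothetical)."""
    title: str
    allowed_groups: FrozenSet[str]


def trim_results(hits: Iterable[SearchHit], user_groups: Iterable[str]) -> List[SearchHit]:
    """Keep only the hits that share at least one group with the user."""
    groups = set(user_groups)
    return [hit for hit in hits if hit.allowed_groups & groups]


hits = [
    SearchHit("Litigation hold memo", frozenset({"legal"})),
    SearchHit("Salary bands 2008", frozenset({"hr"})),
    SearchHit("Cafeteria menu", frozenset({"legal", "hr", "all-staff"})),
]

# A user in "legal" and "all-staff" should never see the HR-only document.
for hit in trim_results(hits, {"legal", "all-staff"}):
    print(hit.title)
```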

Bottom line, enterprises should approach search as a strategic technology that will help solve specific business problems. To that end, companies must understand their own requirements when evaluating search platforms--IT should involve business units, legal, and HR, and start with the business case to see where search will provide value. Done right, this is one technology that will pay off not just in dollars, but in productivity and peace of mind.

Illustration by Ryan Etter

About the Author

Drew was formerly editor of Network Computing and is currently director of content and community for Interop.
