Enterprise Search: Seek and Maybe You'll Find

New search appliances claim to be uniquely adapted to meet enterprise needs. We tested eight enterprise search products and analyzed the technology's security and architectural implications. Our take: The math just doesn't add up.


40: Percentage of unit licenses for enterprise search sold that Google will provide by 4Q 2007. Source: Gartner

Less than 5 percent: Global 2000 companies that will have selected Google as their primary information access software vendor by 4Q07. Source: Gartner

10: Percentage of unit licenses for enterprise search sold that Microsoft will provide by 4Q08. Source: Gartner

$30,000: Cost for Google Search Appliance capable of searching as many as 500,000 documents. Source: Google

$57,670: Estimated price for Microsoft SharePoint Server 2007 for Enterprise Search. Source: Microsoft

$0: Cost for the OmniFind Yahoo! Edition to index as many as 500,000 documents (download at omnifind. ibm. Support can be purchased for $1,999 per year. Source: IBM


To decide whether you need enterprise search now or can wait for offerings to mature, you need an idea of how much time your employees spend searching for content, the location of the content that employees are seeking and what the information is used for.

For salespeople who must pull results from e-mail, a file server and a Web server to build a proposal, federated search, which queries all of those sources at once and merges the results, can bring a lot of value. For remote employees who keep a lot of data on their local drives, a search app that integrates tightly with the desktop, like X1's Enterprise client, is ideal. Such products have a multilevel architecture: desktop agents plus server-based indexing. The client indexes the local computer and communicates with a "cluster" to search server file shares.
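To make the federated idea concrete, here is a minimal sketch in Python. It is not how any of the tested products work internally; the source names, the `Hit` type and the toy scoring rule (the fraction of query terms found in a document) are all assumptions made for illustration. It also assumes scores from different sources are comparable, which real products have to work to achieve.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    source: str   # which repository the result came from (hypothetical names)
    title: str
    score: float  # toy relevance in [0, 1], assumed comparable across sources


def search_source(name: str, docs: Dict[str, str], query: str) -> List[Hit]:
    """Toy per-source search: score is the fraction of query terms in the text."""
    terms = query.lower().split()
    hits = []
    for title, text in docs.items():
        words = set(text.lower().split())
        matched = sum(1 for term in terms if term in words)
        if matched:
            hits.append(Hit(name, title, matched / len(terms)))
    return hits


def federated_search(sources: Dict[str, Dict[str, str]], query: str,
                     limit: int = 10) -> List[Hit]:
    """Send the query to every source, then merge into one ranked list."""
    merged: List[Hit] = []
    for name, docs in sources.items():
        merged.extend(search_source(name, docs, query))
    # Best score first; source and title break ties so the order is stable.
    merged.sort(key=lambda h: (-h.score, h.source, h.title))
    return merged[:limit]


sources = {
    "email": {"Re: Acme pricing": "acme proposal pricing draft"},
    "files": {"acme_proposal.doc": "acme proposal final"},
    "web": {"Products page": "product catalog"},
}
results = federated_search(sources, "acme proposal pricing")
for hit in results:
    print(f"{hit.score:.2f}  {hit.source:6s} {hit.title}")
```

The salesperson sees one ranked list instead of running the same query three times. The hard parts a real product must solve, such as normalizing relevance across engines and enforcing each source's security, are deliberately left out.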

If you plan to turn on caching, you'll have to determine if you want just text cached, with no images, or the entire document. Obviously, the latter will greatly increase the amount of space needed. On the flip side, if full-document caching is enabled, users will still be able to query and view documents when the originating source is down, provided security information is also cached, or the infrastructure is such that the search engine doesn't have to verify rights against the originating source.
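The decision described above can be sketched as a small piece of Python. This is an illustration under stated assumptions, not any vendor's logic: `CacheEntry`, `resolve_view` and the returned labels are hypothetical names, and the sketch assumes a text-only cache is not enough to view a document while its source is down.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class CacheEntry:
    text: str                                 # extracted text, always cached
    full_copy: Optional[bytes] = None         # set only with full-document caching
    allowed_users: Optional[Set[str]] = None  # set only if security info is cached


def resolve_view(entry: CacheEntry, user: str, source_up: bool,
                 source_allows: Callable[[str], bool],
                 must_verify_rights: bool = True) -> str:
    """Decide where a view request is served from, or why it cannot be."""
    if source_up:
        # Normal case: check rights against the originating source.
        return "source" if source_allows(user) else "denied"
    if entry.full_copy is None:
        # Only text was cached, so there is no document to show.
        return "unavailable"
    if not must_verify_rights:
        # Infrastructure that does not need to verify rights against the source.
        return "cache"
    if entry.allowed_users is None:
        # Full copy exists, but rights cannot be checked while the source is down.
        return "unavailable"
    return "cache" if user in entry.allowed_users else "denied"


full = CacheEntry("q3 plan", full_copy=b"...", allowed_users={"ana"})
text_only = CacheEntry("q3 plan")
print(resolve_view(full, "ana", source_up=False, source_allows=lambda u: True))
print(resolve_view(text_only, "ana", source_up=False, source_allows=lambda u: True))
```

The extra disk space for full-document caching buys the "cache" outcome; without cached security information, the copy on disk still can't safely be shown.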

Caching requirements also depend on how the software indexes a document. Some vendors don't index common words, which reduces the size of the index. The downside is that a separate document cache must be created, holding either the whole document or just its text, because the cache is needed to generate page summaries. Other vendors index every word, at the cost of a larger index but with the benefit of not needing a separate document cache. Because every word is contained in the index, summaries can be generated from it, though this can be slower than generating them from a document cache.
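A toy Python example shows why the two approaches trade off this way. It is a sketch only: the stop-word list, the function names and the one-sentence document are invented for illustration, and real engines use far more compact index structures.

```python
from collections import defaultdict

# Hypothetical list of "common words" a vendor might choose not to index.
STOP_WORDS = {"the", "a", "of", "and", "to", "is", "in"}


def build_index(docs, skip_stop_words):
    """Map word -> doc_id -> list of word positions in that document."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            if skip_stop_words and word in STOP_WORDS:
                continue
            index[word][doc_id].append(pos)
    return index


def summary_from_cache(text_cache, doc_id, length=8):
    """Cheap: read the opening words straight from the cached text."""
    return " ".join(text_cache[doc_id].split()[:length])


def summary_from_index(index, doc_id, length=8):
    """No cache needed, but every posting list is scanned to rebuild the text."""
    by_pos = {}
    for word, postings in index.items():
        for pos in postings.get(doc_id, []):
            if pos < length:
                by_pos[pos] = word
    return " ".join(by_pos[p] for p in sorted(by_pos))


docs = {"d1": "The cost of the index is a larger footprint"}
full_index = build_index(docs, skip_stop_words=False)
small_index = build_index(docs, skip_stop_words=True)

print(len(full_index), len(small_index))      # distinct words in each index
print(summary_from_index(full_index, "d1"))   # readable, rebuilt from the index
print(summary_from_index(small_index, "d1"))  # common words are missing
print(summary_from_cache(docs, "d1"))         # what the separate cache provides
```

The smaller index can't reproduce a readable summary because the common words were never stored, which is why those products need the separate cache. The full index can, but only by walking its posting lists.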

An API for plugging into the search engine may also be of benefit. All the products we tested provide APIs to modify the behavior of some aspect of the product, such as indexing or querying the index. Depending on the functionality the API provides, developers could hook the search product into an ERP or SFA app. Thunderstone provides an XSL interface, for example, while dtSearch offers an API available in C/C++, COM, .Net and Java.


Ben Dupont is a systems engineer for WPS Resources in Green Bay, Wis. He specializes in software development.