IBM's OmniFind Personal Email Search, developed by researchers at IBM facilities in California, Israel, and India, goes beyond keyword search by making associations between the concepts behind words commonly used in corporate e-mail. With it, IBM aims to offer customers technology that can retrieve useful information hidden in e-mail databases.
For example, a person looking for a colleague's phone number can type "John phone" in the query box and get back John's phone number. That's because the system is smart enough to infer that the person wants John's number, not just any phone number in an e-mail that happens to contain the word "John".
To do this, researchers built an index of keywords found in corporate e-mail, and then a second index of associated concepts and relationships, Shiv Vaithyanathan, manager of unstructured information mining at IBM's Almaden Research Center in San Jose, Calif., told InformationWeek. When a query is submitted, the system first matches its words against the keyword index, then delivers results based on the associations. Researchers have also added rules that help the system determine what information is most likely being sought.
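The two-index approach Vaithyanathan describes can be illustrated with a small Python sketch. It is not IBM's implementation: the sample e-mails, the regular-expression extraction rules (RELATION_RULES), the ATTRIBUTE_WORDS mapping, and the functions build_indexes and search are all hypothetical. A plain keyword match for "John phone" finds only the e-mail that contains both words, while the concept index returns the number actually associated with John.

```python
import re
from collections import defaultdict

PHONE = r"\d{3}-\d{4}"

# Hypothetical extraction rules: each pattern captures (person, phone number).
RELATION_RULES = [
    re.compile(rf"\b[Rr]each (\w+) at ({PHONE})"),
    re.compile(rf"\b(\w+)'s (?:phone|number) is ({PHONE})"),
]

# Hypothetical rule mapping query words to the relationship they ask about.
ATTRIBUTE_WORDS = {"phone": "phone", "number": "phone"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_indexes(emails):
    """Build a keyword index and a concept (relationship) index over e-mails."""
    keyword_index = defaultdict(set)  # word -> ids of e-mails containing it
    concept_index = defaultdict(set)  # (person, attribute) -> {(value, email id)}
    for msg_id, body in emails.items():
        for word in tokenize(body):
            keyword_index[word].add(msg_id)
        for rule in RELATION_RULES:
            for person, number in rule.findall(body):
                concept_index[(person.lower(), "phone")].add((number, msg_id))
    return keyword_index, concept_index


def search(query, keyword_index, concept_index):
    terms = tokenize(query)
    # Step 1: plain keyword match (every query term must appear in the e-mail).
    hits = set.intersection(*(keyword_index.get(t, set()) for t in terms)) if terms else set()
    # Step 2: answer "<entity> <attribute>" queries from the concept index.
    attrs = {ATTRIBUTE_WORDS[t] for t in terms if t in ATTRIBUTE_WORDS}
    entities = [t for t in terms if t not in ATTRIBUTE_WORDS]
    answers = sorted(
        value
        for entity in entities
        for attr in attrs
        for value, _ in concept_index.get((entity, attr), set())
    )
    return {"answers": answers, "keyword_hits": sorted(hits)}


EMAILS = {
    1: "John will present on Friday. For room problems, phone the front desk at 555-0100.",
    2: "If the build breaks, reach John at 555-0142.",
    3: "Mary's phone is 555-0199.",
}
kw, concepts = build_indexes(EMAILS)
print(search("John phone", kw, concepts))
# -> {'answers': ['555-0142'], 'keyword_hits': [1]}
```

The keyword step alone would point the user at e-mail 1, whose only phone number belongs to the front desk; the hand-written rules are what let the sketch connect John to 555-0142, which is the role the article attributes to IBM's added rules.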
OmniFind Personal Email Search uses the Unstructured Information Management Architecture (UIMA), an open source software framework for analyzing unstructured information that supports applications such as semantic search. IBM developed the technology, which is now under the Apache Software Foundation.
For the new e-mail search engine, IBM researchers developed technology that runs on top of the framework to quickly extract words and concepts from electronic documents, compare them against the indexes, and deliver results based on algorithms the scientists developed.
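Applications built on the Unstructured Information Management Architecture typically pass each document through a chain of annotators, each of which adds typed, offset-based annotations to a shared analysis structure (the CAS, in UIMA terms). The Python sketch below loosely mimics that pattern. It does not use the real Apache UIMA SDK, which is written in Java and has its own API; the Document and Annotation classes, the KNOWN_PEOPLE directory, and the three annotator functions are hypothetical. The last annotator builds on the output of the first two to link a phone number to a person in the same sentence.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # annotation type, e.g. "Person", "Phone", "HasPhone"
    begin: int  # start offset into the document text
    end: int    # end offset (exclusive)
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    """Shared analysis structure, loosely modeled on UIMA's CAS (hypothetical)."""
    text: str
    annotations: list = field(default_factory=list)

    def select(self, kind):
        return [a for a in self.annotations if a.kind == kind]

    def covered_text(self, ann):
        return self.text[ann.begin:ann.end]


KNOWN_PEOPLE = {"John", "Mary"}  # hypothetical employee directory


def person_annotator(doc):
    """Mark capitalized words that appear in the directory as Person."""
    for m in re.finditer(r"\b[A-Z][a-z]+\b", doc.text):
        if m.group() in KNOWN_PEOPLE:
            doc.annotations.append(Annotation("Person", m.start(), m.end()))


def phone_annotator(doc):
    """Mark ddd-dddd patterns as Phone."""
    for m in re.finditer(r"\b\d{3}-\d{4}\b", doc.text):
        doc.annotations.append(Annotation("Phone", m.start(), m.end()))


def relation_annotator(doc):
    """Link each phone number to the closest preceding person in the same sentence."""
    for phone in doc.select("Phone"):
        sentence_start = doc.text.rfind(".", 0, phone.begin) + 1
        candidates = [p for p in doc.select("Person")
                      if sentence_start <= p.begin < phone.begin]
        if candidates:
            person = max(candidates, key=lambda p: p.begin)
            doc.annotations.append(Annotation(
                "HasPhone", person.begin, phone.end,
                {"person": doc.covered_text(person), "phone": doc.covered_text(phone)}))


PIPELINE = [person_annotator, phone_annotator, relation_annotator]


def analyze(text):
    """Run every annotator, in order, over one shared Document."""
    doc = Document(text)
    for annotator in PIPELINE:
        annotator(doc)
    return doc


doc = analyze("John will present Friday. Phone the desk at 555-0100. Reach John at 555-0142.")
print([a.features for a in doc.select("HasPhone")])
# -> [{'person': 'John', 'phone': '555-0142'}]
```

Keeping each annotator small and letting later stages read earlier annotations is what makes this style of pipeline easy to extend: a new extractor can be appended to PIPELINE without touching the others.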
Many companies, including Google, Microsoft, and Yahoo, are developing more advanced search algorithms to deliver better answers to queries. Vaithyanathan acknowledged that semantic search becomes less effective as the universe of possible concepts and relationships associated with words grows, which is why it would be difficult to implement in, for example, a huge, general-purpose Web mail system.
Targeting corporate e-mail narrows the range of associations the researchers must consider when deciding how the system should relate words to one another. In addition, IBM has made it possible for developers to expand the OmniFind indexes.
IBM is making the software available at no charge on its AlphaWorks Web site. The company hopes feedback from developers will reveal the technology's shortcomings and help it improve the software. "Finding out the things we should do is precisely why we're putting it out there," Vaithyanathan said.