Internet or enterprise: Take your pick. Few users are looking for the hit lists full of dubious relevance returned by the major search engines. They want plain and simple answers to straightforward questions: prices, store hours, the birth date of Thomas Jefferson for that school paper and so on. Spending time scouring Web pages and clicking around rigid, overdesigned portals, whether on the public Internet or a corporate intranet, is getting old. Users need the question-answering technology, 40 years in the making, that is starting to appear in search tools in the enterprise and on the Web.
"Question answering is the future of enterprise search," says Matthew Glotzbach, head of products for Google Enterprise, and that applies equally to public Web search.
Already, when I search Google, Yahoo or MSN for "population of Peru," I get the figure I want without having to sift through a grab bag of geography and history pages for that small bit of factual information. Some of the engines will respond with a correct answer to "2 + 2 - 1/17" or "map Georgia." Credit text analytics at work. We have natural language processing (NLP) to detect the implied query, to discern, extract and disambiguate "entities," such as place names, and to pull information from documents identified as possibly containing answers.
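To make "detecting the implied query" concrete, here is a minimal Python sketch of how a front end might route the three example queries above to a calculator, a map lookup or a fact lookup. It is purely illustrative: the function names, the pattern rules and the tiny list of fact attributes are my own assumptions, not a description of how any of the engines mentioned actually work.

```python
import ast
import operator
import re
from fractions import Fraction

# Hypothetical rule set: only + - * / over whole numbers is treated as arithmetic.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_arithmetic(expr):
    """Evaluate simple integer arithmetic exactly; raise ValueError otherwise."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, int)
                and not isinstance(node.value, bool)):
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    try:
        tree = ast.parse(expr.strip(), mode="eval")
    except SyntaxError as exc:
        raise ValueError("not arithmetic") from exc
    return walk(tree)


def classify_query(query):
    """Return (intent, payload) for a raw query string (assumed, toy rules)."""
    text = query.strip()
    # Digits and operators only: try the calculator first.
    if re.fullmatch(r"[\d\s+\-*/()]+", text) and re.search(r"\d", text):
        try:
            return ("calculation", eval_arithmetic(text))
        except (ValueError, ZeroDivisionError):
            pass
    # "map <place>" is read as a request for a map of that place.
    if text.lower().startswith("map "):
        return ("map", text[4:].strip())
    # "<attribute> of <entity>" is read as a factual lookup.
    match = re.fullmatch(r"(population|capital) of (.+)", text, re.IGNORECASE)
    if match:
        return ("fact", (match.group(1).lower(), match.group(2).strip()))
    # Anything else falls back to ordinary keyword search.
    return ("search", text)
```

With these rules, `classify_query("population of Peru")` is routed to a fact lookup and `classify_query("map Georgia")` to a map lookup. Note that the sketch does nothing to decide which Georgia is meant; that is the disambiguation problem, and it is the hard part.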
That last answer-extraction stage can use Web-search results or authoritative reference sources, such as Wikipedia or technical manuals, for particular business domains. It is facilitated by search-BI interfaces (explored in my July column, "Will Search Deliver Better BI?") and adapters that let search engines reach into the deep Web of databases and operational systems not indexed by Web spiders. The final step is to formulate and present a single answer by assessing and weighing the available sets of facts and figures.
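One simple way to picture that final weighing step is a weighted vote over candidate answers pulled from different sources. The Python sketch below is an assumption-laden illustration, not any vendor's method: the function name `pick_answer` and the idea of assigning each source a numeric trust weight are hypothetical.

```python
from collections import defaultdict


def pick_answer(candidates):
    """Pick one answer from (answer, source_weight) pairs by weighted vote.

    Returns (best_answer, share_of_total_weight), or None when there are
    no candidates. The weights are hypothetical trust scores per source.
    """
    totals = defaultdict(float)
    for answer, weight in candidates:
        # Normalize trivially so "1743" and " 1743 " count as the same answer.
        totals[answer.strip().lower()] += weight
    if not totals:
        return None
    best = max(totals, key=totals.get)
    total_weight = sum(totals.values())
    share = totals[best] / total_weight if total_weight else 0.0
    return best, share


# Usage: three extracted candidates for Thomas Jefferson's birth year.
# The last one is a wrong extraction (it is the year of his death).
candidates = [("1743", 1.0), ("1743", 0.5), ("1826", 0.5)]
result = pick_answer(candidates)
```

Here the two sources that agree outweigh the stray one. A real system would also have to recognize that differently worded candidates, say a full date and a bare year, refer to the same fact before it could count them together.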
You can try several question-answering sites on the Web. Two that come to mind are brainboost.com, which taps the Answers.com reference site, and start.csail.mit.edu, MIT Professor Boris Katz's site that has been up since 1993. An active academic community has been looking at this application of NLP and text mining, and a Web search will turn up a variety of demonstration sites. I particularly like TextMap at the University of Southern California, which provides an optional technical analysis of questions. Academic and industrial R&D efforts benefit from collaboration and competition at venues such as the Cross-Language Evaluation Forum and the Text Retrieval Conference, sponsored by the U.S. National Institute of Standards and Technology.
If you try a few systems, you'll see that much work remains to be done. Even purpose-built question-answering systems, which do a better job than conventional search engines, rapidly go off track in the face of complexity. A modifier added to a simple question--"What is the Latino population of Miami, Florida?" for instance--knocks out every system I've tried.
Brainboost almost got that last question right but ultimately could not assemble fragments available in its reference source into an on-target response. Researchers are only starting to attempt more complex questions where the answer must be composed from bits and pieces found in disparate sources, according to Diego Mollá, a senior lecturer at Macquarie University, Sydney, Australia. They are also grappling with conceptual (as opposed to factual) questions about abstract ideas such as truth and value and, Mollá says, they are trying to come up with reliable ways to evaluate the correctness of longer answers extracted from sources.
Glotzbach tells me that much of Google's work involves better understanding user intent. The biggest challenges, he says, are comprehending searches of only a word or two and, paradoxically, those containing additional modifiers, which ought to make questions more precise and less ambiguous.
My take is that good answers will require significant progress on many fronts. The potential payoff--search done right--will justify the effort.