Semantics is hot, but only in a geeky sort of way. Contrast with search, which long ago shed its geeky image to become the Web's #1 utility. Search and semantics have similar goals and rely on similar technologies. Both apply data-structuring techniques to make information more findable and usable. Join the two and you get semantic search: in essence, search made smarter, search that seeks to boost accuracy by taming ambiguity via an understanding of context.
Semantic search is still in a definitional phase, "on its way!" as claimant Hakia puts it. Yet Hakia's own site, still in beta, only confuses with its challenge to "Compare with Google." I compared, using a term Hakia suggested, carrots. Results look pretty similar, no? So what, exactly, are the ingredients of semantic search?
Semantics (in an IT setting) is meaningful computing: the application of natural language processing (NLP) to support information retrieval, analytics, and data integration that encompass both numerical and "unstructured" information. The ever-emerging Semantic Web is, for many, the poster child, although semantic computing is advancing rapidly even while a portion of the folks who push semantic technologies seem unable to explain clearly and convincingly what business value they deliver. Another case in point: Microsoft Bing, which is alleged to deliver semantic search because, in response to certain queries, it offers you "related searches" and Wikipedia reference look-ups. Those are semantic elements, reliant on/related to the meaning of the search terms and results, given that meaning is what semantics is all about. But there must be more to semantic search than that?! There is.
Two + Nine Views of Semantic Search
A key-word search on "center" would likely produce way too many documents because "center" is a common and ambiguous term. Our semantic search engine supports a query language called XML Fragments. This query language is designed to exploit UIMA’s CAS annotations entered in the search engine’s index. The XML Fragment query, for example, <organization> center </organization> will produce first only documents that contain "center" where it appears as part of a phrase annotated as an organization by a named-entity recognizer.
This capability extends the search on document-level metadata and tags you can do with mainstream systems such as Google, where you can currently enter filetype:pdf (for example) or would enter terms in a fielded search interface such as the one offered by Google patent advanced search.
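To make the difference between a plain key-word match and an annotation-scoped match concrete, here is a minimal sketch in Python. It is a toy, not UIMA or the XML Fragments engine: the Document and Annotation classes, the sample documents, and the function names are all hypothetical, and it simply filters rather than ranking annotated hits first.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """A labeled token span, as a named-entity recognizer might emit (hypothetical shape)."""
    label: str   # e.g. "organization"
    start: int   # index of the first token in the span (inclusive)
    end: int     # index just past the last token in the span (exclusive)


@dataclass
class Document:
    doc_id: str
    tokens: List[str]
    annotations: List[Annotation] = field(default_factory=list)


def keyword_search(docs, term):
    """Plain key-word search: any document containing the term matches."""
    term = term.lower()
    return [d.doc_id for d in docs if term in (t.lower() for t in d.tokens)]


def annotated_search(docs, label, term):
    """Match only documents where the term falls inside a span with the given label."""
    term = term.lower()
    hits = []
    for d in docs:
        for a in d.annotations:
            span = (t.lower() for t in d.tokens[a.start:a.end])
            if a.label == label and term in span:
                hits.append(d.doc_id)
                break  # one qualifying span is enough for this document
    return hits


# Made-up sample documents; the annotations are written by hand for illustration.
docs = [
    Document("d1", "The Acme Research Center issued guidance".split(),
             [Annotation("organization", 1, 4)]),
    Document("d2", "Aim for the center of the target".split()),
    Document("d3", "She works at Lincoln Center in New York".split(),
             [Annotation("organization", 3, 5), Annotation("location", 6, 8)]),
    Document("d4", "The city center is in Boston".split(),
             [Annotation("location", 5, 6)]),
]

print(keyword_search(docs, "center"))                    # all four documents
print(annotated_search(docs, "organization", "center"))  # only d1 and d3
```

The key-word search returns every document that mentions "center", while the annotation-scoped search keeps only those where the word sits inside an organization span, which is the kind of narrowing the <organization> center </organization> query describes.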
A quick announcement of a conference I'm organizing: The 2010 Sentiment Analysis Symposium will take place April 13 in New York, looking at solutions that discover business value in opinions and attitudes in social media, news, and enterprise feedback. Follow me on Twitter, or follow the symposium, for program updates.