Semantic Search Footnotes: Concepts, Ontologies & Real Time
I want to respond to a few comments/suggestions I received about my recent Intelligent Enterprise story, Breakthrough Analysis: Two + Nine Types of Semantic Search -- it also ran in InformationWeek -- regarding semantic-search definitions and examples.
My article gained hundreds of page views and a couple of dozen tweets, but only one suggestion of a semantic-search approach I'd missed: "real-time search with some sort of filtering." That came from Jim Hendler, who is certainly an authority on semantics; more on that later. I'll start, however, by elaborating on points raised by NLP/semantics researcher Tom O'Hara in an e-mail message.
Concepts, ontologies, and Powerset
Tom O'Hara wrote me,
I ran across your semantic search article in Intelligent Enterprise, which was good in pointing out the various aspects involved. Having a personal interest in this area from previous work at NMSU and Cycorp, I was wondering whether you could provide more details on points 6 & 7, concept and ontology-based search, respectively.
Note that the film example seems more like synonym expansion, which is not quite concept search. Also, perhaps you should have mentioned Powerset for point 11 (natural language search), especially as they are now part of Microsoft.
I'm not a semantic-search insider, and from where I sit, I do see synonym expansion as an example of concept search. Synonyms are conceptually related. So another concept-search example would be if I search on "synonym" and, failing more relevant results, I additionally got results on antonyms, homonyms, and other "nym" words.
I'm defining by example here. I suppose my definition would be "search that relates terms to abstract, conceptual classes of terms to which the terms belong." "Ford" as a search term belongs to concepts that include people, companies, and geographic terms. Within the class "people," "ford" belongs to subclasses that include politicians, movie stars, industrialists, and so on. Implicitly, you're going to use the search context to understand the searcher's intent, to disambiguate, and the nature of a term's belonging to different classes to cluster and rank the results.
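To make that definition concrete, here is a minimal Python sketch of the idea. Everything in it is an assumption for illustration: the `CONCEPTS` inventory, its class names, and the cue words are hypothetical, and counting overlapping cue words is only a stand-in for whatever disambiguation a real engine performs.

```python
from collections import defaultdict

# Hypothetical concept inventory: each term maps to the conceptual classes
# (and subclasses) it can belong to, plus context cues that hint at each one.
CONCEPTS = {
    "ford": [
        {"class": "people/politicians", "cues": {"president", "gerald", "pardon"}},
        {"class": "people/movie stars", "cues": {"harrison", "film", "actor"}},
        {"class": "people/industrialists", "cues": {"henry", "assembly", "model"}},
        {"class": "companies", "cues": {"motor", "truck", "dealer"}},
        {"class": "geographic terms", "cues": {"river", "crossing", "stream"}},
    ],
}


def rank_senses(term, text):
    """Rank the classes of `term` by how many of their cues appear in `text`."""
    term = term.lower()
    context = set(text.lower().split()) - {term}
    scored = [(len(sense["cues"] & context), sense["class"])
              for sense in CONCEPTS.get(term, [])]
    # Highest overlap first; the sort is stable, so ties keep inventory order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cls for _, cls in scored]


def cluster_results(term, documents):
    """Group result snippets under the class that best matches each one."""
    clusters = defaultdict(list)
    for doc in documents:
        ranked = rank_senses(term, doc)
        if ranked:  # skip terms missing from the inventory
            clusters[ranked[0]].append(doc)
    return dict(clusters)


if __name__ == "__main__":
    print(rank_senses("ford", "ford truck dealer")[0])
    print(cluster_results("ford", ["henry ford assembly line", "ford motor dealer"]))
```

The same function serves both halves of the definition: applied to the query, it guesses the searcher's intent; applied to each result, it clusters results by class.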
For ontology-based search, the engine would use facts involving search terms -- each term's relationships with other terms -- in order to move beyond keywords. Turning this statement inside-out: ontologies capture knowledge in the form of interrelationships among terms, and ontology-based search uses these interrelationships to infer the meaning (semantics) of search queries and provide more meaningful (relevant, usable) results. An ontology-supported search on "diabetes" might turn up related genetic markers or therapies, for instance, based on a clinical ontology.
I'll elaborate on one particular, basic word relationship: "is synonymous with." Synonymy, as a subspecies of concept search (as I defined it above), would be ontology-based if synonymy of particular terms is captured in an ontology used to power a search.
You might apply distinct ontologies for searches for information on, for instance, ornithology, building supplies, and weather information. There's a big difference between a hawk and a handsaw, one that a general-purpose search tool might not see.
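A rough Python sketch of ontology-based query expansion follows. The tiny `ONTOLOGIES` dictionary, its relation names, and the facts in it are hypothetical and chosen only to illustrate the point; a real system would draw on a curated ontology and a far richer inference step than this lookup.

```python
# Hypothetical domain ontologies, stored as (subject, relation, object) triples.
ONTOLOGIES = {
    "ornithology": [
        ("hawk", "is_a", "bird of prey"),
    ],
    "building supplies": [
        ("hawk", "is_a", "plastering tool"),
        ("handsaw", "is_a", "cutting tool"),
    ],
    "clinical": [
        ("diabetes", "is_synonymous_with", "diabetes mellitus"),
        ("diabetes", "treated_by", "insulin therapy"),
    ],
}


def expand_query(term, domain, relations=None):
    """Return `term` plus the terms related to it in the chosen domain ontology.

    If `relations` is given, only those relation types are followed.
    """
    expanded = [term]
    for subject, relation, obj in ONTOLOGIES.get(domain, []):
        if subject != term:
            continue
        if relations is not None and relation not in relations:
            continue
        if obj not in expanded:
            expanded.append(obj)
    return expanded


if __name__ == "__main__":
    # The same keyword expands differently depending on the ontology applied.
    print(expand_query("hawk", "ornithology"))
    print(expand_query("hawk", "building supplies"))
    # Restricting to one relation gives plain synonym expansion.
    print(expand_query("diabetes", "clinical", relations={"is_synonymous_with"}))
```

Restricting the expansion to the "is synonymous with" relation reduces this to synonym expansion, which is why I count synonymy as ontology-based when an ontology supplies the synonyms.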
Yes, I might have mentioned Powerset in my article, along with some other products. Given the attention that Powerset has received, especially after Microsoft bought the tools and applied some of its capabilities in Bing, leaving it out was an omission. I'll fix that now. The "2" in my "2 + 9" article title refers to the two Powerset elements incorporated into Bing. I used "2 + 9" in the title specifically to say that there's a lot more to semantic search than those two Powerset-originated capabilities.
I will say further that I've always thought Powerset got too-favorable press given that it is domain-limited to Wikipedia and Freebase sources. It's always a good idea to start with a problem that you can solve. All the same, restricting your semantic search to Wikipedia and Freebase is fishing only in a stocked lake. The FAQ page on Powerset's Web site says, "In the coming months, Powerset will expand our product offerings with additional premium content and exciting new features." The Powerset FAQ appears not to have been updated since Microsoft's August 1, 2008 acquisition of the company, and I infer that Powerset's plans have changed post-acquisition. Microsoft claims to be taking semantic search seriously, but I haven't seen any concrete information or timetables.
As for real-time search, Jim Hendler tweeted,
@SethGrimes nice article, btw, but you left out real-time search w/some sort of filtering - cf http://feeltiptop.com - interesting idea
TipTop founder Shyam Kapur did post a comment on Intelligent Enterprise's Web site, writing,
My creation TipTop http://FeelTipTop.com is perhaps the most promising semantic search tool out there today but it does not fit neatly into any of your categories. Give it a try to enlarge some more your understanding of what is possible.
TipTop appears to do a basic sentiment classification of tweets. [Revised February 1:] According to the company's FAQ page, it is limited to Twitter as a source; CEO Shyam Kapur says that TipTop Shopping harvests Amazon reviews. The FAQ states,
Most search engines, including Google, crawl the Web looking for any text that matches your search terms. But TipTop goes one step further, it figures out what is currently being said about your search terms and delivers results making it easier to find what you need.
Hmm... Google searches both Twitter (in near real time) and the rest of the visible Web (and some of the deep Web), while TipTop claims to go "one step further" despite searching only Twitter. Further, I don't see anything in TipTop that differentiates it as better than the many other Twitter search/sentiment-classification tools. Instead, I see "tip tweets" rated as positive that include, in response to a search on Massachusetts senate candidate "Martha Coakley," such gems as:
SNL Rips Dems: 'Martha Coakley Couldn't Beat Dick Cheney for Mayor of Berkeley'
So all the ammo that should have been unloaded on Jay Leno was reserved for Martha Coakley. Stay classy, #SNL.
So president obama is campaigning for Michael Bennet next month...let's hope that works as well as his campaign for Martha
Yes, they include slang such as "rips" and colloquialisms such as "ammo... unload" and difficult-to-decode sarcasm, but -- noting that this and a search on "sentiment analysis" were the only test queries I ran -- I suggest that TipTop rein in its claims.
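To show why tweets like these are hard, here is a deliberately naive lexicon-based classifier in Python. It is not TipTop's method, which I have not seen; the word lists are hypothetical and tiny, and the point is only that counting positive and negative words cannot detect sarcasm.

```python
import re

# Tiny, hypothetical sentiment lexicons, for illustration only.
POSITIVE = {"hope", "well", "works", "classy", "good", "great"}
NEGATIVE = {"bad", "awful", "fails", "worst", "terrible"}


def classify(text):
    """Label text by counting lexicon hits; sarcasm and slang are ignored."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    tweets = [
        "SNL Rips Dems: 'Martha Coakley Couldn't Beat Dick Cheney for Mayor of Berkeley'",
        "So all the ammo that should have been unloaded on Jay Leno was reserved "
        "for Martha Coakley. Stay classy, #SNL.",
        "So president obama is campaigning for Michael Bennet next month...let's "
        "hope that works as well as his campaign for Martha",
    ]
    for tweet in tweets:
        print(classify(tweet), "|", tweet)
```

With these word lists, the second and third tweets come out "positive" on the strength of "classy," "hope," "works," and "well," even though both are sarcastic, and the first comes out "neutral" because slang such as "rips" is not in the lexicon.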
So there you have it: semantic search footnotes to further explain a number of points in my "2 + 9" article.
For more on sentiment analysis, check out a new conference I'm organizing, the Sentiment Analysis Symposium, April 13 in New York.