Commentary

Fritz Nelson
 

Truevert's Semantic Search

Semantic search is like porn: I'm pretty sure I'll know it when I see it. So when semantic-search upstart Truevert came by for a visit, I got all googly (I think I might have even screamed "yahoo"). The Truevert system, powered by OrcaTec's discovery toolkit, is narrowly focused on "green" topics, but it's definitely an eye-opening, fresh approach to an elusive problem.

Here is Part 1 of our video discussion with Truevert, including a demonstration of the technology.

Here's Part 2, where we discuss competitors (namely Powerset, now owned by Microsoft) and the nature of other ontological approaches to semantic search.

To be fair, whenever I hear about the semantic Web, I think of a magic, omniscient elf scurrying around squillions of sites, assigning meaning based on the context of, well, everything. So my expectations are high. But frankly, so is my disappointment with traditional search, even if it has changed how most of us view and use the Web. I don't want to become an expert at concatenating search terms, but neither do I want Aunt Millie's musings on managing her household budget when I search Google for microfinance. Yet those seem to be my only two choices.

OrcaTec co-founder Herbert Roitblat began by saying that ontology, often thought of as the path toward a semantic Web, is flawed. (He also began by saying that Google's PageRank is a popularity contest.) There are lots of ways to categorize information and almost no agreement on which to use, and the people designing these schemas are not the same people looking for the information.

Even if you use precise search terms on a conventional search engine, Roitblat summarized, you're really narrowing results by exclusion rather than by precision. Enter "green toilets" in Yahoo hoping to find more energy-efficient commodes, and you'll instead find avocado or sea-foam green toilets.

A true semantic approach relies on context rather than categorization. OrcaTec started Truevert with a vertical focus, namely "green," so every search is run through that filter. It uses Yahoo BOSS to gather Web search results, then re-ranks them using its own language model, built by learning how words associate with one another, and in what contexts, across 6,000 green-tagged documents in Delicious (a job it can do on a mere laptop in less than 15 minutes). Google's terms of service, Roitblat says, don't allow re-ranking results the way Truevert does.
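The re-ranking step described above can be illustrated with a minimal Python sketch. Truevert's actual language model isn't described in detail, so this swaps in a simple stand-in: document-level word co-occurrence counted over a small training corpus. The training sentences, the result strings, and the function names (`build_model`, `score`, `rerank`) are all hypothetical. The `RESULTS` list merely stands in for what a Web search API such as Yahoo BOSS would return; no real search API is called.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny stopword list so function words don't count as "associations".
STOPWORDS = {"a", "an", "and", "are", "in", "of", "that", "the", "to"}

def tokenize(text):
    """Lowercase, split into alphabetic words, drop stopwords; return a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def build_model(corpus):
    """Count, per word and per word pair, how many training docs contain it."""
    word_df, pair_df = Counter(), Counter()
    for doc in corpus:
        words = tokenize(doc)
        word_df.update(words)
        pair_df.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_df, pair_df

def score(model, query, result):
    """Average, over the result's non-query words, of how often each one
    appears in training documents alongside each query word."""
    word_df, pair_df = model
    q_terms = tokenize(query)
    r_terms = tokenize(result) - q_terms
    if not r_terms:
        return 0.0
    total = 0.0
    for q in q_terms:
        if word_df[q] == 0:
            continue  # query word never seen in the training corpus
        for w in r_terms:
            # Fraction of training docs containing q that also contain w.
            total += pair_df[frozenset((q, w))] / word_df[q]
    return total / len(r_terms)

def rerank(model, query, results):
    """Reorder results from an upstream search by association score."""
    return sorted(results, key=lambda r: score(model, query, r), reverse=True)

# Hypothetical stand-ins: a few "green-tagged" training documents, and two
# results in the order a generic keyword engine might return them.
TRAINING = [
    "green building cuts energy and water use",
    "low flow toilets save water in green homes",
    "composting toilets are a green alternative that saves water",
    "solar energy makes homes greener",
]
RESULTS = [
    "sea foam green toilets in avocado and mint colors",
    "dual flush toilets save water",
]

model = build_model(TRAINING)
ranked = rerank(model, "green toilets", RESULTS)
print(ranked[0])  # prints: dual flush toilets save water
```

Because this toy corpus talks about saving water and energy, "green" and "toilets" become associated with words like "water" and "save," so the color-themed result drops to the bottom even though it matches both keywords. Training the same code on a different corpus, such as a company's own documents, would produce different associations.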

Roitblat says the company chose green because it wanted to start out doing some good, but also because it's a category people can easily understand. The same method can be applied to any vertical. You could even apply it to enterprise content management, given that most corporations have their own jargon; you just train the engine on the documents you index.

You can also imagine that more precise search results could bring in a decent amount of ad revenue through better ad matching.

Truevert competes with a growing list of other new players, like Hakia, Powerset, and Thomson Reuters' Calais. Microsoft recently purchased Powerset. I haven't talked with any of these companies. Yet. I'm sure they'll find me.
