11/21/2006
11:30 AM

Search in Focus: Implementing a Taxonomy

Search engines don't know the difference between reading glasses and drinking glasses, but a taxonomy puts your query in context. We outline several ways to build taxonomies, ranging from the tough but potentially more accurate approach of building from scratch to the easier but potentially compromised approach of buying a prebuilt taxonomy or using automated clustering software.

Mention the word "taxonomy" and some people will think you mean stuffing dead animals (as in taxidermy). Although the term may not be well known, taxonomies (or sets of categories) are used to organize large quantities of information on the Internet, in portals and in enterprise data repositories. Taxonomies bring context to words, topic areas and search results.

Finding a piece of information within a large collection of data without a taxonomy is like driving in unknown territory without the benefit of a map or road signs: You may eventually stumble upon your destination, but chances are you'll encounter a lot of dead ends and detours first. A taxonomy provides a hierarchical structure of categories, from general to specific. In biology, for instance, dogs are classified under the kingdom Animalia, the phylum Chordata, the class Mammalia, the order Carnivora, the family Canidae, the genus Canis, and the species Canis familiaris.
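In data-structure terms, that general-to-specific hierarchy is a tree. The sketch below is a minimal illustration, not drawn from any taxonomy product: each category records its parent, and walking up the parent links recovers the full classification path. The `Category` class and its `add_child` and `path_to_root` methods are hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Category:
    """One node in a taxonomy tree (a hypothetical minimal model)."""

    name: str
    rank: str
    # Excluded from repr/eq so printing or comparing a node doesn't recurse back up the tree.
    parent: Optional[Category] = field(default=None, repr=False, compare=False)
    children: list[Category] = field(default_factory=list)

    def add_child(self, name: str, rank: str) -> Category:
        """Create a more specific category beneath this one and return it."""
        child = Category(name, rank, parent=self)
        self.children.append(child)
        return child

    def path_to_root(self) -> list[str]:
        """Walk parent links upward, then reverse so the path reads general -> specific."""
        path: list[str] = []
        node: Optional[Category] = self
        while node is not None:
            path.append(f"{node.rank}: {node.name}")
            node = node.parent
        path.reverse()
        return path


# Build the dog classification from the text, one level at a time.
kingdom = Category("Animalia", "kingdom")
species = (
    kingdom.add_child("Chordata", "phylum")
    .add_child("Mammalia", "class")
    .add_child("Carnivora", "order")
    .add_child("Canidae", "family")
    .add_child("Canis", "genus")
    .add_child("Canis familiaris", "species")
)

# Prints "kingdom: Animalia" first and "species: Canis familiaris" last.
for step in species.path_to_root():
    print(step)
```

Storing a parent link on every node keeps "where does this item sit?" a simple walk up the tree, which is the question a taxonomy-aware search or navigation tool asks most often.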

When combined with metatagging tools, text analytics and search software, enterprise taxonomies support accurate search and guided navigation that could not be achieved with search engines alone. As data volumes increase, so, too, does the need for a taxonomy. If you have 100 documents, almost any search technique will work, but if you have a terabyte's worth of documents, you need sophisticated search guided by a taxonomy.
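To make the reading-glasses/drinking-glasses problem concrete, here is a toy sketch of taxonomy-guided search. It assumes each document has already been tagged with a category path (the metatagging step mentioned above). A plain keyword match returns every document containing "glasses," while restricting the match to one branch of the taxonomy returns only the relevant sense. The documents, category paths and the `search` function are invented for illustration; real search engines rank results far more elaborately.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical documents, each pre-tagged with a category path (general -> specific).
DOCUMENTS = [
    {"id": 1, "text": "Reading glasses with blue-light lenses",
     "category": ("Health", "Eyewear", "Reading glasses")},
    {"id": 2, "text": "Set of six drinking glasses, dishwasher safe",
     "category": ("Home", "Kitchen", "Glassware")},
    {"id": 3, "text": "How to choose glasses for reading in low light",
     "category": ("Health", "Eyewear", "Buying guides")},
    {"id": 4, "text": "Wine glasses versus tumblers",
     "category": ("Home", "Kitchen", "Glassware")},
]


def search(keyword: str, within: Optional[tuple[str, ...]] = None) -> list[int]:
    """Return IDs of documents containing `keyword`, optionally limited to one taxonomy branch."""
    hits = []
    for doc in DOCUMENTS:
        if keyword.lower() not in doc["text"].lower():
            continue
        # A document is in a branch if its category path starts with the branch's path.
        if within is not None and doc["category"][: len(within)] != within:
            continue
        hits.append(doc["id"])
    return hits


print(search("glasses"))                                # [1, 2, 3, 4]: keyword alone
print(search("glasses", within=("Health", "Eyewear")))  # [1, 3]: eyewear sense only
print(search("glasses", within=("Home", "Kitchen")))    # [2, 4]: drinkware sense only
```

Matching on a path prefix means a user who picks "Health > Eyewear" automatically gets everything filed beneath it, which is the essence of guided navigation.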

We outline several ways to build taxonomies, ranging from the tough but potentially more accurate approach of building from scratch to the easier but potentially compromised approach of buying a prebuilt taxonomy or using automated clustering software. We also examine deployment and ongoing maintenance practices, as well as the role of ontologies, which might come into play in merger and acquisition scenarios.

ASSESS THE NEED

An enterprise taxonomy attempts to classify virtually all information in an organization and brings it under one structure. Despite the many benefits (see "10 Good Reasons To Use a Taxonomy"), building an enterprise-wide taxonomy is easier said than done. Inevitably, each department has its own priorities, terminology and preferred structure for its body of information, so it's hard to get everyone to agree on one core set of categories. "Customers say this takes a long time, and they talk about people in a room yelling at each other," says Fern Halper, a partner at the research and consulting firm Hurwitz & Associates.

In some settings, universal taxonomies are an absolute must. At the Department of Homeland Security and public safety agencies, for example, taxonomies help tie together clues, establish relationships between crucial tidbits of information and spot broader security or safety threats.

Your company may or may not need an organization-wide taxonomy depending on the problems you're trying to solve. "If your application is simply to enable better retrieval of documents or better kinds of communication with structured data in databases, it may not be necessary," says Josh Powers, principal ontologist at search vendor Convera. "But if your goal is better communication throughout the company, you need to come to some agreement."

When it's time to build, there are two approaches: the tough road of trying to create and enforce a taxonomy through task forces, management edicts, training and so on; or the appeasement route, in which you create mappings between differing points of view. If the sales organization looks at the market differently from the product management group, you would choose the latter approach, and automated mappings could reconcile the two views with a central taxonomy (perhaps with the aid of an ontology, but more on that later).
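At its simplest, the appeasement route is a crosswalk: each department keeps its own vocabulary, and a mapping table translates its terms into categories of the central taxonomy. The sketch below hand-codes such mappings for two hypothetical departments. Every term, category name and the `to_central` function are invented for this example, and in practice the automated mappings the article describes would generate or suggest these entries rather than having someone type them in. Unmapped terms come back as `None`, which is where human negotiation would still be needed.

```python
from __future__ import annotations

# Hypothetical central taxonomy categories, written as general/specific paths.
CENTRAL = {
    "Customers/Small business",
    "Customers/Enterprise",
    "Products/Storage",
}

# Each department's own vocabulary mapped onto central categories (all names invented).
DEPARTMENT_MAPPINGS = {
    "sales": {
        "SMB accounts": "Customers/Small business",
        "Key accounts": "Customers/Enterprise",
    },
    "product_management": {
        "Entry-tier buyers": "Customers/Small business",
        "Storage line": "Products/Storage",
    },
}


def to_central(department: str, term: str) -> str | None:
    """Translate a department's term into its central category, or None if unmapped."""
    category = DEPARTMENT_MAPPINGS.get(department, {}).get(term)
    # Reject mappings that point at categories missing from the central taxonomy.
    return category if category in CENTRAL else None


# Two departments' different labels resolve to the same central category.
print(to_central("sales", "SMB accounts"))               # Customers/Small business
print(to_central("product_management", "Entry-tier buyers"))  # Customers/Small business
print(to_central("sales", "Channel partners"))           # None: needs a human decision
```

Neither department has to rename anything; only the mapping table is shared, which is why this route tends to provoke less of the arguing Halper describes.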
