Search engines don't know the difference between reading glasses and drinking glasses, but a taxonomy puts your query in context. We outline several ways to build taxonomies, ranging from the tough but potentially more accurate approach of building from scratch to the easier but potentially compromised approach of buying a prebuilt taxonomy or using automated clustering software.
Mention the word "taxonomy" and some people will think you mean stuffing dead animals (as in taxidermy). Although the term may not be well known, taxonomies (or sets of categories) are used to organize large quantities of information on the Internet, in portals and in enterprise data repositories. Taxonomies bring context to words, topic areas and search results.
Finding a piece of information within a large collection of data without a taxonomy is like driving in unknown territory without the benefit of a map or road signs: You may eventually stumble upon your destination, but chances are you'll encounter a lot of dead ends and detours first. A taxonomy provides a hierarchical structure of categories, from general to specific. In biology, for instance, dogs are classified under the kingdom Animalia, the phylum Chordata, the class Mammalia, the order Carnivora, the family Canidae, the genus Canis, and the species Canis familiaris.
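As a minimal sketch of that general-to-specific structure, the biological example can be written as nested categories and walked from the root down. The `TAXONOMY` layout and the `find_path` helper are illustrative assumptions, not part of any particular taxonomy product.

```python
from typing import Optional

# Hypothetical representation: each category maps to its narrower subcategories.
TAXONOMY = {
    "Animalia": {
        "Chordata": {
            "Mammalia": {
                "Carnivora": {
                    "Canidae": {
                        "Canis": {
                            "Canis familiaris": {},
                        }
                    }
                }
            }
        }
    }
}


def find_path(tree: dict, target: str, trail: tuple = ()) -> Optional[list]:
    """Return the categories from the root down to target, or None if absent."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == target:
            return list(here)
        found = find_path(children, target, here)
        if found is not None:
            return found
    return None


path = find_path(TAXONOMY, "Canis familiaris")
print(" > ".join(path))
```

The path from root to leaf is the "context" a taxonomy supplies: every category along the way narrows what the final term can mean.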
When combined with metatagging tools, text analytics and search software, enterprise taxonomies support accurate search and guided navigation that could not be achieved with search engines alone. As data volumes increase, so, too, does the need for taxonomy. If you have 100 documents, almost any search technique will work, but if you have a terabyte's worth of documents, you need sophisticated search guided by a taxonomy.
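To make the idea of taxonomy-guided search concrete, here is a rough sketch in which each document carries a category tag and a query can be scoped to one branch. The documents, the category names and the `search` function are all invented for illustration; real search software works very differently under the hood.

```python
# Hypothetical tagged documents; category paths run from general to specific.
DOCUMENTS = [
    {"title": "Choosing reading glasses", "category": "Health/Vision/Eyewear"},
    {"title": "Stemless wine glasses on sale", "category": "Home/Kitchen/Drinkware"},
    {"title": "Eye exam checklist", "category": "Health/Vision/Exams"},
]


def search(query: str, category_prefix: str = "") -> list:
    """Return titles containing the query, limited to one taxonomy branch."""
    q = query.lower()
    return [
        doc["title"]
        for doc in DOCUMENTS
        if q in doc["title"].lower()
        and doc["category"].startswith(category_prefix)
    ]


# A keyword-only search mixes both senses of "glasses".
print(search("glasses"))
# Scoping the same query to a branch keeps only the eyewear result.
print(search("glasses", "Health/Vision"))
```

The keyword match alone cannot separate eyewear from drinkware; the category tag is what supplies the missing context.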
We outline several ways to build taxonomies, ranging from the tough but potentially more accurate approach of building from scratch to the easier but potentially compromised approach of buying a prebuilt taxonomy or using automated clustering software. We also examine deployment and ongoing maintenance practices, as well as the role of ontologies, which might come into play in merger and acquisition scenarios.
ASSESS THE NEED
An enterprise taxonomy attempts to classify virtually all information in an organization and bring it under one structure. Despite the many benefits (see "10 Good Reasons To Use a Taxonomy"), building an enterprise-wide taxonomy is easier said than done. Inevitably, each department has its own priorities, terminology and preferred structure for its body of information, so it's hard to get everyone to agree on one core set of categories. "Customers say this takes a long time, and they talk about people in a room yelling at each other," says Fern Halper, a partner at the research and consulting firm Hurwitz & Associates.
In some settings, universal taxonomies are an absolute must. At the Department of Homeland Security and public safety agencies, for example, taxonomies help tie together clues, establish relationships between crucial tidbits of information and spot broader security or safety threats.
Your company may or may not need an organization-wide taxonomy depending on the problems you're trying to solve. "If your application is simply to enable better retrieval of documents or better kinds of communication with structured data in databases, it may not be necessary," says Josh Powers, principal ontologist at search vendor Convera. "But if your goal is better communication throughout the company, you need to come to some agreement."
When it's time to build, there are two approaches: the tough road of trying to create and enforce a taxonomy through task forces, management edicts, training and so on; or the appeasement route, in which you create mappings between differing points of view. If the sales organization looks at the market in a different way than the product management group, you would choose the latter approach, and automated mappings could reconcile the two views with a central taxonomy (perhaps with the aid of an ontology, but more on that later).
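The appeasement route can be sketched as a pair of lookup tables, one per department, that translate each group's own labels into a shared central category. Every label, category and function name below is hypothetical; the point is only that two vocabularies can coexist as long as both map onto the same central taxonomy.

```python
from typing import Optional

# Hypothetical central taxonomy categories.
CENTRAL = {"products/laptops", "products/tablets"}

# Hypothetical department vocabularies mapped onto the central categories.
SALES_TO_CENTRAL = {
    "Mobile Workforce Solutions": "products/laptops",
    "Consumer Handhelds": "products/tablets",
}
PRODUCT_MGMT_TO_CENTRAL = {
    "Notebook Line": "products/laptops",
    "Slate Line": "products/tablets",
}


def to_central(label: str, mapping: dict) -> Optional[str]:
    """Translate a department label to a central category, or None if unmapped."""
    category = mapping.get(label)
    if category is None or category not in CENTRAL:
        return None
    return category


def same_topic(sales_label: str, pm_label: str) -> bool:
    """True when both department labels resolve to the same central category."""
    a = to_central(sales_label, SALES_TO_CENTRAL)
    b = to_central(pm_label, PRODUCT_MGMT_TO_CENTRAL)
    return a is not None and a == b


print(same_topic("Mobile Workforce Solutions", "Notebook Line"))
```

Neither department has to give up its own terms; an unmapped label simply returns `None`, which flags a gap someone needs to reconcile.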