The Word on Text Mining - InformationWeek

The Word on Text Mining

Text analytics provides concept discovery, automated classification, and innovative displays for large volumes of unstructured documents.

As the minutiae of everyday business and personal life migrate to the Internet, it's small wonder that text search is likely the Web's most popular function. Who has time nowadays for Web surfing, for meandering through a network of linked pages until you come to something intriguing? We want the express train, a direct link to content. Yet text-search results sometimes seem like the generic wisdom you get randomly from a Magic 8 Ball: They're so lacking in contextual relevance that they may answer many questions other than the one you're asking. Text-search results lack the aptness that would follow from understanding the meaning of search terms, rather than just their presence or absence, and from the ability to assess the relevance of a search hit.
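To make the contrast between term presence and relevance concrete, here is a minimal sketch in Python. It is not drawn from any product discussed in this article: the three-document corpus, the function names, and the choice of a smoothed TF-IDF weighting are all assumptions made for illustration. TF-IDF is still far from understanding meaning, but it shows how even a crude relevance score separates hits that a presence-only match treats as equals.

```python
import math
from collections import Counter

# Hypothetical three-document corpus, invented for illustration only.
DOCS = {
    "a": "jaguar speed jaguar habitat jaguar diet",
    "b": "car review mentions jaguar once among many car brands",
    "c": "banking rates and loans",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def boolean_match(query, docs):
    """Return ids of documents that contain every query term (presence only)."""
    terms = set(tokenize(query))
    return sorted(d for d, text in docs.items() if terms <= set(tokenize(text)))


def tfidf_rank(query, docs):
    """Rank documents by the summed TF-IDF weight of the query terms."""
    tokens = {d: tokenize(text) for d, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, toks in tokens.items():
        counts = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            # Document frequency: how many documents mention the term at all.
            df = sum(1 for other in tokens.values() if term in other)
            if df == 0:
                continue
            # Smoothed inverse document frequency (one common variant).
            idf = math.log((1 + n_docs) / (1 + df)) + 1.0
            score += (counts[term] / len(toks)) * idf
        scores[doc_id] = score
    # Keep only documents with a nonzero score, best first.
    return sorted((d for d in scores if scores[d] > 0), key=lambda d: -scores[d])


if __name__ == "__main__":
    print(boolean_match("jaguar", DOCS))  # both hits look equally good
    print(tfidf_rank("jaguar", DOCS))     # the jaguar-focused document ranks first
```

The presence-only match returns documents "a" and "b" without distinction, while the weighted ranking puts the document that is actually about jaguars ahead of the one that mentions the word in passing.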

Text mining is poised to fill the void, structuring the information inherent in volumes of free text in ways that enable decidedly more intelligent search. There will still be a role for the thoughtful, manual classification and filtering that made Yahoo a winner from its earliest incarnation, and there will still be advantages to intentional, Semantic Web-style efforts to categorize content for identification by automated agents. But just as data mining lets you discover hidden relationships in structured data and apply predictive algorithms, text mining will help identify value that you, the manual classifiers, and the Resource Description Framework wizards didn't know existed.

Tired of search results presented as pages of hits? Text-mining software implements innovative display and navigation techniques that graphically represent networks of conceptually interrelated documents. Although plenty of pointless graphics, animation, and other whiz-bangs adorn the Web and office software, text-mining interfaces won't be all glitz. They already harness hyperbolic (zooming) displays and other approaches that deliver results in a navigable, organized form that reflects the underlying structure of the result sets — approaches that add analytic value.

Text mining will let us move from knowledge management to knowledge analytics.


Everyone is familiar with the problem space: Languages and forms of communication are designed for human rather than machine consumption, but people's daily lives are increasingly mediated by and reliant on information technology, creating a need for innovative modes of human-computer interaction. People and computers often meet halfway, communicating via simple, structured instruction sets tailored for particular processes like operating an automated teller machine. It isn't feasible for people to go further by learning the variety of languages used to program more sophisticated transactions; instead we expect computers to understand our native languages.

This problem isn't trivial because the meaning of words is highly dependent on context and may be obscured by slang, irregular grammar, fractured syntax, spelling errors and variations, and imprecision. Translating between languages is also difficult when you're dealing with degrees of incomparability of syntax (composition), semantics (meaning), and alphabet. Humans can overcome these difficulties because we understand abstraction, context, and linguistic variations and can detect and apply patterns. We're not so good on speed, volume, consistency, and breadth; by breadth I mean that only the most exceptional individuals can work in more than a handful of languages.
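One of the obstacles listed above, spelling errors and variations, is easy to illustrate. The sketch below uses `difflib.get_close_matches` from the Python standard library to map a misspelled term onto a known vocabulary. The vocabulary, the function name `normalize_term`, and the 0.8 similarity cutoff are assumptions chosen for this example, and string similarity of this kind handles only surface variation, not slang, context, or meaning.

```python
import difflib

# Hypothetical controlled vocabulary, invented for illustration only.
VOCABULARY = ["classification", "clarification", "search", "semantics"]


def normalize_term(word, vocabulary, cutoff=0.8):
    """Map a possibly misspelled word to its closest vocabulary entry.

    Returns the word unchanged (lowercased) when nothing in the
    vocabulary is similar enough to clear the cutoff.
    """
    word = word.lower()
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


if __name__ == "__main__":
    for raw in ["clasification", "Semantcs", "zzz"]:
        print(raw, "->", normalize_term(raw, VOCABULARY))
```

A fixed cutoff is a blunt instrument: set it too low and unrelated words collapse together, too high and genuine variants slip through, which is one reason the problem is described here as nontrivial.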

The challenge — designing information technology that matches human language comprehension while bringing to bear the advantages of automation — defines the playing field for text mining.
