Software and service text-analytics revenues now total $835 million globally, according to a 2010 market study I completed recently.
Growth is steepest for applications that seek business insight in social networks, online media, and surveys. Applications include brand-reputation management, market research, competitive intelligence, and customer service and support. For these applications and others, text analytics brings automated, natural-language processing techniques to bear to identify and extract names, facts, relationships, sentiment, and other information in blogs, forums, news, social updates, e-mail, and a range of enterprise sources.
These content-analysis capabilities extend business intelligence (BI) and predictive-analytics coverage, as Doug Henschen reported with an online focus in Social Media Shapes Up As Next Analytic Frontier. I'll describe my findings in detail at next week's 2011 Text Analytics Summit, slated for May 18-19 in Boston.
My $835 million market-size estimate covers software licenses, service subscriptions, and vendor-provided technical support and professional services. Despite strong growth, text analytics remains a small fraction, roughly 8 percent, of Gartner's $10.5 billion 2010 valuation of the broader BI, analytics, and performance-management software market.
My estimate captures the value both of core content-analysis capabilities and of text analytics' contribution to four content-related application categories: information capture (most often via Web scraping), information management (descriptive metadata and "unstructured" text), text-fueled enterprise applications, and search-based applications.
The search-based applications category accounts for an estimated $300 million of the $835 million text-analytics total. It includes Web and enterprise search, e-discovery, and business, scientific, and legal information services. The last three are typically accessed via search interfaces and rely on knowledge bases populated by mining textual sources, such as judicial records, research papers, and online forums. Examples include the West Litigation Monitor from Thomson Reuters, Elsevier's SciVerse platform, and ConsumerBase from NetBase.
The enterprise-applications category comprises software and services for business functions such as customer relationship management (CRM), market research, enterprise feedback management/surveys, and competitive intelligence.
Enterprise information management (EIM) systems store both text and accompanying descriptive metadata (such as author, title, topic, publication date, and tags) that facilitates publishing the stored text as "smart content." EIM places a premium on multi-channel publishing, reuse, and targeting.
Information capture or acquisition starts with Web crawling and page or document retrieval: for example, locating and scraping prices from online commerce sites for competitive-intelligence purposes. It further includes text extraction from mark-up and binary formats such as HTML, PDF, and Word; data cleansing that removes ads, menus, spam, and other extraneous content; metadata extraction; and deduplication.
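To make the capture steps concrete, here is a minimal Python sketch of three of them: text extraction from HTML, a crude cleansing pass, and exact-match deduplication. It is an illustration only, not how any particular vendor's product works. The class and function names, and the list of tags treated as extraneous, are my own assumptions. Real systems typically use far more sophisticated boilerplate detection and near-duplicate matching.

```python
import hashlib
from html.parser import HTMLParser

# Tags whose contents this sketch treats as extraneous (an assumption;
# real cleansing relies on much richer heuristics than a tag list).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collects visible text while skipping content inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace and keep only non-empty text.
        cleaned = " ".join(data.split())
        if self._skip_depth == 0 and cleaned:
            self._chunks.append(cleaned)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the cleansed text of one HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def fingerprint(text):
    """Hash case- and whitespace-normalized text for exact-duplicate checks."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(pages):
    """Extract text from each page, keeping the first copy of each distinct text."""
    seen = set()
    unique = []
    for html in pages:
        text = extract_text(html)
        key = fingerprint(text)
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```

Given two pages that carry the same price text wrapped in different menus or scripts, `deduplicate` keeps only one copy. Hashing normalized text catches exact repeats only. Pages that differ by a few words would need a near-duplicate technique such as shingling, which this sketch does not attempt.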