I was in NYC at TechWeb's Interop event today and I just happened to run into Harvey Spencer, an old friend from my days as editor-in-chief of Transform Magazine. Until it was folded into Intelligent Enterprise way back in late 2004, Transform focused on enterprise content management (ECM) and business process management (BPM) challenges. Harvey was a contributing editor from the publication's start as Imaging Magazine, and he taught me everything he could about document capture when I joined the staff in 1998.
It was a nice coincidence seeing Harvey, given that Intelligent Enterprise is about to launch a Tech Center (mini site) focused on ECM. I was keenly interested in hearing his take on how the world of content management is colliding with the world of business intelligence.
One key intersection between content and BI is text mining, which is taking off in voice-of-the-customer analysis and customer-experience-management applications. This week alone I have interviewed two major hotel chains -- Choice Hotels and Gaylord -- that are using text mining software (in this case from Clarabridge) to make sense of customer satisfaction surveys. Text mining software rapidly analyzes vast stores of content -- one chain had 600,000 customer satisfaction surveys -- and then delivers insight in the form of standardized reports and query-ready databases. Companies used to employ armies of people to read through text such as customer comment fields, but that approach took longer, cost a lot more money and yielded far less detailed, reliable and consistent information than text mining applications can now quickly uncover.
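To make "standardized reports and query-ready databases" a bit more concrete, here is a minimal sketch that turns free-text survey comments into rows in a SQL table and then runs a summary query. It assumes a hypothetical keyword lexicon (the theme names, keywords and sample comments are all made up for illustration); keyword matching is a toy stand-in and is not how Clarabridge or any other commercial product actually works.

```python
import sqlite3

# Hypothetical theme lexicon. A keyword lookup is a toy stand-in for the
# linguistic analysis that commercial text mining products perform.
THEMES = {
    "cleanliness": {"dirty", "clean", "stain", "dusty"},
    "staff": {"staff", "friendly", "rude", "helpful"},
    "noise": {"noise", "noisy", "loud"},
}


def tag_comment(comment):
    """Return the set of themes whose keywords appear in one free-text comment."""
    words = {w.strip(".,!?;:\"'").lower() for w in comment.split()}
    return {theme for theme, keywords in THEMES.items() if words & keywords}


def to_rows(comments):
    """Flatten comments into (survey_id, theme) rows, one row per theme per survey."""
    rows = []
    for survey_id, comment in enumerate(comments, start=1):
        for theme in sorted(tag_comment(comment)):
            rows.append((survey_id, theme))
    return rows


def load(rows):
    """Load the rows into an in-memory SQLite table that can be queried with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comment_themes (survey_id INTEGER, theme TEXT)")
    conn.executemany("INSERT INTO comment_themes VALUES (?, ?)", rows)
    return conn


def theme_report(conn):
    """A simple standardized report: how many surveys mention each theme."""
    return conn.execute(
        "SELECT theme, COUNT(*) FROM comment_themes "
        "GROUP BY theme ORDER BY COUNT(*) DESC, theme"
    ).fetchall()


if __name__ == "__main__":
    surveys = [
        "The room was dirty and the hallway was loud.",
        "Front desk staff were friendly and helpful.",
        "Noisy neighbors all night.",
    ]
    print(theme_report(load(to_rows(surveys))))
    # → [('noise', 2), ('cleanliness', 1), ('staff', 1)]
```

Once the comments live in a table, any analyst with a SQL or BI tool can slice them by theme, which is the practical payoff of "query-ready" output compared with people reading surveys one by one.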
In most cases text-mining software is pointed at electronic documents, such as call center comment fields, e-mail-based surveys and online feedback forms, but there are still lots of paper documents out there. That's where Harvey comes in, as he is the world's leading expert on capture systems. By one estimate, he says, half of all invoices issued in North America are still delivered on paper. As a result, financial and accounting applications from the likes of SAP and Oracle are fed by capture systems. The scanning, character recognition and auto-classification steps of capture are the transition point between physical documents and machine-readable electronic documents that are ready for text mining.
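The auto-classification step of capture can be illustrated with a minimal sketch. It assumes scanning and character recognition have already produced plain text, and it uses hypothetical keyword rules and hypothetical destination names to decide whether a document is an invoice (bound for an accounts payable system) or a survey (bound for text mining). Real capture systems typically rely on trained classifiers and page-layout cues rather than simple keyword counts.

```python
# Hypothetical document-type rules. Production capture systems typically
# combine trained classifiers with layout cues; keyword scoring is a toy stand-in.
DOC_TYPE_KEYWORDS = {
    "invoice": ("invoice", "amount due", "remit to"),
    "survey": ("how satisfied", "comments", "rate your"),
}

# Hypothetical downstream destinations for each document type.
ROUTES = {"invoice": "erp_accounts_payable", "survey": "text_mining"}


def classify(ocr_text):
    """Score OCR output against each type's keyword phrases and return the best match."""
    text = ocr_text.lower()
    scores = {
        doc_type: sum(text.count(phrase) for phrase in phrases)
        for doc_type, phrases in DOC_TYPE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)  # on a tie, the first type listed wins
    return best if scores[best] > 0 else "unclassified"


def route(ocr_text):
    """Send a classified document downstream; unrecognized documents go to manual review."""
    return ROUTES.get(classify(ocr_text), "manual_review")


if __name__ == "__main__":
    print(route("INVOICE #1042  Amount due: $310.00  Remit to: Acme Corp"))
    print(route("How satisfied were you with your stay? Comments: great pool"))
    print(route("Dear Sir or Madam"))
```

The fallback to manual review reflects a common design choice in capture workflows: documents the system cannot classify confidently are handed to a person rather than forced into the wrong downstream application.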
Thanks to Web-based forms, e-mail and other electronic alternatives, paper document volumes have been declining about 5 percent per year this decade, according to Harvey. Yet the volume of paper documents being captured and converted into machine-readable form continues to grow. Why? As the velocity of business transactions increases, Harvey says, it's more and more important to capture and "truncate" the contextual information that accompanies each transaction. Whether that information is on paper or in electronic form, companies are increasingly looking for insight from the text as well as from the structured data behind each transaction.
As I mentioned, voice-of-the-customer applications are currently a hotbed of activity, but vast repositories of content out there hold the next big text-mining gold mines waiting to be discovered. E-discovery, competitive intelligence, publishing and fraud detection are just a few examples.