Another user, Randy Collica of HP, reported what I'd characterize as extreme ROI via text analytics. Where HP CRM survey analysis used to take a team of six staff two and a half weeks, Randy can now build a predictive model himself in four days. These figures mirror the experience reported at last year's Boston summit by Greg Talkington, an EDS human-resources analyst, who spoke of reducing an eight-person-week effort to half a day's work for one staffer.
Presentations and off-line discussions confirmed how strongly the text-analytics market is expanding to include traditional BI approaches and users, what one vendor calls Unified BI. This is a trend I examined in my white paper for the upcoming North American summit, June 12-13 in Boston. So on the one hand we have traditional (for text analytics) investigatory analyses - automated discovery of needles in haystacks, typified by pharma research and national-security investigations - and on the other a more rapidly growing market segment that applies the technology to fact and sentiment extraction from online social media, surveys, and call-center notes. Vendor SPSS has gone so far as to brand this type of work Enterprise Feedback Management. We'll see if that label sticks. Another interesting point: SPSS market-strategy VP Olivier Jouve reports that in the US and Europe, 30%-40% of their new data-mining customers license text-mining tools. The figure in Japan is 70%.
On the technology front, it seems that users are still struggling to find workable approaches to cross-language text analytics, which is important if you take in source materials in multiple languages. Thoughts on the accuracy of machine translation are mixed, but there seems to be consensus that analysis in the originating language and consolidation of results is preferable to translation to a canonical language such as English. A couple of vendors with significant linguistic capabilities, Inxight and TEMIS, were represented at the summit, and I hope to learn about another vendor's approach at Basis Technology's government user conference next month in Washington.
Lastly, it's amusing what small bits of flashiness catch an audience's attention. Last June in Boston it was Attensity's David Bean showing, in passing, real-time syntactic analysis that included part-of-speech tagging. The software - not a component users would normally see - builds a tree from text typed in a box: a small element of a larger system but eye-catching nonetheless. This year it was Infolution's Henk Alles controlling the length of a text summary - shortening and expanding it - by moving a slider control. A small point but a graphic illustration of what the technology can do.
I had the privilege of chairing last week's European Text Analytics Summit in Amsterdam. I've never attended any other computing event that mixes scientists, police investigators and media-company product managers with technologists. I report here on a few points that are worthy of note, grouped under the headings user stories, market, and technology.