There are a number of vendors vying to help businesses manage and analyze the 80-85 percent of all corporate data that's stored in unstructured and semi-structured formats -- and some of their names might surprise you.
So now we have the key insight: new data is constantly emerging and competing to become part of the decision-making set, while existing information may be reshaped or even partly discarded. Within BI, semi-structured text, geographic information and other non-traditional data sources are competing with numeric data to serve as the best indicators and predictors for decision models. That brings us to the current state of information affairs: a variety of tool-makers (BI vendors, database firms, information integrators and start-ups) are at work on these globs of data, trying to turn unstructured and semi-structured data into more useful, actionable information.
The Information Makers
The new Information Makers are software companies that are unlocking the value of unstructured and semi-structured data. Surprisingly, the single most influential player in the field is neither one of the search giants such as Google, Microsoft or Yahoo, nor one of the BI players such as Cognos, Hyperion, SAS and SPSS. Rather, it's IBM. IBM has combined the work of its own research labs with results from universities to deliver the Unstructured Information Management Architecture (UIMA). UIMA is a framework and API that defines a series of open, standardized and extensible methods for processing unstructured data of all types (text, speech, video, audio, etc.).
Hadley Reynolds, of the Delphi Group, has noted that "IBM's UIMA framework proposes a new 'standard' for text [and other] analytics implementations that includes common interface definitions and a common data model. It does not include a search engine for distribution or a runtime environment in which to process and provision analytic applications to business systems ... the big news is that IBM is throwing its weight behind an infrastructure that can reduce the complexity of implementing analytic applications."
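To make the idea of "common interface definitions and a common data model" concrete, here is a minimal Python sketch of the general pattern. It is a toy analogue under stated assumptions, not UIMA's actual API: the real framework is built around a shared structure called the Common Analysis Structure (CAS), whereas every class and function below (`AnalysisDocument`, `Annotation`, `Annotator`, `CapitalizedWordAnnotator`, `MoneyAnnotator`, `run_pipeline`) is hypothetical. The point is that independently written analytics components implement one shared interface and record their findings in one shared data model, so they can be chained without knowing about each other.

```python
import re
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Annotation:
    """Hypothetical annotation: a labeled span of the source text."""
    kind: str
    begin: int
    end: int


@dataclass
class AnalysisDocument:
    """Hypothetical common data model: raw text plus annotations from every component."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def covered_text(self, a: Annotation) -> str:
        return self.text[a.begin:a.end]


class Annotator(Protocol):
    """Hypothetical common interface that every analytics component implements."""
    def process(self, doc: AnalysisDocument) -> None: ...


class CapitalizedWordAnnotator:
    """Hypothetical component: marks capitalized words as candidate names."""
    def process(self, doc: AnalysisDocument) -> None:
        for m in re.finditer(r"\b[A-Z][A-Za-z]+\b", doc.text):
            doc.annotations.append(Annotation("Candidate", m.start(), m.end()))


class MoneyAnnotator:
    """Hypothetical component: marks dollar amounts such as '$500 million'."""
    def process(self, doc: AnalysisDocument) -> None:
        for m in re.finditer(r"\$\d+(?: million| billion)?", doc.text):
            doc.annotations.append(Annotation("Money", m.start(), m.end()))


def run_pipeline(doc: AnalysisDocument, annotators: List[Annotator]) -> AnalysisDocument:
    # Each component reads and writes the same shared document, in order.
    for annotator in annotators:
        annotator.process(doc)
    return doc
```

For example, running both components over "Autonomy bought Verity for $500 million." yields "Candidate" annotations for "Autonomy" and "Verity" and a "Money" annotation for "$500 million". A new vendor's component could join the pipeline simply by implementing `process`, which is the kind of reduced integration complexity Reynolds describes.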
UIMA has been embedded into IBM products like WebSphere Information Integrator OmniFind Edition, Lotus Workplace and WebSphere Portal Server. Sixteen other vendors have signed on to use the framework, including a cross-section of BI and content analysis players such as Attensity, ClearForest, Cognos, Inxight, SAS and SPSS. The UIMA framework, together with IBM's acquisitions of Ascential, Bowstreet and iPhrase, signals that IBM is staking out a large position in the emerging information-making marketplace. Equally important is Curt Monash's idea of an ontology management system. Is UIMA the underpinning of such a system? Time will tell.
Another company that has made a big unstructured data play is Autonomy, which in November 2005 bought search engine company Verity for a cool $500 million. According to a recent Forrester assessment, the deal places Autonomy at the head of the business search market. Autonomy offers its Intelligent Data Operating Layer (IDOL) server, which integrates the latest in personalization, collaboration and retrieval features. It also has subsidiaries in sound processing (SoftSound) and video interpretation (Virage). In short order, Autonomy has combined the IDOL framework with Verity's K2 advanced word-phrasing, relational taxonomies and other classification features. The combination may well give general search and analytics customers added savvy, while bringing new features to Autonomy's forms, BPM and business search products.