Vendors including Inxight and Attensity have joined ClearForest in attempting to move text mining beyond the confines of niche areas such as pharmaceutical research and government intelligence. Contract management is one such application, and others include spend analysis, patent filings and research, and early warning on warranty and quality control problems.
Integrate All Forms of Information
EII is the latest buzzword associated with database integration technologies variously known as "heterogeneous distributed database," "virtual centralized database," "federated database" and "enterprise data access." Gartner analyst Ted Friedman views EII as a goal rather than a technology, with the idea being to integrate all data assets in order to deliver timely and complete views of business and customers, and to provide data consistency and cross-platform and cross-database access.
• Bridge systems and close disconnects: Do customer or internal processes require access to multiple systems and paper-based records? Business process management and enterprise information integration can bridge islands of data, content and automation.
• Eliminate human interaction: Are transactions bogged down by routine reviews and approvals? Integrate key data, set rules for exceptions and automate everything else.
• Don't read mountains of information: Coping with thousands of test reports, contracts, warranty claims, e-mail messages or shreds of textual evidence? You'll never read through it all, so consider text mining to extract knowledge and correlate it to related data.
• Cut integration time, cost: APIs and custom integrations take months and megabucks to develop and inevitably break when you upgrade software. EII promises repeatable integrations that take weeks not months, but the category is immature, and not all products can handle content.
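The "set rules for exceptions and automate everything else" advice above can be illustrated with a minimal Python sketch. The Transaction fields, the APPROVAL_LIMIT value and the rule wording are hypothetical, chosen only for illustration; a real BPM product would express such rules in its own rules engine.

```python
from dataclasses import dataclass

# Hypothetical threshold: anything above it is treated as an exception.
APPROVAL_LIMIT = 5_000.00


@dataclass
class Transaction:
    tx_id: str
    amount: float
    vendor_known: bool


def route(tx: Transaction) -> str:
    """Auto-approve unless one of the exception rules flags the transaction."""
    if not tx.vendor_known:
        return "review: unknown vendor"
    if tx.amount > APPROVAL_LIMIT:
        return "review: over limit"
    return "auto-approve"


if __name__ == "__main__":
    for tx in (
        Transaction("T1", 120.00, True),
        Transaction("T2", 9_800.00, True),
        Transaction("T3", 75.00, False),
    ):
        print(tx.tx_id, route(tx))
```

Only the transactions that trip a rule reach a human reviewer; the routine ones pass straight through.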
Toby Redshaw, chief technology officer at Motorola, says EII is particularly useful for companies integrating acquired companies. "You find that you have valuable data, but the problem is gaining access and sharing that information throughout the organization," Redshaw explains. "Everybody has silos of data that can be used effectively within those silos, but the problems and the access needs are horizontal."
Building a service-oriented architecture and using EII middleware from MetaMatrix, Motorola is gaining easier access to a variety of data. In one example, the company is trying to give customers and partners visibility into its supply chain. "The repair and return shops and call centers all have good vertical data, but now we're looking at horizontal issues, and we needed a tool that digs into those pockets and normalizes the information."
Redshaw says the idea of "unstructured data" is relative, describing information as a continuum from extremely structured, clearly defined data (as in a database), to XML documents somewhere in the middle and to content such as images that don't have much structure at all.
By relying on EII software, Motorola was able to provide information on order status throughout the supply chain in nine weeks, about a third of the time it would have taken using traditional methods such as custom, API-level integration.
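To make the idea of digging into vertical silos and normalizing the information concrete, here is a small Python sketch of a "horizontal" order view. It is not how Motorola's or any vendor's EII middleware works; the silo records, field names and dates are invented for illustration, and a real EII layer would query live systems rather than in-memory lists.

```python
from typing import Dict, Iterable

# Hypothetical silo extracts: each source names the order key differently.
repair_shop = [{"order_no": "A100", "state": "IN_REPAIR"}]
call_center = [
    {"OrderID": "A100", "last_contact": "2004-03-02"},
    {"OrderID": "B200", "last_contact": "2004-03-05"},
]


def normalize_repair(rec: dict) -> dict:
    """Map a repair-shop record onto the shared field names."""
    return {"order_id": rec["order_no"], "repair_status": rec["state"].lower()}


def normalize_call(rec: dict) -> dict:
    """Map a call-center record onto the shared field names."""
    return {"order_id": rec["OrderID"], "last_contact": rec["last_contact"]}


def unified_view(*sources: Iterable[dict]) -> Dict[str, dict]:
    """Merge normalized records from every source, keyed by order_id."""
    view: Dict[str, dict] = {}
    for records in sources:
        for rec in records:
            view.setdefault(rec["order_id"], {}).update(rec)
    return view


if __name__ == "__main__":
    view = unified_view(
        map(normalize_repair, repair_shop),
        map(normalize_call, call_center),
    )
    for order_id, details in view.items():
        print(order_id, details)
```

The per-source normalizers are the reusable part: adding another silo means writing one more mapping function rather than another point-to-point integration.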
Put Convergence into Play
Are BPM, text mining and EII ready for mainstream deployment, or should the risk-averse continue to sit on the sidelines? The answer varies by technology and application.
BPM is a maturing market overall (see "Maturity Metrics," below), but less so when it comes to letting companies flexibly and deeply tap into both data and content, says Mike Maziarka, an analyst at InfoTrends/CapVentures.
Convergence "is starting to gain traction in the accounts payable and accounts receivable area, where BPM can tap into ERP and help automate the matching of what's been billed to what's been paid," Maziarka says. "Other areas where [convergence] has big opportunities include financial applications and customer support."
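The matching of what's been billed to what's been paid that Maziarka mentions can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: invoices and payments are already keyed by a shared invoice ID, amounts must agree exactly, and the function name and exception labels are hypothetical rather than drawn from any ERP or BPM product.

```python
from decimal import Decimal
from typing import Dict, List, Tuple


def match_invoices(
    billed: Dict[str, Decimal], paid: Dict[str, Decimal]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (matched invoice IDs, exceptions needing human review)."""
    matched: List[str] = []
    exceptions: List[Tuple[str, str]] = []
    for inv_id, amount in billed.items():
        if inv_id not in paid:
            exceptions.append((inv_id, "unpaid"))
        elif paid[inv_id] != amount:
            exceptions.append((inv_id, "amount mismatch"))
        else:
            matched.append(inv_id)
    # Payments that reference no known invoice are exceptions too.
    for inv_id in sorted(paid.keys() - billed.keys()):
        exceptions.append((inv_id, "payment without invoice"))
    return matched, exceptions


if __name__ == "__main__":
    billed = {
        "INV-1": Decimal("100.00"),
        "INV-2": Decimal("250.00"),
        "INV-3": Decimal("80.00"),
    }
    paid = {
        "INV-1": Decimal("100.00"),
        "INV-2": Decimal("200.00"),
        "INV-9": Decimal("40.00"),
    }
    matched, exceptions = match_invoices(billed, paid)
    print("matched:", matched)
    print("exceptions:", exceptions)
```

Decimal is used instead of float so that currency amounts compare exactly.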
If your customer and partner service problems send CSRs into disparate data and content stores and systems, follow ANICO's example and consider a process-based approach to creating a unified view of the customer. If other processes are bogged down in rubber-stamp interactions with content, weigh the cost of those inefficiencies against an investment in BPM, which for large companies is likely to start at $300,000. The more repeatable the process (and the more costly it is without automation), the more BPM makes sense. For many public companies, regulatory compliance is introducing yet another incentive for considering BPM (see "The Silver Lining in SOX").
As for text mining, EDS was an early adopter back in 2002, and EDS's Kasravi says the technology is now poised for mainstream use. "Businesses large and small have to deal with reading large amounts of textual content in order to know what to do or find opportunities," he says. "Get out of the mindset that you have to read and process that information in people's heads."
E-mail routing, competitive analysis, investment analysis, news analysis, fraud detection and product liability are all areas in which text mining will flourish, says Kasravi. ClearForest puts today's typical text mining investments at $300,000 to $600,000, but broader adoption will help drive costs downward.
Compliance matters such as the Transportation Recall Enhancement, Accountability, and Documentation (TREAD) Act (passed in the aftermath of the Ford Explorer/Firestone tire recall) and Financial Accounting Standards Board (FASB) issues will also compel companies to consider text-mining technologies. "Product liability and warranty costs are going to increase, and text mining can help provide early warning of product deficiencies," says Kasravi.
ClearForest recently partnered to develop a text-analytics-driven warranty claims application that relies on integration adapters provided by Information Builders' iWay subsidiary to access the application automotive dealers use to enter warranty information. The application is designed to integrate content and data to better analyze warranty costs related to parts, assembly problems and labor.
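As a rough illustration of how free-text warranty claims might be correlated with structured part data to give early warning, consider the Python sketch below. It uses plain keyword counting, which is far cruder than commercial text-mining products, and the defect vocabulary, claim records and threshold are all hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical defect vocabulary; matching is on exact lowercase tokens,
# so "leak" is counted but "leaks" or "leaking" would not be.
DEFECT_TERMS = {"leak", "crack", "overheat", "stall"}


def defect_counts(claims: Iterable[Tuple[str, str]]) -> Counter:
    """Count, per (part, term), how many claims mention each defect term."""
    counts: Counter = Counter()
    for part, text in claims:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for term in DEFECT_TERMS & words:
            counts[(part, term)] += 1
    return counts


def early_warnings(
    claims: Iterable[Tuple[str, str]], threshold: int = 2
) -> List[Tuple[str, str]]:
    """Return (part, term) pairs mentioned in at least `threshold` claims."""
    return sorted(key for key, n in defect_counts(claims).items() if n >= threshold)


if __name__ == "__main__":
    claims = [
        ("radiator", "Coolant leak near hose"),
        ("radiator", "Customer reports leak again"),
        ("gasket", "Small crack found on inspection"),
    ]
    print(early_warnings(claims))
```

Each claim pairs a structured field (the part) with unstructured text (the technician's note), which is the data-plus-content combination the warranty application is designed to analyze.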
EII is the least developed of the three technologies discussed in this article. Among the pioneers noted here, Motorola may not have as much company as ANICO, Hasbro, NASD or EDS, but service-oriented architecture and tools such as EII are the clear direction for all enterprises. "None of us can do wholesale rip and replace anymore because the days of the megaprojects are over," says Motorola's Toby Redshaw. "Web services and loose integration allow you to harvest what you have."
MATURITY METRICS: Data and Content Convergence
Business process management (BPM) is the most mature of the three technologies discussed here in terms of its ability to harness both data and content. Text mining is proven as a tool for interpreting content and correlating results to data, but six-figure costs have limited deployment. Enterprise information integration (EII) is an emerging software category.
Many processes involve both data and content, and ambidextrous BPM is filling a need, particularly in loan/claim, accounts payable/receivable, CRM and compliance-related financial management processes. Text mining remains concentrated in the biotech and intelligence realms, with six-figure investments limiting interest. Most EII initiatives remain focused exclusively on data.
Many technology buyers remain Balkanized in their respective data and content mindsets, but higher-level SOA, process automation and comprehensive intelligence initiatives will force detente. Text mining opportunities will emerge in e-mail routing, competitive analysis, investment analysis, news analysis, fraud detection and warranty/quality control applications. Repetitive process and integration challenges will demand the agility and reusability of BPM and EII solutions.
Immature in the short term, but XML, Web services and service-oriented architecture (SOA) are fast easing access to all forms of information. Proven applications and incentives including compliance and competitive advantage will help punch holes in the silos and eventually erode the boundaries between data and content.
Analyses apply strictly in the context of data and content convergence, not broadly to BPM, EII and text mining.