Text technologies are poised to revolutionize legal compliance work, prompted by a December 2006 amendment to U.S. government rules governing discovery of electronically stored information (ESI). This e-discovery mandate, which applies to the process in which parties to a lawsuit request and provide information that may be pertinent to the case, has driven significant demand for supporting software and services. It has also created an opportunity to apply “knowledge discovery” technologies to extract information from documents, e-mail messages and other textual materials, and these analytical technologies have broad potential beyond litigation support in other business domains. In this interview, Aaron Brown, program director, Content Discovery and Search, IBM Information Management Software, shares his thoughts on the use of text analytics for e-discovery and in other emerging applications.
What’s your estimation of the legal-sector opportunity for knowledge-discovery technologies?
Yet awareness of text analytics in the legal sector is still very low, isn't it?
The technology may be just starting to make inroads, for instance with basic extraction and clustering showing up in tools for e-discovery review and early case assessment, but that's just the tip of the iceberg. Text analytics promises to change the fundamental economics of e-discovery (and, to a similar extent, compliance-driven investigations) by transforming it from an exhaustive, human-centric process to a high-productivity collaboration between legal experts and analytics-driven discovery tools. [The transformed process] goes well beyond what you can do with traditional search and navigation, quickly highlighting anomalies, exposing unexpected patterns, and so on.
Of course there are significant hurdles to overcome — notably establishing a track record in court for evidence discovered, retained, and/or prioritized by analytics-based technologies — but the great thing about the legal market is that the economics and pain points are such that this will happen, especially as more and more corporations start treating legal discovery as part of a proactive, in-house process managed by IT.
What’s the entry point for text analytics in the legal sector?
Legal analytics is one of the first mainstream applications of text analytics in the ‘traditional’ enterprise content management (ECM) space — the tip of the iceberg in enabling a broader, content-centric business-intelligence capability driven by the combination of enterprise content management with text analytics. So as you might guess, this is an area that's very interesting to me and very aligned with our text analytics focus at IBM.
Not all vendors see strong alignment of text analytics and BI. Some would instead place text analytics in the predictive-analytics category. The most prescient, if you ask me, are flexible in integration with established analytics practices but are also building the analytics into line-of-business applications.
Text analytics is not a one-trick pony. It can drive predictive analytics, it can help bring unstructured data into a traditional BI environment, it can power domain-specific solutions — like e-discovery, or the ever-popular sentiment-analysis solutions for media — and it can even drive its own content-centric alternative to traditional BI, when it's used to enable exploratory text mining to extract patterns and insight from business content.
The interesting question might be whether there's a one-size-fits-all analytics that can address all of these spaces. My view is that, today, the best traction will come in domains like legal where vertical customization is key. But I don't think this will be true long-term. I firmly believe that text analytics, framed appropriately (probably as a combination of repository-centric, operationalized extraction and business-user-centric, exploratory, interactive text mining and visualization), will ultimately become a horizontal capability, much as traditional BI is becoming one. My crystal ball goes cloudy when I ask it to predict how long this will take, but we'll surely be doing everything we can to make it happen sooner rather than later.