Q&A: IBM's Aaron Brown on Text Analytics for Legal Compliance
The burden of legal e-discovery mandates is driving demand for "knowledge discovery" technologies. The program director of IBM's Content Discovery and Search unit discussed the use of text analytics in legal work and other emerging applications.
Text technologies are poised to revolutionize legal compliance work, prompted by a December 2006 amendment to U.S. government rules governing discovery of electronically stored information (ESI). This e-discovery mandate, which applies to the process in which parties to a lawsuit request and provide information that may be pertinent to the case, has driven significant demand for supporting software and services. It has also created an opportunity to apply “knowledge discovery” technologies to extract information from documents, e-mail messages and other textual materials, and these analytical technologies have broad potential beyond litigation support in other business domains. In this interview, Aaron Brown, program director, Content Discovery and Search, IBM Information Management Software, shares his thoughts on the use of text analytics for e-discovery and in other emerging applications.
What’s your estimation of the legal-sector opportunity for knowledge-discovery technologies?
Legal discovery is at the front of the pack of promising new text-analytics applications, along with a related cluster of use cases around compliance and legal control. Enterprises are looking to get out from under the crushing costs of traditional legal discovery and likewise to reduce the risks they face from compliance violations. Text analytics helps in both cases. In the first, it helps legal users cut through the clutter quickly to find the key documents or custodians to focus on during case assessment and planning, reducing the need for expensive manual review. In the second, it helps compliance officers proactively detect patterns of non-compliant behavior (such as use of misleading language with customers or insider trading discussions) before they become major exposures, both financially and in the public eye.
Yet awareness of text analytics in the legal sector is still very low, isn't it?
The technology may be just starting to make inroads, for instance with basic extraction and clustering showing up in tools for e-discovery review and early case assessment, but that's just the tip of the iceberg. Text analytics promises to change the fundamental economics of e-discovery – and to a similar extent, compliance-driven investigations – by transforming it from an exhaustive, human-centric process to a high-productivity collaboration between legal expert and analytics-driven discovery tools. [The transformed process] goes well beyond what you can do with traditional search and navigation to quickly highlight anomalies, expose unexpected patterns, etc.
Of course there are significant hurdles to overcome — notably establishing a track record in court for evidence discovered, retained, and/or prioritized by analytics-based technologies — but the great thing about the legal market is that the economics and pain points are such that this will happen, especially as more and more corporations start treating legal discovery as part of a proactive, in-house process managed by IT.
What’s the entry point for text analytics in the legal sector?
Legal analytics is one of the first mainstream applications of text analytics in the ‘traditional’ enterprise content management (ECM) space — the tip of the iceberg in enabling a broader, content-centric business-intelligence capability driven by the combination of enterprise content management with text analytics. So as you might guess, this is an area that's very interesting to me and very aligned with our text analytics focus at IBM.
Not all vendors see strong alignment of text analytics and BI. Some would instead place text analytics in the predictive-analytics category. The most prescient, if you ask me, are flexible in integration with established analytics practices but are also building the analytics into line-of-business applications.
Text analytics is not a one-trick pony. It can drive predictive analytics, it can help bring unstructured data into a traditional BI environment, it can power domain-specific solutions — like e-discovery, or the ever-popular sentiment-analysis solutions for media — and it can even drive its own content-centric alternative to traditional BI, when it's used to enable exploratory text mining to extract patterns and insight from business content.
The interesting question might be whether there's a one-size-fits-all analytics that can address all of these spaces. My view is that, today, the best traction will come in domains like legal where vertical customization is key. But I don't think this will be true long-term. I firmly believe that text analytics, framed appropriately (probably as a combination of repository-centric, operationalized extraction and business-user-centric, exploratory, interactive text mining and visualization), will ultimately become a horizontal capability, much as traditional BI is becoming. My crystal ball goes cloudy when I ask it to predict how long this will take, but we'll surely be doing everything we can to make it happen sooner rather than later.