Q&A: IBM's Aaron Brown on Text Analytics for Legal Compliance

The burden of legal e-discovery mandates is driving demand for "knowledge discovery" technologies. The program director of IBM's Content Discovery and Search unit discussed the use of text analytics in legal work and other emerging applications.

Seth Grimes, Contributor

March 3, 2008


Text technologies are poised to revolutionize legal compliance work, prompted by a December 2006 amendment to U.S. government rules governing discovery of electronically stored information (ESI). This e-discovery mandate, which applies to the process in which parties to a lawsuit request and provide information that may be pertinent to the case, has driven significant demand for supporting software and services. It has also created an opportunity to apply “knowledge discovery” technologies to extract information from documents, e-mail messages and other textual materials, and these analytical technologies have broad potential beyond litigation support in other business domains. In this interview, Aaron Brown, program director, Content Discovery and Search, IBM Information Management Software, shares his thoughts on the use of text analytics for e-discovery and in other emerging applications.

What’s your estimation of the legal-sector opportunity for knowledge-discovery technologies?

Aaron Brown

Legal discovery is at the front of the pack of promising new text-analytics applications, along with a related cluster of use cases around compliance and legal control. Enterprises are looking to get out from under the crushing costs of traditional legal discovery and likewise to reduce the risks they face from compliance violations. Text analytics helps in both cases. In the first, it helps legal users cut through the clutter quickly to find the key documents or custodians to focus on during case assessment and planning, reducing the need for expensive manual review. In the second, it helps compliance officers proactively detect patterns of non-compliant behavior, such as use of misleading language with customers or insider trading discussions, before they become major exposures both financially and in the public eye.

Yet awareness of text analytics in the legal sector is still very low, isn't it?

The technology may be just starting to make inroads, for instance with basic extraction and clustering showing up in tools for e-discovery review and early case assessment, but that's just the tip of the iceberg. Text analytics promises to change the fundamental economics of e-discovery – and to a similar extent, compliance-driven investigations – by transforming it from an exhaustive, human-centric process to a high-productivity collaboration between legal expert and analytics-driven discovery tools. [The transformed process] goes well beyond what you can do with traditional search and navigation to quickly highlight anomalies, expose unexpected patterns, etc.

Of course there are significant hurdles to overcome — notably establishing a track record in court for evidence discovered, retained, and/or prioritized by analytics-based technologies — but the great thing about the legal market is that the economics and pain points are such that this will happen, especially as more and more corporations start treating legal discovery as part of a proactive, in-house process managed by IT.

What’s the entry point for text analytics in the legal sector?

Legal analytics is one of the first mainstream applications of text analytics in the ‘traditional’ enterprise content management (ECM) space — the tip of the iceberg in enabling a broader, content-centric business-intelligence capability driven by the combination of enterprise content management with text analytics. So as you might guess, this is an area that's very interesting to me and very aligned with our text analytics focus at IBM.

Not all vendors see strong alignment of text analytics and BI. Some would instead place text analytics in the predictive-analytics category. The most prescient, if you ask me, are flexible in integration with established analytics practices but are also building the analytics into line-of-business applications.

Text analytics is not a one-trick pony. It can drive predictive analytics, it can help bring unstructured data into a traditional BI environment, it can power domain-specific solutions — like e-discovery, or the ever-popular sentiment-analysis solutions for media — and it can even drive its own content-centric alternative to traditional BI, when it's used to enable exploratory text mining to extract patterns and insight from business content.

The interesting question might be whether there's a one-size-fits-all analytics that can address all of these spaces. My view is that, today, the best traction will come in domains like legal where vertical customization is key. But I don't think this will be true long-term. I firmly believe that text analytics, framed appropriately (probably as a combination of repository-centric, operationalized extraction and business-user-centric exploratory interactive text mining and visualization), will ultimately become a horizontal capability, much as traditional BI is becoming one. My crystal ball goes cloudy when I ask it to predict how long this will take, but we'll surely be doing everything we can to make it happen sooner rather than later.

So we have conceptual and semantic search, information extraction, clustering for term reduction and document processing, link and association analysis: many varieties of text-analytics and data-mining mojo. What's IBM's near-term approach to meeting legal-sector needs?

We're putting a substantial focus on adapting the technology to the domain, working closely with key clients and partners with very good visibility into and deep expertise in the ways the legal world works. We're also banking on the market shift that's already happening as more and more corporations, fed up with the cost of paying outside providers to handle e-discovery reactively, are starting to bring e-discovery in-house. As they do this, e-discovery becomes a proactive solution purchased and operated as a joint venture between IT and Legal.

Successful technology solutions focus on end-user requirements. We technologists should remember this principle, which applies to every business domain, including the legal sector. Your thoughts?

I’ve read your blog on this point… and you're absolutely right that domain-specific applications, workflows, and interfaces are essential to the infiltration of text analytics into the legal space. I've been spending a lot of my customer-facing time recently talking with legal officers at many of our large clients, and this is a message that comes through loud and clear. We've seen that they're less impressed with the latest whiz-bang visualizations and extraction technology, but when we demonstrate how we can embed that same analytics technology so it disappears into tools that accelerate or simplify their existing, deeply entrenched processes, they can't get enough of it.

What about other application domains? Beyond proven text-analytics successes in areas such as intelligence and life sciences, what are the most promising, emerging applications?

Following close behind compliance and discovery is a second cluster of applications around the topic of customer insight — applications that leverage text analytics to help companies reach a deeper understanding of their customers and the public at large to drive enhanced business value. This cluster starts with solutions that mine insights out of direct customer interactions with contact centers – the so-called "voice of the customer" – and extends to solutions that mine public discourse on the Web, and other sources internally and externally, to provide insight into public perception of products and services.

In the first set of solutions, text analysis brings the unstructured aspect of the customer dialogue, such as notes, e-mail, chats, voice transcription, or even the public conversation on the Web, together with traditional structured data already being collected, synthesizing the full view of the customer. We use the resulting insights to improve agent performance and compliance, to discover which types of interactions lead to better results, for early warning of quality issues, to find new opportunities for cross-sell/upsell or for developing new products and services, and for competitive intelligence. In the second set of solutions, text analytics provides insight into the ongoing conversation taking place on the Web (and increasingly within the company), helping detect emerging trends and patterns in the tone of conversation, helping highlight product quality issues early on, and exposing opportunities to improve marketing and future product capabilities.

And you cited ECM earlier...

We're seeing a new cluster of text-analytics applications starting to emerge around more traditional content management use cases. These are very exciting as they are leading indicators that text analytics is expanding beyond its traditional reaches into a market that's rich with content but sparse on analysis and insight today.

Enterprises are starting to see text analytics as something they can apply as a horizontal capability to better understand the volumes of content that they've been storing and managing over the years. That might mean analyzing customer correspondence files to identify patterns linking customer satisfaction to buying behaviors, analyzing contracts or financial instruments to extract unusual elements and kick off business processes to mitigate them appropriately, mining insurance claim materials to identify patterns that indicate fraudulent behavior, or any other use case that revolves around the wealth of insight buried in archived content. It's going to take some time for these applications to mature, but over the long term I definitely see this as a substantial growth area for text analytics.

About the Author

Seth Grimes

Contributor

Seth Grimes is an analytics strategy consultant with Alta Plana and organizes the Sentiment Analysis Symposium. Follow him on Twitter at @sethgrimes.
