Organizations have been using natural language processing (NLP) for text analytics tasks such as social media sentiment analysis and contract review, but NLP usage has been expanding.
“The big change that's happened in the last five years is the amount of context and understanding that can be extracted or used when understanding documents,” said Nigel Duffy, global artificial intelligence leader at EY. “Our ability to understand information from documents is much, much greater than it was a few years ago.”
BI and analytics vendors are adding NLP capabilities to their products, such as natural language generation to narrate data visualizations and natural language understanding to power natural language searches. In doing so, they're making data visualizations easier to understand and their products easier to use.
For example, Tableau, Sisense and Qlik have all partnered with Narrative Science to narrate data visualizations with text. Narration advances the storytelling aspect of BI and data analytics while reducing the likelihood of subjective interpretation. In essence, it facilitates a common, collective understanding of a data visualization; without it, people may disagree about what a chart or graph means.
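To make the narration idea concrete, here is a minimal, hypothetical sketch of template-based narration. It is not Narrative Science's actual technique (which the article does not describe); the function name and templates are invented for illustration.

```python
# Toy template-based narration: turn a metric series into a sentence.
# A hypothetical sketch of the idea behind NLG narration, not any vendor's method.

def narrate_trend(metric: str, values: list[float]) -> str:
    """Summarize a series of values as a one-sentence narrative."""
    change = values[-1] - values[0]
    pct = abs(change) / values[0] * 100
    direction = "rose" if change > 0 else "fell" if change < 0 else "was flat"
    return (f"{metric} {direction} {pct:.0f}% over the period, "
            f"from {values[0]:g} to {values[-1]:g}.")

print(narrate_trend("Quarterly revenue", [100.0, 112.0, 125.0]))
# Quarterly revenue rose 25% over the period, from 100 to 125.
```

Even this trivial version shows the value: every viewer of the chart reads the same sentence, so there is one shared interpretation rather than many.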
NLP isn’t perfect
Vendors are proactively warning customers about the limitations of their NLP capabilities, which is wise because NLP isn't perfect. Because NLP involves natural human language, users tend to have high expectations, despite their everyday experiences with the imperfections of Siri, Google Assistant and Alexa. Alexa's mistakes may sometimes be amusing; mistakes in a BI and analytics context are not, because the results are relied upon to make business decisions. Implementers should therefore understand the limitations of the platforms they're using and set user expectations accordingly.
There are other things to consider, too. With natural language search, users don't have to understand SQL or Boolean syntax, so the act of searching is easier. But the answer may be flawed, and if it is, the average employee may not realize it. Users may also not know which questions to ask in the first place.
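A toy sketch illustrates both points: how a natural language question might be mapped to SQL behind the scenes, and why the answer can be silently flawed. Real products use far more sophisticated parsing; the table and column names here are invented.

```python
# Naive keyword-to-SQL mapping, a hypothetical sketch of natural language
# search. Words the mapper doesn't recognize are silently dropped, which is
# one way a plausible-looking answer can quietly miss the user's intent.

METRICS = {"sales": "SUM(amount)", "orders": "COUNT(*)"}
DIMENSIONS = {"region": "region", "month": "month"}

def to_sql(question: str) -> str:
    words = question.lower().split()
    metric = next((METRICS[w] for w in words if w in METRICS), "COUNT(*)")
    dims = [DIMENSIONS[w] for w in words if w in DIMENSIONS]
    select = ", ".join(dims + [metric])
    group = f" GROUP BY {', '.join(dims)}" if dims else ""
    return f"SELECT {select} FROM facts{group}"

print(to_sql("total sales by region"))
# SELECT region, SUM(amount) FROM facts GROUP BY region
```

Note that a question like "sales excluding returns" would produce the same query as "sales": the qualifier is ignored, the result looks reasonable, and a user who can't read SQL has no way to notice.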
"The interface is still relying on the user to ask the right question. There's no value judgment from these systems. It's simply you ask me a question and I'll give you the answer," said Steven Mills, associate director of machine learning and AI at Boston Consulting Group’s Federal Division (BCG Fed). "The power of the data science field is not simply the technical expertise, but an understanding of how to do analysis, how to interrogate the data to get the kinds of business insights we really need. We tend to forget about that second piece of the equation."
Depending on the vendor, the user may or may not have to select the data source to query first. The "or not" option is what will stick over the long term, because end users don't care where data resides. They just want to access it.
The same was true of the internet. At one point, one had to know where a document resided to access it. The World Wide Web changed all that, first with decision-tree navigation (stepping through pages of choices), then Boolean searches and finally natural language searches.
“There’s a real question of how good is good enough, because machine learning is statistical, and so it’s going to make mistakes,” said EY’s Duffy. “If the answer is it needs to be perfect, then you probably shouldn't take it on right now."
NLP has been around for several decades. In that time, approaches have evolved from rule-based to statistical, though techniques are often combined to achieve a particular result. Rule-based systems are only as good as their encoded rules; statistical systems carry a margin of error. No system is perfect.
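The "only as good as the encoded rules" point is easy to demonstrate with a deliberately naive, hypothetical example. A keyword rule for sentiment works until the language it wasn't written for shows up; a statistical model trained on labeled examples could learn such cases, at the cost of its own margin of error.

```python
# A naive rule-based sentiment check and its blind spot (negation).
# Purely illustrative; real rule-based NLP systems encode far richer rules.

POSITIVE = {"good", "great", "excellent"}

def rule_based_sentiment(text: str) -> str:
    words = set(text.lower().replace(".", "").split())
    return "positive" if words & POSITIVE else "negative"

print(rule_based_sentiment("The contract terms are great."))    # positive
print(rule_based_sentiment("The terms are not great at all."))  # positive -- wrong
```

The second answer is wrong because no rule covers negation; the system's accuracy is exactly bounded by what its authors thought to encode.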
It's long been understood that natural language understanding necessarily involves context, including the relationship of words and the context in which they were used. Deep learning systems are enabling more nuanced types of understanding than were possible using traditional systems.
Right now, the BI and analytics offerings with NLP capabilities are impressive, but they will only get better. More BI and analytics vendors will introduce NLP capabilities to stay competitive, which will put pressure on all vendors to innovate, alone and with the help of partners.
Gartner expects the use of graph technologies to grow, though it did not assign growth numbers to graph analytics specifically in its Top 10 Data and Analytics Trends for 2019. It did say graph database use would grow 100% annually through 2022.
The value of knowledge graphs is showing relationships. Imagine how enlightening it would be to discover word and phrase relationships across all kinds of structured and unstructured data to find similarities, differences and correlations that are relevant in the context of a particular problem.
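The relationship-finding value of a knowledge graph can be sketched in a few lines. This is a hypothetical toy using an adjacency structure; real knowledge graphs live in dedicated graph databases, and the entities and relations below are invented.

```python
# Toy knowledge graph as an adjacency structure: store (relation, object)
# edges per entity, then query for explicit relationships across documents.

from collections import defaultdict

graph = defaultdict(list)

def add_fact(subject: str, relation: str, obj: str) -> None:
    graph[subject].append((relation, obj))

# Facts that might be extracted from structured and unstructured sources.
add_fact("Acme Corp", "signed", "Contract 42")
add_fact("Contract 42", "mentions", "liability clause")
add_fact("Beta LLC", "signed", "Contract 42")

def related(entity: str) -> list:
    """Everything directly connected to an entity."""
    return graph[entity]

def co_related(relation: str, obj: str) -> list:
    """Entities that share the same relation to the same object."""
    return [s for s, edges in graph.items() if (relation, obj) in edges]

print(co_related("signed", "Contract 42"))
# ['Acme Corp', 'Beta LLC']
```

The second query is the interesting one: two entities that never appear in the same document are surfaced as related because they connect through a shared node, which is the kind of similarity and correlation the paragraph above describes.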
For now, NLP searches include type-ahead capabilities, like Google's, so users can get answers faster, though the answers may not always be what the user hoped for.
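Type-ahead itself is conceptually simple: prefix search over known queries. Here is a minimal, hypothetical sketch; product implementations index vastly larger corpora and rank by popularity and context.

```python
# Minimal type-ahead via prefix search over a sorted list of known questions.
# The question list is invented; real systems index users' actual queries.

import bisect

QUESTIONS = sorted([
    "sales by region",
    "sales by month",
    "top customers",
    "orders per day",
])

def suggest(prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` known questions starting with `prefix`."""
    i = bisect.bisect_left(QUESTIONS, prefix)
    out = []
    while i < len(QUESTIONS) and QUESTIONS[i].startswith(prefix) and len(out) < limit:
        out.append(QUESTIONS[i])
        i += 1
    return out

print(suggest("sales"))
# ['sales by month', 'sales by region']
```

Note the limitation the paragraph hints at: suggestions come only from questions the system already knows, so steering users toward them is fast but can also steer them away from the question they actually meant to ask.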
The next step beyond text will be voice interfaces, which will become increasingly interactive over time. While that convenience will probably lead users to "interrogate" data more than they have with a keyboard, it still won't mean end users can think like data scientists. Nevertheless, users will appreciate the convenience of voice-enabled search and voice-narrated reports and dashboards.
A few BI and analytics vendors are offering NLP capabilities, but they're in the minority for now; more will likely enter the market to stay competitive. In the meantime, those implementing BI and analytics solutions with NLP capabilities should set user expectations realistically.
Generally speaking, BI and analytics tools are becoming even easier to use, which helps democratize data analysis. The next step is implementing voice interfaces and democratizing complex analyses of complex data.