Natural Language Query: Old Answer for 'New' BI Opportunity
As combinations of business intelligence and search technology are emerging, Progress Software is reviving a simpler approach: turning natural language queries into SQL code.
Why can't searching for answers from business intelligence systems be as simple as searching the Internet? Plenty of vendors have been trying to make it so, with leaders including Business Objects and Cognos from the BI camp and Endeca and FAST from the search camp, but it's hard to get search technology to interpret structured data. That may explain why the hit list of search-BI success stories is a short one to date.
But what if you could turn natural-language queries into good old-fashioned SQL queries? Well, that capability actually existed long before Google became a household word. In the mid-1990s, Software AG offered a product called Esperant that pioneered what's called Natural Language Query (NLQ). Microsoft also latched onto the approach, bundling a free "English Query" utility in SQL Server 2000. While these two tools seem to have drifted into obscurity, Progress Software is now reviving NLQ with EasyAsk, a product recently repositioned to bring the ease of Internet-style search to operational BI.
The timing of this latest NLQ offering could make a big difference, says Howard Dresner, former Gartner BI analyst and now head of Dresner Advisory Services. "Natural Language Query didn't do well 15 years ago because people didn't get it," says Dresner. "The market now gets what search is all about, so it's easier for them to understand something like EasyAsk."
EasyAsk is designed to combine both the SQL query and search approaches to find relevant information. "In response to typing in a question or a description of what you're interested in, EasyAsk does two things," explains Dr. Larry Harris, vice president and general manager of Progress EasyAsk. "First, we generate an ad hoc, SQL-based query, and second, we do a search on a repository of standard reports."
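The two-step response Harris describes can be illustrated with a rough sketch in Python. This is not EasyAsk's implementation: the phrase dictionary, the table and column names, the report titles and the function names are all invented for illustration, and real NLQ products do far more linguistic analysis than simple phrase matching.

```python
import re

# Hypothetical dictionary tying English phrases to an invented schema.
# Each entry says whether the phrase names a table or implies a filter.
DICTIONARY = {
    "customers": ("table", "customers"),
    "cape town": ("filter", "city = 'Cape Town'"),
    "mortgage": ("filter", "has_mortgage = 1"),
}

# Hypothetical repository of standard report titles.
REPORTS = [
    "Mortgage customers by city",
    "Branch sales benchmark",
]

# Common words ignored by the report search.
STOPWORDS = {"which", "in", "have", "a", "the", "of", "by"}


def question_to_sql(question):
    """Step 1: build an ad hoc SQL query from phrases found in the question."""
    text = question.lower()
    table = None
    filters = []
    for phrase, (kind, fragment) in DICTIONARY.items():
        if phrase in text:
            if kind == "table":
                table = fragment
            else:
                filters.append(fragment)
    if table is None:
        return None  # the dictionary does not cover this question
    sql = f"SELECT * FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    return sql


def search_reports(question):
    """Step 2: keyword search over the titles of standard reports."""
    words = set(re.findall(r"[a-z]+", question.lower())) - STOPWORDS
    return [
        title for title in REPORTS
        if words & set(re.findall(r"[a-z]+", title.lower()))
    ]


question = "Which customers in Cape Town have a mortgage?"
print(question_to_sql(question))
print(search_reports(question))
```

Even in this toy version, a question that uses a phrase missing from the dictionary produces no query at all, which is why the quality of the dictionary matters so much in practice.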
The generation of SQL is guided by dictionaries that are tuned to the underlying structure of available databases, including table names, column names and the ways in which data values are represented. The repository scan is a conventional document-oriented search. Both are important, says Harris, because well-designed reports are often a better source of answers for common questions than ad hoc queries. "A structured report might include additional data that wasn't explicitly mentioned in the question but that might be useful in the analysis," he says. "Reports also offer formatting that might better highlight key results."
Until recently, EasyAsk was marketed primarily as an e-commerce tool, used by customers including HP, The Gap, Coach and Sony to power Web site searches tied to product databases rather than document repositories or collections of Web pages. Late last year the focus changed, and Progress started marketing the technology for operational BI. One of the earliest customers was Nedbank, which is the fourth-largest bank in South Africa. The bank has piloted EasyAsk in a handful of applications and it's now on the verge of rolling out a major production project.
Nedbank's 60-employee Business Intelligence Solutions department looks after a 40-terabyte data warehouse, and it regularly spins out data mart applications with Web-based GUIs that enable business users to run stored queries for data exploration. The challenge was that users were constantly coming up with new questions, but the apps are "only as flexible and as fast as a developer can move," says Derick Oliver, data warehouse delivery manager.
Nedbank saw EasyAsk as a possible way to enable users untrained in SQL to run ad hoc queries without IT support, so last fall it piloted two applications. In the first app, Nedbank put EasyAsk on top of a 20 GB customer database so bank users could explore customer profiles, spending patterns and other information used in marketing campaigns. "Within four days, we had a solution that could query the entire database quite seamlessly," says Oliver. "The knee-jerk reaction from the technical guys was, 'I didn't write the code, so I don't trust it,' but we inspected the SQL that EasyAsk generated to make sure that the queries were correct."
In a second project, which subsequently went into production, Nedbank developed a simple query application needed by the Homelands (mortgage) Department. Without any formal training on the product, developers simply followed the installation procedures to customize the query dictionary and created and stored the desired queries (in English), and the project was completed within one day, says Oliver. "Once we proved that it was generating valid SQL and that the queries were coming out right, we started letting users loose in a controlled fashion," he says. "The users quickly figured out that the tool is quite easy to use, so they started typing in their own questions." The application has since been in use by ten employees for nearly six months, and Oliver says that on only a handful of occasions has his department had to tune the dictionary to ensure that new English-language queries were turning up valid results.
EasyAsk now faces its biggest test at Nedbank, as it has been embedded in an existing application used by more than 9,000 bank branch employees and managers to benchmark which employees, products and branches are doing well in terms of sales and service and which ones are falling short. "There's a growing amount of data sitting under that application, and we simply can't develop fast enough to satisfy the need for queries," says Oliver. "The EasyAsk functionality has been tested and looks good, but we're trying to test workload and servers and make sure it won't kill performance." Oliver notes that the software can be tuned to limit the size of the queries at certain times of the day, so he's expecting the application to roll out to all branches in mid-April.
Harris of Progress says EasyAsk "stands on the shoulders" of existing data warehouse, data mart and information management infrastructures, providing a broadly available, Web-based query environment with per-CPU, rather than per-seat, licensing. Typical deployments roll out to hundreds or thousands of users with costs in the $200,000 to $300,000 range, says Harris.
NLQ technology does have its limits, says Dresner, so he predicts it will shine in focused applications. "The query environment is only as good as the dictionary, but Progress seems to have put some effort into developing those dictionaries," Dresner explains. "I like the fact that they've come up with functional solutions for areas such as sales and marketing so they can build all those concepts into the dictionary out of the box."