Forrester analyst Boris Evelson asked on Twitter, "Does anyone have a good example of a query that SQL can't do? #BI"
Fact is, 30+ years in, there are few queries that you can't implement with SQL, even if more-complex queries will rely on user-defined functions (UDFs), procedural extensions, or convoluted coding. You can do just about anything information-related in C/C++, Java, Python, or other routines surfaced as SQL UDFs, but you're still limited to SQL-accessible sources. So here's my Twitter reply to Boris, which I'll go on to explain: "'What are the 39 Steps?' -- by which example I mean, knowledge-seeking queries."My response wasn't entirely fair, but I don't like artificial boundaries. The unstated assumption, possibly implicit for Boris, that to do BI you need a relational (or functionally equivalent) database, is legitimate but artificial. It limits the scope of BI inquiry and puts technical implementation ahead of business need, creating artificial constraints. There are useful non-database BI sources, so given that "SQL is a standardized language for defining and manipulating data in a relational database," can't-do queries will be found outside the RDBMS realm.
Time to interject that, so far as I can tell, the majority of queries that are best implemented with SPARQL and XQuery, a query language for RDF (Resource Description Framework) and one for generalized XML data structures, could be implemented in SQL, albeit often very awkwardly. Same for NoSQL queries of RDF, XML, key-value, and other non-relational data stores. SQL is that bendable and extendable.
Not surprisingly, Boris asked me to elaborate on my response.
By knowledge-seeking queries, I mean questions that may involve non-numerical answers, non-database sources, and natural-language processing (NLP).
What are the 39 Steps?
"What are the 39 Steps?" is a direct quotation from a movie. The answer could be pulled directly from a variety of Web sources by simply matching that phrase and extracting. (When I write "simply," I mean that the task is conceptually simple. I recognize that the implementation isn't necessarily.) There's no NLP involved in understanding the query or finding the answer although there is a) the recognition that my query is a quotation in search of a quotation response and b) a need to scour to non-DBMS sources. "The 39 Steps is an organization of spies collecting information on behalf of the foreign office of..." Noting the "are" (not the singular "is") in my query, there is also c) a need to avoid over-parsing the query. The correct answer is not "a novel by John Buchan" or "an Alfred Hitchcock movie."
Admittedly, all this 39-Steps business is abstract. My query doesn't relate to a business question, so instead consider, "What were sales of widget X last month?" and "Have sales of widget X increased in the last year?" These are natural-language queries against structured data stores that could be translated into SQL. Products from companies such as EasyAsk, which Doug Henschen reviewed in 2008, are designed to handle them. Valentin Tablan recorded an instructive 2008 lecture on technical approaches with GATE, "A Natural Language Query Interface to Structured Information": "It's about information access." But what if sources aren't "structured"?
BI queries beyond SQL
Let's take this BI query line beyond SQL. "What was responsible for the year-on-year change in sales of widget X?" This is a knowledge-seeking BI query that can not be implemented in SQL. We're interested not only in numerical facts; we're (also) interested in root causes.
There's a significant semantic component involved in processing a BI query of this nature. The machine has to understand what the user's seeking to know, the machine has to parse and evaluate candidate answer sources, and the machine has to assemble and present a potentially complex answer. To be frank, there isn't any system that I know of that can handle my example of a BI knowledge-seeking query. Answering questions such as this one involves an iterative, exploratory process.
Further, answers to knowledge-seeking queries are rarely as simple as pulling a quotation from a single source. Creating them may involve a complex assessment and assembly task, perhaps (again) involving semantic data integration. I see enterprise BI mash-ups as delivered by systems such as JackBe's Presto, where integration is notated in EMML (Enterprise Mashup Markup Language), and the Wolfram Alpha "computational knowledge engine," which is driven by the Mathematica language, as good examples of beyond-SQL approaches to delivering appropriate, if composite, answers to a certain class of BI query.
The right direction
Work like Valentin Tablan's that involves constructing and querying an ontology, a knowledge base, goes in the right direction. So do efforts to create query and inference engines for the (as-yet) unrealized Linked Data web and other tools to integrate diverse sources, whether via Semantic Web queries, at a future time when the Semantic Web has become real, or here-and-now BI mash-ups and knowledge-base queries. These are beyond-SQL initiatives. When it comes to knowledge and complexity, as opposed to simple, quantifiable facts, business intelligence will have to go beyond SQL."Does anyone have a good example of a query that SQL can't do?" Fact is, 30+ years in, there are few queries that you can't implement with SQL. More-complex queries may rely on user-defined functions (UDFs), procedural extensions, or convoluted coding but may still fall short. So here's my beyond-SQL example -- "What are the 39 Steps?" -- by which I mean, knowledge-seeking queries.