Without analytic methods, there would be no "360-degree views" of customers, partners or competitors. While decision-making is still largely based on well-established methods for exploring historical data, we're starting to hear more about successes gained with predictive analytics that turn the gaze on the future. These methods offer data classification, clustering and forecasting to help organizations apply knowledge to operational decision-making and planning. Significant barriers remain. A big problem is that the algorithms are generally abstruse, designed by and for statisticians. The results often defy lay explanation. (Do you know what a support vector machine [SVM] is? Do business managers dare rely on them when told that they "simplify predictive models by reducing dimensionality"?)
In practice, complexity and opacity can lead to misuse. And misuse leads to guidance that's worse than worthless if it prompts erroneous decision-making.
I polled SPSS, KXEN and SAS—vendors that many of us look to for analytic solutions—to get their takes on usability barriers. Charlie Chase, SAS market strategy manager for econometrics and forecasting, says he frequently encounters not only messy, inconsistent and otherwise deficient data but also—and more seriously—incorrect assumptions that lead to oversimplification and the choice of inadequate tools and models. "The 'people, process and technology' pieces need to be in balance," says Chase, alluding to the risk in indiscriminate over-reliance on technology.
"The algorithms work fine, but only in the hands of statistical experts," says KXEN marketing VP Joerg Rathenberg. The answer, Rathenberg says, is to "embed predictive capabilities into the workflow and enterprise software" used by nonstatisticians. Colin Shearer, SPSS vice president of customer analytics, says the key is to "package and preconfigure the analytics and present them in a user interface that supports business-level tasks."
These vendors are attempting to expand from customer relationship management (CRM), fraud detection and risk management into areas with greater data volumes and problem complexity, including finance, banking, insurance, retail and telecommunications. These are environments in which data is diverse in origin and form, and in which analytic needs go beyond scoring (the application of analytic models to evaluate individual cases or scenarios) to the detection of patterns and linkages.
Gregory Piatetsky-Shapiro, creator of the KDNuggets.com knowledge-discovery reference site, offers the example of bioinformatics, where predictive analytics can detect links between biomarkers and diseases and potential therapies. Piatetsky-Shapiro reports advances in "dealing with complex data such as text, the Web and social networks that involve not just tables but relationships between entities," and also in dealing with images and multimedia. SPSS's Shearer similarly emphasizes the advantage of uniting "data in the broadest sense ... with 'conventional' customer data—descriptions, demographics and transactions—to give deeper insights and more accurate predictive models."
When I think about difficulties posed by complexity, deficient data, misapplied methods, misinterpreted results and lack of explanatory clarity, and when I note sometimes-glaring failures because of our inability to properly factor in human judgment and interpret and act on predictions, I wonder if we're really ready for predictive analytics to move out of the lab and into the hands of nonstatisticians.
The federal government has spent billions of dollars to secure U.S. transportation and infrastructure, but in reality it has done little to protect against terrorism because plans and assumptions were never scientifically tested with predictive modeling or other techniques. Take US-VISIT, which would crunch diverse, distributed data to facilitate border control. The Washington Post reported on May 23 that "documents and interviews ... show that government officials are betting on speculative technology while neglecting basic procedures." Urgent calls to action have led to hasty, unsupported inferences that untested, simplistic tactics can achieve desired goals.
When scaling predictive analytics out to large, distributed data sets and complex problems, will business, motivated by the politics of profit rather than of power, do any better?
Seth Grimes is a principal at Alta Plana Corp., a Washington, D.C.-based consultancy specializing in large-scale analytic computing systems.