There's storing big data and reporting against big data, and then there's gaining insights from big data with advanced analytics. The third level of maturity delivers the most value, and it's what Datameer is after with Datameer 3.0, announced Monday and set for general release this fall.
Datameer is a data-integration, data-management and self-service analytics platform that runs on top of Hadoop, and it's used by notable customers including Sears Holdings and Cardinal Health to bring together and analyze high-scale structured and unstructured data sets on Hadoop. The options for analysis have heretofore included a spreadsheet-style interface and a short list of data visualizations and packaged analytics.
Datameer 3.0 introduces four powerful options for advanced analytics: clustering, column dependencies, decision trees and recommendations. What these four have in common is that they are machine-learning analyses: algorithms, rather than the analyst's assumptions, surface what's important in the data.
"With functional analytics, you as human being have to decide what you're going to look for, filter and analyze," Stefan Groschupf, CEO of Datameer, told InformationWeek. "As you integrate more diverse data and the larger the data sets become, the more you need machine learning to help you figure out what's important."
The four styles of analysis were chosen for their popularity. Clustering is used to find groups in data, such as segments of important customers. Column-dependency analysis uncovers important relationships among dimensions of data, such as age, income, location and product purchases. Decision trees can be used, for example, to track conversion rates among different segments of customers in a sales funnel. And predictive recommendations are familiar to anyone who has seen Netflix movie recommendations or Amazon product-purchase suggestions.
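To make the clustering idea concrete, here is a minimal sketch of k-means, one common clustering algorithm, written in plain Python. It is not Datameer's implementation, which the company has not detailed here; the customer figures, the `kmeans` and `assign` helper names and the choice of starting centroids are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points
    ]


def kmeans(points, centroids, max_iterations=20):
    """Basic k-means (Lloyd's algorithm) with caller-supplied start centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centroid is the per-dimension mean of the cluster members.
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep the old centroid if no point was assigned to it.
                updated.append(old)
        if updated == centroids:
            break  # Centroids stopped moving, so the clustering has converged.
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical customers: (age, annual spend in thousands of dollars).
customers = [(22, 1.0), (25, 1.5), (24, 1.2), (51, 9.0), (48, 8.5), (55, 9.5)]

# Assumption: seed the two clusters with the first and fourth customers.
centroids, labels = kmeans(customers, [customers[0], customers[3]])

for customer, label in zip(customers, labels):
    print(f"customer {customer} -> segment {label}")
```

With this toy data the algorithm separates the younger, lower-spending customers from the older, higher-spending ones without being told which attribute matters. A production system would also have to choose the number of clusters, scale the columns so that no single attribute dominates the distance, and distribute the work across the cluster.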
Datameer calls the four new analysis options Smart Analytics because they don't require the complex data-preparation, sampling and scoring procedures associated with advanced analytics, according to Groschupf. With Datameer 3.0, users drag and drop data-set descriptions from a list of everything available on the Hadoop cluster. Preview analyses give users a sense of what they'll discover before the complete analysis is executed at scale behind the scenes. Datameer's software handles all the complexities of MapReduce processing, with no coding required of end users, according to Groschupf.
"One of our beta customers that was spending $1 million per month on Google AdWords used these analyses and found that they could cut that spend to $400,000 per month by focusing on the keywords that were shown to be most likely to convert," Groschupf said.
The packaged functional analytics already available from Datameer include analyses of Salesforce.com data in combination with Google AdWords, Marketo leads, Web analytics or sentiment analysis against Twitter. More than 90 such packaged template applications are available from Datameer's app store, many of them developed by partners.
Datameer competes with Hadapt, Karmasphere, Platfora and other startups that offer business intelligence and analytics platforms designed to run on top of Hadoop. Groschupf said he isn't too worried about Cloudera Impala and other SQL-on-Hadoop options, such as Hortonworks Stinger, MapR-promoted Apache Drill or IBM Big SQL, because the universe of SQL-savvy professionals is in the low hundreds of thousands. Datameer's Smart Analytics, packaged analytics and spreadsheet tools, in contrast, are designed to be used by business analysts, he said.
"We're focused on the millions of business users who want easy-to-use tools and who don't want to have to wait for IT to help them make sense of that information," he said.