4/28/2014
09:06 AM

Datameer Bets Visual Analysis Beats SQL On Hadoop

Datameer 4.0 upgrade advances visual data preparation and analysis on top of Hadoop, offering an easier alternative to SQL.

With all the hubbub about SQL-on-Hadoop options over the last year, what seems to be lost on many is the fact that SQL access won't exactly open up big data to the masses. If the arcane languages of MapReduce, Pig, and so on are akin to ancient Latin, adding SQL to the language list is a bit like suggesting the use of modern Greek.

Enter Datameer, which offers a spreadsheet-on-Hadoop approach to exploring and analyzing data. Datameer has been among the pioneers in supporting ad-hoc analysis, reporting, and analytics directly on data in Hadoop, and with the 4.0 version released last week, it has enhanced the product's visual data preparation and data analysis capabilities.

Research by InformationWeek and plenty of other organizations has long confirmed that data quality is one of the biggest obstacles to developing valid and valuable BI and analytics. That challenge hasn't gone away in the big data era, and that's the key reason Datameer 4.0 includes a new Visual Data Profiling feature. Toggling from a spreadsheet view to the "flip side" Visual Data Profiling view, you see details on data type, count, max, min, uniqueness, mean, and average.

The data profile also reveals data quality problems, such as having more than two categories for customer gender or having a great big gap in the age range of customers. This is where Datameer's data correction, filtering, transformation, and enrichment capabilities come into play. Once you've addressed data quality problems, you toggle between the spreadsheet and data-profile views to visually inspect and validate that you're working with reliable data.
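To make the profiling idea concrete, here is a minimal sketch in plain Python of the kind of per-column summary and quality checks described above. It is not Datameer's implementation; the function names (profile_column, largest_gap) and the sample gender and age values are hypothetical and chosen only for illustration.

```python
from statistics import mean

def profile_column(values):
    """Summarize one column: type, count, distinct values, min, max, and mean (numeric only)."""
    present = [v for v in values if v is not None]
    numeric = bool(present) and all(isinstance(v, (int, float)) for v in present)
    profile = {
        "type": "number" if numeric else "text",
        "count": len(present),
        "unique": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }
    if numeric:
        profile["mean"] = mean(present)
    return profile

def largest_gap(values):
    """Return the widest gap between consecutive distinct values as (low, high), or None."""
    ordered = sorted(set(values))
    pairs = zip(ordered, ordered[1:])
    return max(pairs, key=lambda p: p[1] - p[0], default=None)

# Hypothetical sample data with the two problems the article mentions:
# inconsistent gender labels and a hole in the age range.
genders = ["F", "M", "f", "M", "Female", "M"]
ages = [22, 25, 27, 31, 58, 61, 64]

gender_profile = profile_column(genders)  # "unique" above 2 signals messy categories
age_profile = profile_column(ages)
age_gap = largest_gap(ages)               # the widest hole in the age distribution

print(gender_profile)
print(age_profile)
print("Largest age gap:", age_gap)
```

A distinct-value count above two for gender, or a wide hole between neighboring ages, is the sort of signal that would prompt the correction and filtering steps before analysis continues.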

The second big advance in Datameer 4.0 is a new visual view of Datameer's clustering, column-dependency, decision tree, and recommendation algorithms. Once again, you toggle from the spreadsheet view to a flip-side visual analysis that helps business analysts understand how and why the algorithms produced their results.

The Visual Data Profile view in Datameer 4.0 offers details such as data type, count, max, min, uniqueness, mean, and average.

Instead of just seeing that there are 12,500 customers in cluster A and 28,200 in cluster B, for example, the visual view might reveal that cluster A is made up of 20- to 30-year-old customers who spent an average of $48 last month, whereas cluster B is made up of 40- to 50-year-olds who spent an average of $96. You could get to these details in Datameer 3.0, but they weren't immediately exposed in a visual view.
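The difference between a bare cluster count and a cluster description can be sketched in a few lines of Python. This assumes the clustering has already been run and each customer row carries a cluster label; the summarize_clusters function and the sample rows are hypothetical, not Datameer output.

```python
from collections import defaultdict
from statistics import mean

def summarize_clusters(rows):
    """Group (cluster, age, spend) rows and describe each cluster, not just count it."""
    groups = defaultdict(list)
    for cluster, age, spend in rows:
        groups[cluster].append((age, spend))

    summary = {}
    for cluster, members in groups.items():
        ages = [age for age, _ in members]
        spends = [spend for _, spend in members]
        summary[cluster] = {
            "customers": len(members),             # the bare count
            "age_range": (min(ages), max(ages)),   # who is in the cluster
            "avg_spend": mean(spends),             # how they behave
        }
    return summary

# Hypothetical labeled customers: (cluster, age, spend last month in dollars)
rows = [
    ("A", 22, 40), ("A", 27, 50), ("A", 30, 54),
    ("B", 41, 90), ("B", 48, 102),
]

summary = summarize_clusters(rows)
for cluster, stats in sorted(summary.items()):
    print(cluster, stats)
```

The count alone says how big each segment is; the age range and average spend are what let an analyst judge why the algorithm separated the two groups.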

Many traditional BI tools can now peer into Hadoop by way of SQL-on-Hadoop interfaces and connectors, but Datameer's spreadsheet-on-Hadoop approach is much more user friendly, even for those who know SQL, according to Karen Hsu, senior director of product marketing at Datameer.

"IT types might use SQL to segment data, but then they'll jump out and use Datameer for grouping, joining, partitioning, unioning data, and identifying outliers," Hsu said. "You can do all that with SQL, but it's just much easier to do in a spreadsheet."

SQL-on-Hadoop tools will certainly have their place. But given that SQL was a nonstarter in democratizing small data analysis, the emphasis on Hive, Impala, and other SQL-driven tools seems like a half step toward what organizations really want from big data. With its 4.0 release, Datameer is banking on spreadsheets and visualizations to give business analysts a hands-on way to conduct data analysis on top of Hadoop.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

Comments
anon0079032070,
User Rank: Apprentice
4/29/2014 | 12:17:25 PM
Datameer 4.0 is available for download
Thanks for the piece. I checked their site (http://www.datameer.com) and I believe they have released version 4.0 as a free trial. It is as you said and was fast!