Datameer Bets Visual Analysis Beats SQL On Hadoop - InformationWeek


4/28/2014
09:06 AM


Datameer 4.0 upgrade advances visual data preparation and analysis on top of Hadoop, offering an easier alternative to SQL.


With all the hubbub about SQL-on-Hadoop options over the last year, what seems to be lost on many is the fact that SQL access won't exactly open up big data to the masses. If the arcane languages of MapReduce, Pig, and so on are akin to ancient Latin, adding SQL to the language list is a bit like suggesting the use of modern Greek.

Enter Datameer, which offers a spreadsheet-on-Hadoop approach to exploring and analyzing data. Datameer has been among the pioneers in supporting ad hoc analysis, reporting, and analytics directly on data in Hadoop, and with version 4.0, released last week, it has enhanced the product's visual data preparation and data analysis capabilities.

Research by InformationWeek and plenty of other organizations has long confirmed that data quality is one of the biggest obstacles to developing valid and valuable BI and analytics. That challenge hasn't gone away in the big data era, and it's the key reason Datameer 4.0 includes a new Visual Data Profiling feature. When you toggle from the spreadsheet view to the "flip side" Visual Data Profiling view, you see details on data type, count, max, min, uniqueness, and mean (average).
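To make those profile statistics concrete, here is a minimal Python sketch of per-column profiling. It is not Datameer's implementation or API: the functions `infer_type` and `profile_column` are hypothetical, the type inference is deliberately simplified, and "uniqueness" is assumed here to mean the number of distinct non-missing values.

```python
from statistics import mean
from typing import Any


def infer_type(values: list[Any]) -> str:
    """Guess a column's type from its non-missing values (simplified)."""
    present = [v for v in values if v is not None]
    if present and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    ):
        return "number"
    return "string"


def profile_column(values: list[Any]) -> dict[str, Any]:
    """Compute summary statistics of the kind a data-profile view displays."""
    present = [v for v in values if v is not None]
    col_type = infer_type(values)
    profile: dict[str, Any] = {
        "type": col_type,
        "count": len(present),                  # non-missing values
        "missing": len(values) - len(present),  # None entries
        "unique": len(set(present)),            # distinct non-missing values
    }
    # Range and mean only make sense for numeric columns.
    if col_type == "number" and present:
        profile["min"] = min(present)
        profile["max"] = max(present)
        profile["mean"] = mean(present)
    return profile


if __name__ == "__main__":
    ages = [34, 27, None, 45, 27, 61]
    print(profile_column(ages))
```

A real profiler would also track histograms and handle dates, but even these few numbers are enough to spot a suspicious column at a glance.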


The data profile also reveals data quality problems, such as having more than two categories for customer gender or having a great big gap in the age range of customers. This is where Datameer's data correction, filtering, transformation, and enrichment capabilities come into play. Once you've addressed data quality problems, you toggle between the spreadsheet and data-profile views to visually inspect and validate that you're working with reliable data.
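Both kinds of problem described in that paragraph can be caught with simple checks. The following is a hedged sketch, not Datameer's rule set: the helpers `unexpected_categories` (values outside an allowed set, such as stray variants like `female` or `f` in a column expected to hold only `F` and `M`) and `range_gaps` (jumps wider than a chosen threshold between sorted distinct values) are hypothetical, and the threshold is an assumption you would tune per column.

```python
from collections import Counter


def unexpected_categories(values: list[str], allowed: set[str]) -> Counter:
    """Count values that fall outside the expected set of categories."""
    return Counter(v for v in values if v not in allowed)


def range_gaps(values: list[int], max_gap: int) -> list[tuple[int, int]]:
    """Return (low, high) pairs where consecutive distinct values differ by more than max_gap."""
    ordered = sorted(set(values))
    return [
        (low, high)
        for low, high in zip(ordered, ordered[1:])
        if high - low > max_gap
    ]


if __name__ == "__main__":
    genders = ["F", "M", "female", "M", "f", "F"]
    print(dict(unexpected_categories(genders, {"F", "M"})))  # {'female': 1, 'f': 1}
    ages = [22, 25, 29, 31, 58, 60, 63]
    print(range_gaps(ages, max_gap=10))  # [(31, 58)]
```

Flagged values would then go to the correction, filtering, or transformation step before the data is re-profiled.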

The second big advance in Datameer 4.0 is a new visual view of Datameer's clustering, column-dependency, decision tree, and recommendation algorithms. Once again, you toggle from the spreadsheet view to a flip-side visual analysis that helps business analysts understand how and why the algorithms produced their results.

The Visual Data Profile view in Datameer 4.0 offers details such as data type, count, max, min, uniqueness, and mean (average).

Instead of just seeing that there are 12,500 customers in cluster A and 28,200 in cluster B, for example, the visual view might reveal that cluster A is made up of 20- to 30-year-old customers who spent an average of $48 last month, whereas cluster B is made up of 40- to 50-year-olds who spent an average of $96. You could get to these details in Datameer 3.0, but they weren't immediately exposed in a visual view.
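The richer cluster description in that example amounts to grouping rows by cluster label and summarizing each group. A minimal sketch follows, assuming the cluster labels have already been assigned by some clustering algorithm (the clustering step itself is not shown). The `Customer` record and `summarize_clusters` function are hypothetical, and the sample figures are invented to mirror the article's illustration rather than taken from Datameer output.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Customer:
    cluster: str             # label assigned by a prior clustering step
    age: int
    spend_last_month: float


def summarize_clusters(customers: list[Customer]) -> dict[str, dict[str, float]]:
    """Describe each cluster by size, age range, and average monthly spend."""
    groups: dict[str, list[Customer]] = defaultdict(list)
    for c in customers:
        groups[c.cluster].append(c)

    summary: dict[str, dict[str, float]] = {}
    for label, members in sorted(groups.items()):
        ages = [m.age for m in members]
        summary[label] = {
            "size": len(members),
            "min_age": min(ages),
            "max_age": max(ages),
            "avg_spend": mean(m.spend_last_month for m in members),
        }
    return summary


if __name__ == "__main__":
    sample = [
        Customer("A", 22, 40.0),
        Customer("A", 29, 56.0),
        Customer("B", 44, 90.0),
        Customer("B", 48, 102.0),
    ]
    for label, stats in summarize_clusters(sample).items():
        print(label, stats)
```

Counting members alone reproduces the "12,500 in cluster A" style of answer; adding the age range and average spend is what turns a cluster ID into something a business analyst can interpret.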

Many traditional BI tools can now peer into Hadoop by way of SQL-on-Hadoop interfaces and connectors, but Datameer's spreadsheet-on-Hadoop approach is much more user friendly, even for those who know SQL, according to Karen Hsu, senior director of product marketing at Datameer.

"IT types might use SQL to segment data, but then they'll jump out and use Datameer for grouping, joining, partitioning, unioning data, and identifying outliers," Hsu said. "You can do all that with SQL, but it's just much easier to do in a spreadsheet."

SQL-on-Hadoop tools will certainly have their place. But given that SQL was a nonstarter in democratizing small data analysis, the emphasis on Hive, Impala, and other SQL-driven tools seems like a half step toward what organizations really want from big data. With its 4.0 release, Datameer is banking on spreadsheets and visualizations as the way to give business analysts a hands-on way to conduct data analysis on top of Hadoop.


Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...
