Datameer Bets Visual Analysis Beats SQL On Hadoop - InformationWeek



Datameer 4.0 upgrade advances visual data preparation and analysis on top of Hadoop, offering an easier alternative to SQL.


With all the hubbub about SQL-on-Hadoop options over the last year, what seems to be lost on many is the fact that SQL access won't exactly open up big data to the masses. If the arcane languages of MapReduce, Pig, and so on are akin to ancient Latin, adding SQL to the language list is a bit like suggesting the use of modern Greek.

Enter Datameer, which offers a spreadsheet-on-Hadoop approach to exploring and analyzing data on Hadoop. Datameer has been among the pioneers in supporting ad-hoc analysis, reporting, and analytics directly on data in Hadoop, and with a 4.0 version released last week, it has enhanced the product's visual data preparation and data analysis capabilities.

Research by InformationWeek and plenty of other organizations has long confirmed that data quality is one of the biggest obstacles to developing valid and valuable BI and analytics. That challenge hasn't gone away in the big data era, and that's the key reason Datameer 4.0 includes a new Visual Data Profiling feature. Toggling from a spreadsheet view to the "flip side" Visual Data Profiling view, you see details on data type, count, max, min, uniqueness, mean, and average.
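To make the profiling idea concrete, here is a minimal Python sketch of the kind of per-column summary described above. It is not Datameer's implementation; the function name `profile_column` and the sample ages are assumptions made up for illustration.

```python
from statistics import mean

def profile_column(values):
    """Summarize one column: inferred type, count, min, max, uniqueness, mean."""
    # Treat None as a missing value and leave it out of every statistic.
    present = [v for v in values if v is not None]
    numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    # Compare non-numeric values by their string form so mixed types still sort.
    key = None if numeric else str
    return {
        "type": "number" if numeric else "text",
        "count": len(present),
        "min": min(present, key=key) if present else None,
        "max": max(present, key=key) if present else None,
        "unique": len(set(present)),
        "mean": mean(present) if numeric else None,
    }

# Hypothetical customer ages, including one missing entry.
ages = [23, 35, 35, None, 61]
age_profile = profile_column(ages)
```

With this sample, the profile counts four non-missing values and three unique ones, which is the sort of at-a-glance check a profiling view is meant to provide.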


The data profile also reveals data quality problems, such as having more than two categories for customer gender or having a great big gap in the age range of customers. This is where Datameer's data correction, filtering, transformation, and enrichment capabilities come into play. Once you've addressed data quality problems, you toggle between the spreadsheet and data-profile views to visually inspect and validate that you're working with reliable data.
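The two problems mentioned above, unexpected category labels and a gap in a numeric range, can be sketched in a few lines of Python. This is only an illustration under assumed data; the helper names `category_counts` and `largest_gap` and the sample values are hypothetical and do not come from Datameer.

```python
from collections import Counter

def category_counts(values):
    """Count each distinct label after trimming whitespace and lowercasing."""
    return Counter(v.strip().lower() for v in values)

def largest_gap(numbers):
    """Return (low, high) for the widest gap between adjacent distinct values."""
    ordered = sorted(set(numbers))
    pairs = zip(ordered, ordered[1:])
    # default=None covers columns with fewer than two distinct values.
    return max(pairs, key=lambda p: p[1] - p[0], default=None)

# Hypothetical raw values with inconsistent gender labels and an age gap.
genders = ["M", "F", "f", "male", "F ", "M"]
ages = [21, 24, 29, 33, 58, 61, 64]

gender_labels = category_counts(genders)  # three labels survive normalization
age_gap = largest_gap(ages)               # widest gap lies between 33 and 58
```

Even after normalizing case and whitespace, "male" remains as a third label, which is the kind of issue that would then call for a correction or transformation step.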

The second big advance in Datameer 4.0 is a new visual view of Datameer's clustering, column-dependency, decision tree, and recommendation algorithms. Once again, you toggle from the spreadsheet view to a flip-side visual analysis that helps business analysts understand how and why the algorithms produced their results.

The Visual Data Profile view in Datameer 4.0 offers details such as data type, count, max, min, uniqueness, mean, and average.

Instead of just seeing that there are 12,500 customers in cluster A and 28,200 in cluster B, for example, the visual view might reveal that cluster A is made up of 20- to 30-year-old customers who spent an average of $48 last month, whereas cluster B is made up of 40- to 50-year-olds who spent an average of $96. You could get to these details in Datameer 3.0, but they weren't immediately exposed in a visual view.
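As a rough illustration of the difference between a bare cluster count and a descriptive cluster summary, the following Python sketch groups customer rows by cluster label and reports size, age range, and average spend. The function `summarize_clusters` and the four sample rows are invented for this example, with spend figures chosen to echo the averages quoted above; it says nothing about how Datameer computes or renders its views.

```python
from collections import defaultdict
from statistics import mean

def summarize_clusters(rows):
    """Group (cluster, age, spend) rows; report size, age range, average spend."""
    groups = defaultdict(list)
    for cluster, age, spend in rows:
        groups[cluster].append((age, spend))
    summary = {}
    for cluster, members in groups.items():
        ages = [age for age, _ in members]
        summary[cluster] = {
            "size": len(members),
            "age_range": (min(ages), max(ages)),
            "avg_spend": mean(spend for _, spend in members),
        }
    return summary

# Hypothetical rows: (cluster label, customer age, spend last month in dollars).
rows = [("A", 22, 40.0), ("A", 29, 56.0), ("B", 41, 90.0), ("B", 48, 102.0)]
cluster_summary = summarize_clusters(rows)
```

The summary for each cluster carries the attributes an analyst would want to see next to the raw count, rather than having to derive them in a separate step.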

Many traditional BI tools can now peer into Hadoop by way of SQL-on-Hadoop interfaces and connectors, but Datameer's spreadsheet-on-Hadoop approach is much more user friendly, even for those who know SQL, according to Karen Hsu, senior director of product marketing at Datameer.

"IT types might use SQL to segment data, but then they'll jump out and use Datameer for grouping, joining, partitioning, unioning data, and identifying outliers," Hsu said. "You can do all that with SQL, but it's just much easier to do in a spreadsheet."

SQL-on-Hadoop tools will certainly have their place. But given that SQL was a nonstarter in democratizing small data analysis, the emphasis on Hive, Impala, and other SQL-driven tools seems like a half step toward what organizations really want from big data. With its 4.0 release, Datameer is banking on spreadsheets and visualizations as the way to give business analysts a hands-on way to conduct data analysis on top of Hadoop.


Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...
