Big Data Analytics: Time For New Tools - InformationWeek

Commentary
12/18/2014
08:36 AM
Doug Henschen

So you're considering Hadoop as a big data platform. You'll probably need some new analytics and business intelligence tools if you're going to wring fresh insights out of your data.

Hadoop is steadily gaining adoption as an enterprise platform for capturing high-scale and highly variable data that's not easy or economically viable to store in relational databases. What's less clear is just how companies are going to analyze all this data.

A recent Forrester report declared that Hadoop is "no longer optional" for large enterprises. Our data suggests that train hasn't left the station just yet: Just 4% of companies use Hadoop extensively, while 18% say they use it on a limited basis, according to our just-released 2015 InformationWeek Analytics, Business Intelligence, and Information Management Survey. That is up from the 3% reporting extensive use and 12% reporting limited use of Hadoop in our survey last year. Another 20% plan to use Hadoop, though that still leaves 58% with no plans to use it.
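
The year-over-year change is easy to check. The short Python sketch below uses only the percentages quoted above; the dictionary names and keys are illustrative labels, not field names from the survey.

```python
# Hadoop usage shares quoted above, in percent of respondents.
# The dictionary names and keys are illustrative labels, not survey field names.
survey_2015 = {"extensive": 4, "limited": 18, "planned": 20, "no_plans": 58}
survey_2014 = {"extensive": 3, "limited": 12}

def current_users(shares):
    """Share of respondents using Hadoop at all (extensive plus limited)."""
    return shares["extensive"] + shares["limited"]

users_2015 = current_users(survey_2015)
users_2014 = current_users(survey_2014)
total_2015 = sum(survey_2015.values())

print(f"Using Hadoop, 2015 survey: {users_2015}%")
print(f"Using Hadoop, 2014 survey: {users_2014}%")
print(f"Change: {users_2015 - users_2014} percentage points")
print(f"2015 categories sum to {total_2015}%")
```

On these figures, 22% of respondents use Hadoop in some form, up from 15% a year earlier.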

But there's no doubt that interest in Hadoop is rising. The top draw is the platform's "ability to store and process semi-structured, unstructured, and variable data," cited by 31% of the 374 survey respondents involved with information management technology. Another 30% cited Hadoop's ability to handle "massive volumes of data," while 25% cited its "lower hardware and storage scaling costs" compared with conventional relational database management systems.

That's the IT, data-management perspective on the need for Hadoop. But why is the business looking to capture and analyze big data in the first place? The top driver, cited by 48% of respondents using or planning to deploy data analytics, BI, or statistical analysis software, is finding correlations across multiple, disparate data sources, like Internet clickstreams, geospatial data, and customer-transaction data. Next in line are predicting customer behavior, cited by 46%, and predicting product or service sales, cited by 40% of respondents (multiple responses allowed, see chart below). Other motivations include predicting fraud and financial risks, analyzing social network comments for customer sentiment, and identifying security risks.
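
To make "finding correlations across multiple, disparate data sources" concrete, here is a minimal sketch in plain Python. It assumes two hypothetical sources, daily clickstream visits and daily order counts, joins them on the dates they share, and computes a Pearson correlation. Every name and number in it is invented for illustration; real big data work would run a similar join and calculation at far larger scale.

```python
from math import sqrt

# Hypothetical daily figures from two separate systems, keyed by date.
# All values are invented for illustration; they are not survey or customer data.
clickstream_visits = {"2014-12-01": 1200, "2014-12-02": 1500, "2014-12-03": 900,
                      "2014-12-04": 1800, "2014-12-05": 2100}
transaction_orders = {"2014-12-01": 30, "2014-12-02": 41, "2014-12-03": 22,
                      "2014-12-04": 50, "2014-12-06": 65}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Join the two sources on the dates they share, then correlate.
shared_dates = sorted(clickstream_visits.keys() & transaction_orders.keys())
visits = [clickstream_visits[d] for d in shared_dates]
orders = [transaction_orders[d] for d in shared_dates]

r = pearson(visits, orders)
print(f"{len(shared_dates)} shared days, r = {r:.3f}")
```

The join step matters as much as the statistic: days that appear in only one source are dropped before the correlation is computed.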

In each of these examples, companies are analyzing big data sets in search of insights they couldn't discover by parsing only the same old data they've long held in transactional systems. Capturing and analyzing clickstreams, server log files, social network streams, and geospatial data from mobile apps is a recent, big-data-era phenomenon for most organizations attempting it, and they're gaining insights and seeing correlations that just weren't available in the enterprise data warehouse.

But pulling insight out of this new data will require some new tools, ones that work alongside Hadoop -- which is, at its core, nothing more than a highly distributed file system. Here are the three categories of options associated with Hadoop, along with product examples.

Hadoop-native data-processing and analysis options: These include Apache Hive (provides SQL-like data access -- think data warehousing meets Hadoop); Apache Mahout (supports machine learning on top of Hadoop -- think finding patterns in data); Apache MapReduce (for searching, filtering, sorting, and other forms of processing large data sets in Hadoop -- ways to boil down really big data to find the useful nuggets); and Apache Pig (a high-level language for writing MapReduce jobs).
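
For readers new to the MapReduce model, the sketch below imitates its three stages (map, shuffle, reduce) in a single Python process. It is not the Hadoop API and does nothing distributed; the log format and the function names `map_phase`, `shuffle`, and `reduce_phase` are assumptions made for illustration. The example boils a handful of made-up server log lines down to a count per HTTP status code.

```python
from collections import defaultdict

# Hypothetical web-server log lines: "<ip> <HTTP status> <path>".
log_lines = [
    "10.0.0.1 200 /home",
    "10.0.0.2 404 /missing",
    "10.0.0.1 200 /cart",
    "10.0.0.3 500 /checkout",
    "10.0.0.2 200 /home",
]

def map_phase(line):
    """Emit (key, value) pairs; here, one (status, 1) pair per log line."""
    _ip, status, _path = line.split()
    yield status, 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)

mapped = (pair for line in log_lines for pair in map_phase(line))
status_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(status_counts)
```

In a real cluster the map and reduce functions run in parallel on many machines and the framework handles the shuffle; the logic each function expresses is the same.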

Alternative SQL access/analysis options: Hive is slow by relational database standards, and it doesn't support all SQL-analysis capabilities. These alternatives are designed to make BI professionals feel more at home, giving them the performance they're accustomed to, SQL or SQL-like querying, and compatibility with current BI tools. Examples include Actian Analytics Platform SQL Hadoop Edition, Apache Drill, Cloudera Impala, HP Vertica for SQL on Hadoop, IBM Big SQL, Microsoft SQL Server PolyBase, Oracle Big Data SQL, Pivotal HAWQ, and Teradata QueryGrid.
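
The queries BI tools send through such engines are typically ordinary aggregate SQL. The sketch below shows the general shape of one, with Python's built-in `sqlite3` module standing in for a SQL-on-Hadoop engine purely so the example runs anywhere. The `page_views` table, its columns, and its rows are hypothetical, and nothing here reflects the dialect or behavior of any product named above.

```python
import sqlite3

# In-memory SQLite stands in for a SQL-on-Hadoop engine purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (customer_id INTEGER, page TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [(1, "/home", 12), (1, "/cart", 40), (2, "/home", 8), (3, "/cart", 55), (3, "/cart", 20)],
)

# A typical BI-style aggregate: views and average time on page, per page.
query = """
    SELECT page, COUNT(*) AS views, AVG(seconds) AS avg_seconds
    FROM page_views
    GROUP BY page
    ORDER BY views DESC, page
"""
rows = conn.execute(query).fetchall()
for page, views, avg_seconds in rows:
    print(page, views, round(avg_seconds, 1))
conn.close()
```

The appeal of the SQL-on-Hadoop options is that analysts can keep writing queries like this one while the data itself lives in Hadoop.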

Analytics and BI options designed to run on Hadoop: These tools blend SQL and BI-type querying with big-data-oriented and advanced analytics capabilities. Examples include Apache Spark, Apache Storm, Datameer, Platfora, and SAS Visual Analytics. Many of these analysis engines now run on Hadoop 2.0's YARN resource-management system.

The first thing to note is that the SQL and SQL-like options -- including Hive, Impala, Drill, the various relational databases ported to run on Hadoop (Actian, HP, Pivotal), and the various SQL-access options (Microsoft, Oracle, Teradata) -- give you the basics of SQL query and analysis, but these are not alternatives to analytics workbenches or business intelligence suites. As noted, a key point of these query and access tools is making Hadoop compatible with incumbent SQL-connected products like BusinessObjects, Cognos, MicroStrategy, OBIEE, Tableau Software, and so on.

Businesses are demanding compatibility with tools that they already have on hand. This helps explain why there were so many SQL-on-Hadoop announcements from both Hadoop vendors (Cloudera, Hortonworks, MapR) and database incumbents (Actian, Hewlett-Packard, Oracle) over the last year.

But companies need more than SQL. The value in big data analysis is often in finding correlations among disparate data sets or insights hidden in semi-structured or highly variable data sources, such as

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...
Comments
D. Henschen,
User Rank: Author
12/18/2014 | 9:14:33 AM
We need Analytics on Hadoop as much as or more than SQL on Hadoop
SQL on Hadoop = training wheels for big data analysis. Tools supporting machine learning, advanced analytics, data visualization, etc. on top of Hadoop are what's needed to make sense of high-volume and highly variable new data types. Apache Spark, Datameer, Platfora, SAS Visual Analytics, Alpine, Revolution Analytics and others are among the emerging options. Even Oracle recognizes that SQL isn't enough. Oracle offers Oracle Big Data Discovery, which starts with machine learning for data exploration and leads to various big data visualization and analysis options.