Commentary
2/18/2014
10:14 AM
Doug Henschen

Big Data Platform Comparisons: 3 Key Points

Are you about to choose a big data platform? Remember that SQL is not the best use of Hadoop.

Our recent 16 Top Big Data Analytics Platforms collection has generated lots of interest and plenty of comments and questions. To respond to the latter, we jumped at the chance to do a Google+ Hangout with the editors of sister UBM website AllAnalytics so we could go deeper on the topic.

The discussion (see video interview below) covered many of the questions most frequently posted below our slide show -- 21 comments and counting at this writing. How do you define big data analytics platforms or "big data" for that matter? What makes these "top" vendors, and are any of these platforms accessible to midmarket companies?

[ Watch InformationWeek's Doug Henschen discuss 16 Top Big Data Analytics Platforms with the editors of AllAnalytics (below). ]

I elaborate on all these topics, but a few new questions from AllAnalytics editors Beth Schultz and Michael Steiner sparked good conversation and three points worth highlighting:

SQL won't make the most of Hadoop
Many of these 16 top platform providers offer SQL-on-Hadoop options. You should keep in mind that SQL analysis against data on or from Hadoop does not offer the highest value you can gain from the platform. Organizations are embracing Hadoop to take advantage of data they couldn't afford to keep before and, more importantly, to capture complex and variable data -- clickstreams, log files, mobile data, social data, and more -- that aren't easily managed in relational database management systems (DBMSs).

You may be able to boil structured data out of a vast collection on Hadoop for SQL analysis. But the higher value is likely to be found with machine learning, time-series analysis, and other approaches that let you correlate this new data with the highly structured information you've been analyzing for years.

"We're seeing again and again, at almost every company that we work with, that the capabilities that BI and SQL give them are fine, but the types of data and the types of questions that they inevitably want to get to go far beyond that," noted Platfora CEO Ben Werther in a recent interview on this topic. "In the old world, you'd look at sales by store and so on, but in the new world you want to look at things like clickstream behavior and how it relates to physical store activity. [It's about] connecting the dots across the old traditional data sources and adding this new world of digital clicks, ads, and mobile, and social data."
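
To make the "connecting the dots" idea concrete, here is a minimal Python sketch, not anything Platfora or the other vendors ship. It counts loosely structured clickstream events per day and store, then correlates those counts with structured sales figures. The sample data, the field names (`day`, `store`), and the function names are all assumptions made up for illustration; a real deployment would run this kind of logic at scale on the cluster rather than in a single script.

```python
import json
from collections import Counter
from math import sqrt

# Hypothetical clickstream events, one JSON object per line. Fields vary
# from event to event, which is awkward for a fixed relational schema.
RAW_CLICKS = """\
{"day": "2014-02-01", "store": "S1", "page": "/shoes", "device": "mobile"}
{"day": "2014-02-02", "store": "S1", "page": "/shoes"}
{"day": "2014-02-02", "store": "S1", "page": "/coats", "referrer": "ad"}
{"day": "2014-02-03", "store": "S1", "page": "/shoes"}
{"day": "2014-02-03", "store": "S1", "page": "/hats", "device": "tablet"}
{"day": "2014-02-03", "store": "S1", "page": "/coats"}
not valid json
"""

# Hypothetical structured sales figures, as a warehouse table might hold them.
STORE_SALES = {
    ("2014-02-01", "S1"): 1000.0,
    ("2014-02-02", "S1"): 1800.0,
    ("2014-02-03", "S1"): 3100.0,
}


def clicks_per_day_and_store(raw):
    """Count click events per (day, store), skipping malformed lines."""
    counts = Counter()
    for line in raw.splitlines():
        try:
            event = json.loads(line)
            counts[(event["day"], event["store"])] += 1
        except (json.JSONDecodeError, KeyError, TypeError):
            continue  # tolerate messy input instead of failing the whole job
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlate_clicks_with_sales(raw, sales):
    """Join click counts to sales on (day, store) and correlate the two."""
    clicks = clicks_per_day_and_store(raw)
    keys = sorted(k for k in sales if k in clicks)
    return pearson([float(clicks[k]) for k in keys], [sales[k] for k in keys])


if __name__ == "__main__":
    print(round(correlate_clicks_with_sales(RAW_CLICKS, STORE_SALES), 3))
```

The point of the sketch is the shape of the work: the click data needs tolerant parsing and aggregation before it can sit next to the sales table, and that step is where a plain SQL query over pre-structured rows falls short.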

Hadoop distributors aren't all eager promoters
Data-management incumbents Oracle, IBM, and Teradata all sell and support Hadoop, but you get the impression they don't have their heart in helping you to make the most of the platform. Hadoop is on their product checklist because they know their customers are interested in it. However, I've talked to executives of all three companies who dismiss it as an immature, hard-to-manage platform that isn't nearly as capable as their incumbent databases.

The maturity and management points are accurate (and aren't a surprise, given Hadoop's short life compared with 30-year-old RDBMSs). As far as its capabilities are concerned, the contrast with RDBMSs is not an apples-to-apples comparison. (See the point above about Hadoop's purpose and highest value not being SQL querying.) IBM gets this and has gone to the trouble of creating its own Hadoop distribution (InfoSphere BigInsights). Teradata gets it, too, but puts forward Teradata Aster as a platform for MapReduce, graph analysis, and more. Yet how many platforms do you want to manage and maintain?

The bottom line is that all three of these biggies seem to accept Hadoop grudgingly as a high-scale, low-cost data lake, if the customer insists, but they want to channel the analysis activity into their own platforms and products. (I get a different, more enthusiastic feel about Hadoop from Microsoft, perhaps because it doesn't have nearly as many high-scale data warehousing customers as do Oracle, IBM, and Teradata, and could benefit from Hadoop market disruption.) If you like the idea of getting everything from one vendor, that's fine, but keep your eyes wide open about which vendors are eager to help you use Hadoop as more than a storage platform.

Analytics: the real prize
Perhaps the most important point made during our discussion was that all 16 of these platform vendors realize that managing data isn't enough. That's why DBMS vendors are packing on the in-database analytics capabilities. It's why giants like IBM, Oracle, and SAP have acquired numerous analytics vendors. Yes, they still make tons of money on database licenses, ETL, and so on. But customers aren't likely to be attracted to a platform unless it helps them make sense of the data and get to predictive and prescriptive analytics.

I also make the point that companies don't live by analytics alone. They need to cover the basics of BI, operational reporting, and other needs. That's why we distinguished between platform providers -- those offering open, multi-purpose environments -- and dedicated analytics vendors. That line is starting to blur as companies like SAS, Alpine Data Labs, and others support memory-intensive clustered-server environments and Hadoop. Are these vendors prepared to support everything else you need to do with a data platform?

These key points and questions are all worth considering as you explore the options laid out in the 16 Top Big Data Analytics Platforms collection. We hope it's a helpful guide that will lead to fruitful technology choices.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

Comments
Stratustician
User Rank: Ninja
2/22/2014 | 1:48:41 PM
The real motivation for SQL over Hadoop
I wonder if the reluctance to really push Hadoop is not just the complexity that comes from learning a new tool, but rather that SQL has been ingrained in so many organizations that the change away from this model scares the vendors themselves. This is, after all, where a chunk of their revenue comes from: hosting and managing SQL databases. If customers are starting to look at Hadoop and its flexibility, they might look at newer startups, which could steal market share away from the big guys.

Nice to see Microsoft supporting Hadoop. There really are a lot of benefits that come from leveraging a flexible platform, especially if it's properly married with the right BI tools.
Thomas Claburn
User Rank: Author
2/18/2014 | 7:04:27 PM
Re: Hadoop's best fit
I'd be curious to know whether funds intended for big data investments could be redirected to product quality improvements for equivalent or better results. In other words, how much data analysis does an online company really need?
Charlie Babcock
User Rank: Author
2/18/2014 | 3:20:49 PM
How to make the best use of Hadoop
There is so much useful information in server log files that does not lend itself to row and column storage that you know there's a great future for the more loosely structured Hadoop. We are still at the stage where we're trying to find the most useful data access method for Hadoop, sort of like the relational database era when it was still a debate whether SQL was best.
Lorna Garey
User Rank: Author
2/18/2014 | 3:03:20 PM
Disinterest or more
Doug, do you think it's smarter for companies really interested in Hadoop to hire a separate consultant to help them get the most from the platform? Maybe it'll cost more than working with an incumbent RDBMS provider, but you have to take into account that the motivation of a specialist is to make the very most of the Hadoop setup. Not so with the incumbent.
Laurianne
User Rank: Author
2/18/2014 | 1:27:24 PM
Hadoop's best fit
"Organizations are embracing Hadoop to take advantage of data they couldn't afford to keep before and, more importantly, to capture complex and variable data -- clickstreams, log files, mobile data, social data, and more -- that aren't easily managed in relational database management systems (DBMSs)." Important point. Hopefully Doug gave you some ammo here on how to explain this when you're talking to higher-ups who may not understand this distinction with Hadoop and SQL. Other advice to share on this point, readers?