Are you about to choose a big data platform? Remember that SQL is not the best use of Hadoop.
Our recent 16 Top Big Data Analytics Platforms collection has generated lots of interest and plenty of comments and questions. To respond to the latter, we jumped at the chance to do a Google+ Hangout with the editors of sister UBM website AllAnalytics so we could go deeper on the topic.
The discussion (see video interview below) covered many of the questions most frequently posted below our slide show -- 21 comments and counting at this writing. How do you define big data analytics platforms, or "big data" for that matter? What makes these "top" vendors, and are any of these platforms accessible to midmarket companies?
I elaborate on all these topics, but a few new questions from AllAnalytics editors Beth Schultz and Michael Steiner sparked good conversation and three points worth highlighting:
SQL won't make the most of Hadoop
Many of these 16 top platform providers offer SQL-on-Hadoop options. You should keep in mind that SQL analysis against data on or from Hadoop does not offer the highest value you can gain from the platform. Organizations are embracing Hadoop to take advantage of data they couldn't afford to keep before and, more importantly, to capture complex and variable data -- clickstreams, log files, mobile data, social data, and more -- that aren't easily managed in relational database management systems (DBMSs).
You may be able to boil structured data out of a vast collection on Hadoop for SQL analysis. But the higher value is likely to be found with machine learning, time-series analysis, and other approaches that let you correlate this new data with the highly structured information you've been analyzing for years.
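To make that kind of correlation concrete, here is a minimal Python sketch, not anyone's production method. It assumes a made-up clickstream log format, invented store-sales figures, and hypothetical names (`RAW_CLICKS`, `STORE_SALES`, `product_views_by_day`, `pearson`). It parses loosely structured log lines, skips the ones that don't fit, counts product-page views per day, and measures how closely those counts track structured sales data.

```python
import math
from collections import Counter

# Hypothetical raw clickstream lines (format and values invented for illustration).
RAW_CLICKS = """\
2014-03-01T09:15:02 u1 /product/42 region=east
2014-03-01T09:17:40 u2 /home region=east
2014-03-01T10:01:11 u3 /product/42 region=west
2014-03-02T11:30:00 u1 /product/7 region=east
2014-03-02T11:31:30 u4 /product/42 region=east
2014-03-02T12:00:00 u5 /product/9 region=east
malformed line
2014-03-03T08:00:00 u6 /product/42 region=east
"""

# Hypothetical structured data of the kind a warehouse already holds:
# (day, region) -> units sold in physical stores.
STORE_SALES = {
    ("2014-03-01", "east"): 10,
    ("2014-03-02", "east"): 30,
    ("2014-03-03", "east"): 12,
}


def product_views_by_day(raw, region):
    """Count product-page views per day for one region, skipping bad lines."""
    counts = Counter()
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) != 4 or not parts[3].startswith("region="):
            continue  # variable, messy data: ignore lines that don't parse
        timestamp, _user, url, region_field = parts
        if url.startswith("/product/") and region_field == f"region={region}":
            counts[timestamp[:10]] += 1  # first 10 characters are the date
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


views = product_views_by_day(RAW_CLICKS, "east")
days = sorted(day for day, region in STORE_SALES if region == "east")
click_series = [views.get(day, 0) for day in days]
sales_series = [STORE_SALES[(day, "east")] for day in days]
r = pearson(click_series, sales_series)
print(f"Correlation between product views and units sold: {r:.2f}")
```

At real scale the parsing and counting would run as a distributed job on the cluster, not in a single script, but the shape of the work is the same: extract signal from messy data, then line it up against the structured numbers you already trust.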
"We're seeing again and again, at almost every company that we work with, that the capabilities that BI and SQL give them are fine, but the types of data and the types of questions that they inevitably want to get to go far beyond that," noted Platfora CEO Ben Werther in a recent interview on this topic. "In the old world, you'd look at sales by store and so on, but in the new world you want to look at things like clickstream behavior and how it relates to physical store activity. [It's about] connecting the dots across the old traditional data sources and adding this new world of digital clicks, ads, and mobile, and social data."
Hadoop distributors aren't all eager promoters
Data-management incumbents Oracle, IBM, and Teradata all sell and support Hadoop, but you get the impression their hearts aren't in helping you make the most of the platform. Hadoop is on their product checklist because they know their customers are interested in it. However, I've talked to executives of all three companies who dismiss it as an immature, hard-to-manage platform that isn't nearly as capable as their incumbent databases.
The maturity and management points are accurate (and aren't a surprise, given Hadoop's short life compared with 30-year-old RDBMSs). As far as its capabilities are concerned, the contrast with RDBMSs is not an apples-to-apples comparison. (See the point above about Hadoop's purpose and highest value not being SQL querying.) IBM gets this and has gone to the trouble of creating its own Hadoop distribution (InfoSphere BigInsights). Teradata gets it, too, but puts forward Teradata Aster as a platform for MapReduce, graph analysis, and more. Yet how many platforms do you want to manage and maintain?
The bottom line is that all three of these biggies seem to accept Hadoop grudgingly as a high-scale, low-cost data lake, if the customer insists, but they want to channel the analysis activity into their own platforms and products. (I get a different, more enthusiastic feel about Hadoop from Microsoft, perhaps because it doesn't have nearly as many high-scale data warehousing customers as do Oracle, IBM, and Teradata, and could benefit from Hadoop market disruption.) If you like the idea of getting everything from one vendor, that's fine, but keep your eyes wide open about which vendors are eager to help you use Hadoop as more than a storage platform.
Analytics: the real prize
Perhaps the most important point made during our discussion was that all 16 of these platform vendors realize that managing data isn't enough. That's why DBMS vendors are packing on in-database analytics capabilities. It's why giants like IBM, Oracle, and SAP have acquired numerous analytics vendors. Yes, they still make tons of money on database licenses, ETL, and so on. But customers aren't likely to be attracted to platforms unless those platforms can help them make sense of the data and get to predictive and prescriptive analytics.
I also make the point that companies don't live by analytics alone. They need to cover the basics of BI, operational reporting, and other needs. That's why we distinguished between platform providers -- those offering open, multi-purpose environments -- and dedicated analytics vendors. That line is starting to blur as companies like SAS, Alpine Data Labs, and others support memory-intensive clustered-server environments and Hadoop. Are these vendors prepared to support everything else you need to do with a data platform?
These key points and questions are all worth considering as you explore the options laid out in the 16 Top Big Data Analytics Platforms collection. We hope it's a helpful guide that will lead to fruitful technology choices.
Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise.