Cloudera Trash Talks With Enterprise Data Hub Release - InformationWeek
01:05 PM

Cloudera Trash Talks With Enterprise Data Hub Release

Cloudera repackages its software and services to compete with IBM, Teradata, and Hadoop rivals Hortonworks and MapR.

Cloudera introduced new packaging and pricing setups for its Hadoop distributions on Monday, delivering a top-of-the-line Enterprise Data Hub offering that the company says will shake up competition in the data management market.

"Increasingly, our customers are not viewing the relevant comparison as Cloudera versus Hortonworks," Matt Brandwein, Cloudera's director of product marketing, told InformationWeek. "They're viewing it as Cloudera versus Hortonworks plus Teradata Aster, or, if you're talking to an IBM shop, Cloudera versus IBM BigInsights plus Netezza."

Preannounced last fall, Cloudera's Enterprise Data Hub offering includes its core open-source distribution, now called Cloudera Express; the Cloudera Manager management console; and all of the vendor's "premium" components: the HBase NoSQL database, Cloudera Impala (SQL-on-Hadoop query), Cloudera Search, Apache Spark (for in-memory and streaming analysis), and Cloudera Navigator to provide access control and auditing. Cloudera didn't spell out its new pricing, but it said customers now get everything with Enterprise Data Hub for about the same price it used to charge for its Enterprise Edition with two optional premium modules.


For customers with more focused needs, Cloudera introduced a Flex Edition that includes Cloudera Express, Cloudera Manager, one premium component (most likely HBase or Impala), and support. A new Basic Edition -- aimed at those just getting started with Hadoop -- includes Cloudera Express, Cloudera Manager, and support.

According to Cloudera, the new packaging better reflects the way customers are buying and using Hadoop, and it avoids back-and-forth contract negotiations as companies grow into the use of Hadoop. Brandwein said the most mature companies are putting Hadoop at the center of their data management plans.

"We have many, many customers that are substituting an enterprise data hub built on Hadoop for incremental purchases of a whole range of data management infrastructure, including relational databases, enterprise data warehouses, storage, and mainframes," Alan Saldich, Cloudera's vice president of marketing, told us.

Challenging Teradata specifically, Brandwein said Cloudera's Enterprise Data Hub offers a superior Hadoop distribution to that offered by Teradata partner Hortonworks. He also pitched Impala as an answer to Teradata Aster for data discovery and analytics. "Impala on top of core Hadoop gives you all the scalability, flexibility, and economic benefits of Hadoop, plus the ability to run interactive SQL queries, in-memory machine-learning algorithms, out-of-the-box analytic functions, plus search."

In a comparison to IBM's BigInsights Hadoop distribution and the Netezza massively parallel processing database management system, Brandwein said, "There's not a lot of reason why you couldn't port most of those workloads over to Cloudera" Enterprise Data Hub.

Cloudera said Enterprise Data Hub delivers enterprise-oriented features that Hadoop distribution-and-support competitors (such as Hortonworks and MapR) still lack. For example, Brandwein said Hortonworks' capabilities for role-based access controls and column- and row-level security are still in the labs, and Hortonworks completely lacks search and data-governance features. As for SQL-on-Hadoop querying, "Hortonworks talks about Stinger as if it were shipping and as if it were fast," while Cloudera Impala trounces Hortonworks' currently shipping Hive software on query speed.

As for MapR, its search option is a separately licensed product, not native to the distribution, Brandwein said. Its security is "limited to HBase," and it has no role-based access controls.

InformationWeek contacted IBM, Teradata, Hortonworks, and MapR for comment on Cloudera's competitive statements, but by press time only Hortonworks had responded to our inquiry. "While Cloudera tries to differentiate with a proprietary approach, the open source community momentum and delivery of enterprise-relevant capabilities only speeds up further," said Shaun Connolly, vice president of corporate strategy at Hortonworks.

Connolly detailed some open-source components that he said counter each of Cloudera's assertions about what's lacking in the Hortonworks distribution. (You can read his full response -- along with other responses to come -- in the comments area below.)

Some might say that Cloudera is getting a little ahead of itself by seeing such a big role for Hadoop and seeing itself as a competitor to the likes of IBM and Teradata. Gartner analyst Merv Adrian, for example, calls Cloudera's Enterprise Data Hub strategy "aspirational." In a recent webinar (registration required), he revealed attendee survey data showing that many practitioners are still struggling just to find value in Hadoop. Almost half of attendees cited its lack of a clear value as the biggest barrier to adoption. Others complained about primitive integration with infrastructure and the lack of available talent to run Hadoop clusters and analyses.

Cloudera counters that about a third of its customers are ready for the Enterprise Data Hub strategy, while a third use Hadoop for very specific needs appropriate to the Flex offering, and a third are just getting started and will find the supported Basic offering most appropriate.

Cloudera's competitive claims will certainly get people talking, but it will likely take customer success stories -- and plenty of them -- to convince practitioners that they need Hadoop, let alone an Enterprise Data Hub.


Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

User Rank: Apprentice
2/10/2014 | 2:36:38 AM
Re: Response by Hortonworks
Mr Connolly doesn't address search on Hortonworks. I thought they had Elasticsearch as a partner for search?
User Rank: Apprentice
2/4/2014 | 6:38:59 PM
Re: Response by MapR

MapR Search can be used with M3, M5 or M7. There is an additional subscription cost for customers who want support. I may be wrong, but I thought that Cloudera Search also involved an additional subscription cost for customers who want support. I do agree with your comment that few customers want to run unsupported software in production.

With regard to security:
  • Authentication - MapR is the only distribution that supports Kerberos and also enables customers that don't have a Kerberos infrastructure to enjoy strong wire-level authentication and encryption.
  • Authorization - MapR provides fine-grained ACLs on files, directories, tables, column families, columns, queues, jobs, volumes and management capabilities. You mentioned Apache Sentry, which will soon be part of the MapR distribution, and customers can actually use it today (last I checked, it was an Apache project, not a Cloudera differentiator). But you should keep in mind that Sentry is only relevant if the only thing that the users are doing is Hive/Impala queries. In other words, if the users are doing MapReduce, Pig, Cascading, Spark, direct file access (eg, hadoop fs -cat), etc., all the permissions in Sentry are irrelevant! Our customers view the enterprise data hub as more than just SQL queries - they want to run a variety of workloads on the cluster, not just SQL queries. To address this gap (which I'm sure you'll agree is huge), MapR has built RBAC into the underlying data so that it applies across all processing frameworks and methods of access. That's a huge advantage - and an absolute requirement for customers that really care about security.

Bottom line - MapR has significant advantages over Cloudera in both authentication and authorization, no matter how you look at it.

With regard to data protection and disaster recovery, I think it's disingenuous to claim that Cloudera has that. I understand that you want to gloss over any details here, but let's put a few of them on the table. First, Cloudera introduced the world's first inconsistent snapshots (HDFS and HBase snapshots). Believe it or not, data can actually change inside the HDFS snapshot! Basically, this renders the feature useless for most use cases other than as a marketing checkbox. Second, Cloudera's DR is a GUI on top of distcp, a MapReduce job that copies files - each file is copied at a different time (no consistency), it's not done at the block level, etc. In the storage and database worlds DR is a multi-billion dollar business - I think that's because customers need more than a tool to copy files. Third, MapR's no-NameNode architecture provides a huge advantage in terms of HA. Cloudera's NameNode HA has many issues and limitations - it doesn't fail back, it can only support up to 100M files, it only supports one failure, it has garbage collection issues, ...

WRT SQL-on-Hadoop - it's actually not confusing at all. The customers I have spoken to are not interested in being locked into a single SQL project. They want their Hadoop investments to be future proof, because all these technologies are still very early and immature and they all have their pros and cons (eg, Impala is very slow on HBase queries and sometimes returns incorrect results). For some use cases the right answer may be Shark, while for other use cases the answer may be Drill, or perhaps Impala. Companies want the flexibility to choose the right tool for the job, and that's what MapR is offering them.

-- Tomer Shiran, VP Product Management, MapR
User Rank: Apprentice
2/4/2014 | 12:54:44 PM
Re: Response by MapR
Hi Jack,

"None of the claims by Cloudera about MapR are accurate." Ok, let's take a look.

"We provide fully integrated search as part of our platform." Maybe I was mistaken. Which version of MapR do I download to get Search out of the box? M3? M5? M7? From your website[1] it really looks like you need a separate download of and license for LucidWorks Search to run Search in production.

"MapR wire-level security applies to ALL services in the cluster not simply HBase. Role-based access control is supported and fully integrated with all enterprise directory services." These are two different things. Never claimed MapR didn't have SSL (or any other wire-level security), or that M7 didn't provide RBAC. However, as far as I know, MapR doesn't have RBAC for data in Hive tables, i.e. what Apache Sentry provides.

If there's anything else, please let me know.

On the other hand, re: "The conversation about an Enterprise Data Hub should focus on the important requirements including full data protection, business continuity and high availability." Well, that's a pretty narrow view of requirements - which, of course, conveniently align to what your MapR-FS provides - but since those are important nonetheless, it's also fair to point out that Apache Hadoop and Cloudera provide all of that, too.

Finally, you end with a mention of Apache Drill and Apache Spark. There's an important distinction here that I think you've missed. Few want to run unsupported software in production, so it's important that they understand what you, as a vendor, are explicitly offering support for under contract. Honestly I can't tell what MapR actually supports, because you list just about every open source project and SQL-on-Hadoop tool on the market[2]. I even think I saw MapR talk about Facebook's Presto[3] the other day. This must be awfully confusing for customers. Where can we find the official list of *MapR supported* tools and projects, in the sense of "things a customer can file tickets about at 3am and you will debug under a given SLA and, hopefully, issue a fix"?

- Matt



[2] "There are a number of applications and projects that support SQL access against data contained in the MapR distribution for Hadoop including Apache Hive, Impala, Shark-on-Spark, Hadapt and others."

User Rank: Apprentice
2/4/2014 | 10:50:47 AM
Response by MapR
At MapR we encourage customer discussions about an Enterprise Data Hub. We have customers across industries that have selected MapR because of our unique capabilities – selections that resulted in returns of millions of dollars. None of the claims by Cloudera about MapR are accurate. We provide fully integrated search as part of our platform. MapR wire-level security applies to ALL services in the cluster not simply HBase. Role-based access control is supported and fully integrated with all enterprise directory services.  

The conversation about an Enterprise Data Hub should focus on the important requirements including full data protection, business continuity and high availability.

These are the key features that noted expert Mike Ferguson, a principal and co-founder of Codd and Date Europe Limited (the inventors of the Relational Model) and a former Chief Architect at Teradata, covers in his May 2013 paper, Offloading and Accelerating Data Warehouse ETL Processing Using Hadoop.

MapR invested several years of engineering effort to re-architect a data platform for Hadoop so it could support such enterprise-grade capabilities. Other distributions merely claim enterprise functionality, and without the right platform to support it they are setting users up for grand failure at an enterprise level.

Only MapR provides automated stateful failover, disaster recovery through snapshots and mirrors, and full data protection against user and application errors. Even with multiple hardware or software outages and errors, applications will continue running without any administrator actions required.

An Enterprise Data Hub needs to be easily integrated into an existing environment. Only MapR provides full NFS support. Existing applications and tools can easily access data in a MapR cluster and update data directly without custom connectors or the need for batch uploads.

An Enterprise Data Hub needs to perform file and database processing on a continuous basis. MapR eliminates downtime associated with HBase applications with instant recovery and provides consistent low latency to support real-time applications.

There is a lot of exciting work going on in the community that adds additional value to an Enterprise Data Hub such as Apache Drill and Apache Spark. With the right foundation in place, or should I say hub, you can be even more successful.

 -- Jack Norris, CMO, MapR Technologies
User Rank: Apprentice
2/3/2014 | 4:30:07 PM
Re: Linux parallel
David, the core Hadoop community is strong and growing with contributors from many companies working together to better the platform. There's still plenty of room to improve. Recent advances in security (Sentry), workload management (YARN), and the core filesystem (HDFS), to name a few, are good examples.

Speaking from the Cloudera side (I work there), we're focused on actively building out CDH - the open source foundation of our enterprise data hub platform - along with the management tools, certifications, partner integrations, and support that our customers require to deploy Hadoop for real production use cases, and not just as "an experiment". 
David F. Carr
User Rank: Author
2/3/2014 | 4:06:17 PM
Re: Linux parallel
To what extent is the competition helping vs. hurting the advance of the core open source Hadoop technology?
D. Henschen
User Rank: Author
2/3/2014 | 3:47:23 PM
Re: Linux parallel
I do see most NoSQL, many NewSQL and some conventional relational database vendors following the open source model. In these cases it's one vendor trying to use open source to promote their own DBMS, so it doesn't really follow the Red Hat/SuSe model. In the Hadoop arena, Cloudera and Hortonworks have squared off as the primary promoters of the platform and they're both trying to lower the skills/barriers to entry with different flavors and approaches to the same platform. That's probably closer to what has happened with Red Hat and SuSe, though I'm no expert on Linux history and development.

Lorna Garey
User Rank: Author
2/3/2014 | 3:24:12 PM
Linux parallel
Doug, do you see enterprise adoption of open source databases following the same arc as Linux as a server OS? Essentially, early adopters were those with specialized expertise, then Red Hat and SuSe stepped in with services that lowered the skills barrier to entry?
User Rank: Apprentice
2/3/2014 | 2:35:14 PM
Re: Response by Hortonworks
Thanks for your comments, Shaun. As you know, Cloudera is just as deeply committed to the Apache open source community as Hortonworks. Apache Sentry (incubating) and Hue (recently adopted into HDP, if I'm not mistaken) are recent examples of contribution. Our engineers work daily side-by-side to improve the product for our customers. CDH, which includes Impala, Search, HBase, and the rest of the most popular and useful Hadoop ecosystem projects and which forms the core of our enterprise data hub, remains 100% Apache-licensed open source so that customers both benefit from community innovation and also avoid lock-in.

It's great to see the strides we are making together in the community. At the same time, our customers can appreciate the difference between roadmap and technology previews vs. delivered product. For example, with respect to security, you can see our comprehensive guide to Hadoop security. On the other hand, it's telling that Hortonworks lists security as a Labs feature.

In any case, we look forward to continuing to work with you and others in the community to make the enterprise data hub vision a reality for customers.

D. Henschen
User Rank: Author
2/3/2014 | 1:17:35 PM
Response by Hortonworks
Here are at-length responses to Cloudera's assertions supplied by Shaun Connolly, VP of Corporate Strategy at Hortonworks.

While I appreciate Cloudera's data hub vision, their desire to compete with the likes of IBM will only expose the fact that they mostly have a spoke versus their one-hub-to-rule-them-all marketing aspirations. At Hortonworks, we've been pretty consistent that Hadoop has a clear role in a modern data architecture, where integrating with existing data center technologies and enabling customers to leverage existing skills is a key part of our focus.

Since our model is 100% open source, we describe our roadmap for enterprise Hadoop publicly in the Labs section of our website. If you look at the Security for Enterprise page, you will see that there is no one single technology that magically adds security to Hadoop. This page describes the Authentication, Authorization, Accounting/Audit, and Data Protection capabilities that exist today as well as the ongoing security-related work that is happening in 2014.

As far as Cloudera's claim that Stinger has not shipped, I'd say that they are in denial on just how far Apache Hive has come. This is represented by the fact that CDH 4.5 continues to ship with a version of Hive (version 0.10) that is before all of the Stinger work (Hive is currently at version 0.12 and headed to version 0.13 shortly). Again, if you look at our Labs page for Stinger, you will see that Apache Hive has had multiple generally available releases in 2013 and is working its way towards yet another release here in Q1-2014.

Moreover, our partner Microsoft provided a guest blog post covering some innovative technical details of the Stinger Initiative.

Finally, as it relates to Dataset Management and Governance, again I recommend you take a look at our Labs page devoted to Data Management.

There is much progress that has been made in Apache Falcon and other areas of the enterprise Hadoop platform that address this area. Last December, we released a technology preview of these capabilities as they come in for a landing over the course of Q1-2014.

Bottom-line: while Cloudera tries to differentiate with a proprietary approach, the open source community momentum and delivery of enterprise-relevant capabilities only speeds up further.