Comments
Cloudera Trash Talks With Enterprise Data Hub Release
Michaelrj
User Rank: Apprentice
2/10/2014 | 2:36:38 AM
Re: Response by Hortonworks
Mr Connolly doesn't address search on Hortonworks. I thought they had Elasticsearch as a partner for search?
anon4464812233
User Rank: Apprentice
2/4/2014 | 6:38:59 PM
Re: Response by MapR
Matt,

MapR Search can be used with M3, M5 or M7. There is an additional subscription cost for customers who want support. I may be wrong, but I thought that Cloudera Search also involved an additional subscription cost for customers who want support. I do agree with your comment that few customers want to run unsupported software in production.

With regard to security:
  • Authentication - MapR is the only distribution that supports Kerberos and also enables customers that don't have a Kerberos infrastructure to enjoy strong wire-level authentication and encryption.
  • Authorization - MapR provides fine-grained ACLs on files, directories, tables, column families, columns, queues, jobs, volumes and management capabilities. You mentioned Apache Sentry, which will soon be part of the MapR distribution, and customers can actually use it today (last I checked, it was an Apache project, not a Cloudera differentiator). But keep in mind that Sentry is only relevant if the only thing users are doing is running Hive/Impala queries. In other words, if users are running MapReduce, Pig, Cascading, Spark, direct file access (e.g., hadoop fs -cat), etc., all the permissions in Sentry are irrelevant! Our customers view the enterprise data hub as more than just SQL queries - they want to run a variety of workloads on the cluster. To address this gap (which I'm sure you'll agree is huge), MapR has built RBAC into the underlying data so that it applies across all processing frameworks and methods of access. That's a huge advantage - and an absolute requirement for customers that really care about security.

Bottom line - MapR has significant advantages over Cloudera in both authentication and authorization, no matter how you look at it.

With regard to data protection and disaster recovery, I think it's disingenuous to claim that Cloudera has that. I understand that you want to gloss over any details here, but let's put a few of them on the table. First, Cloudera introduced the world's first inconsistent snapshots (HDFS and HBase snapshots). Believe it or not, data can actually change inside the HDFS snapshot! Basically, this renders the feature useless for most use cases other than as a marketing checkbox. Second, Cloudera's DR is a GUI on top of distcp, a MapReduce job that copies files - each file is copied at a different time (no consistency), it's not done at the block level, etc. In the storage and database worlds DR is a multi-billion dollar business - I think that's because customers need more than a tool to copy files. Third, MapR's no-NameNode architecture provides a huge advantage in terms of HA. Cloudera's NameNode HA has many issues and limitations - it doesn't fail back, it can only support up to 100M files, it only supports one failure, it has garbage collection issues, ...

With regard to SQL-on-Hadoop - it's actually not confusing at all. The customers I have spoken to are not interested in being locked into a single SQL project. They want their Hadoop investments to be future-proof, because all these technologies are still very early and immature and they all have their pros and cons (e.g., Impala is very slow on HBase queries and sometimes returns incorrect results). For some use cases the right answer may be Shark, while for other use cases the answer may be Drill, or perhaps Impala. Companies want the flexibility to choose the right tool for the job, and that's what MapR is offering them.

-- Tomer Shiran, VP Product Management, MapR
mattbrandwein
User Rank: Apprentice
2/4/2014 | 12:54:44 PM
Re: Response by MapR
Hi Jack,

"None of the claims by Cloudera about MapR are accurate." Ok, let's take a look.

"We provide fully integrated search as part of our platform." Maybe I was mistaken. Which version of MapR do I download to get Search out of the box? M3? M5? M7? From your website[1] it really looks like you need a separate download of and license for LucidWorks Search to run Search in production.

"MapR wire-level security applies to ALL services in the cluster not simply HBase. Role-based access control is supported and fully integrated with all enterprise directory services." These are two different things. Never claimed MapR didn't have SSL (or any other wire-level security), or that M7 didn't provide RBAC. However, as far as I know, MapR doesn't have RBAC for data in Hive tables, i.e. what Apache Sentry provides.

If there's anything else, please let me know.

On the other hand, re: "The conversation about an Enterprise Data Hub should focus on the important requirements including full data protection, business continuity and high availability." Well, that's a pretty narrow view of requirements - which, of course, conveniently align to what your MapR-FS provides - but since those are important nonetheless, it's also fair to point out that Apache Hadoop and Cloudera provide all of that, too.

Finally, you end by mentioning Apache Drill and Apache Spark. There's an important distinction here that I think you've missed. Few want to run unsupported software in production, so it's important that they understand what you, as a vendor, are explicitly offering support for under contract. Honestly, I can't tell what MapR actually supports, because you list just about every open source project and SQL-on-Hadoop tool on the market[2]. I even think I saw MapR talk about Facebook's Presto[3] the other day. This must be awfully confusing for customers. Where can we find the official list of *MapR supported* tools and projects, in the sense of "things a customer can file tickets about at 3am and you will debug under a given SLA and, hopefully, issue a fix"?

- Matt

 

[1] http://www.mapr.com/products/mapr-search-a-single-easy-dependable-and-fast-platform-for-search-nosql-and-apache-hadoop

[2] http://www.mapr.com/products#sql "There are a number of applications and projects that support SQL access against data contained in the MapR distribution for Hadoop including Apache Hive, Impala, Shark-on-Spark, Hadapt and others."

[3] http://www.mapr.com/blog/sql-in-hadoop-with-mapr-you-can-have-your-cake-and-eat-it-too
JackN637
User Rank: Apprentice
2/4/2014 | 10:50:47 AM
Response by MapR
At MapR we encourage customer discussions about an Enterprise Data Hub. We have customers across industries that have selected MapR because of our unique capabilities – selections that resulted in returns of millions of dollars. None of the claims by Cloudera about MapR are accurate. We provide fully integrated search as part of our platform. MapR wire-level security applies to ALL services in the cluster not simply HBase. Role-based access control is supported and fully integrated with all enterprise directory services.  

The conversation about an Enterprise Data Hub should focus on the important requirements including full data protection, business continuity and high availability.

These are the key features covered by noted experts such as Mike Ferguson (a principal and co-founder of Codd and Date Europe Limited, the inventors of the Relational Model, and a former Chief Architect at Teradata) in his May 2013 paper, "Offloading and Accelerating Data Warehouse ETL Processing Using Hadoop."

MapR invested several years of engineering effort to re-architect a data platform for Hadoop so it could support such enterprise-grade capabilities. Other distributions merely claim enterprise functionality; without the right platform to support it, they are setting users up for grand failure at an enterprise level.

Only MapR provides automated stateful failover, disaster recovery through snapshots and mirrors, and full data protection against user and application errors. Even with multiple hardware or software outages and errors, applications will continue running without any administrator actions required.

An Enterprise Data Hub needs to be easily integrated into an existing environment. Only MapR provides full NFS support. Existing applications and tools can easily access data in a MapR cluster and update data directly without custom connectors or the need for batch uploads.

An Enterprise Data Hub needs to perform file and database processing on a continuous basis. MapR eliminates downtime associated with HBase applications with instant recovery and provides consistent low latency to support real-time applications.

There is a lot of exciting work going on in the community that adds additional value to an Enterprise Data Hub such as Apache Drill and Apache Spark. With the right foundation in place, or should I say hub, you can be even more successful.

 -- Jack Norris, CMO, MapR Technologies
mattbrandwein
User Rank: Apprentice
2/3/2014 | 4:30:07 PM
Re: Linux parallel
David, the core Hadoop community is strong and growing with contributors from many companies working together to better the platform. There's still plenty of room to improve. Recent advances in security (Sentry), workload management (YARN), and the core filesystem (HDFS), to name a few, are good examples.

Speaking from the Cloudera side (I work there), we're focused on actively building out CDH - the open source foundation of our enterprise data hub platform - along with the management tools, certifications, partner integrations, and support that our customers require to deploy Hadoop for real production use cases, and not just as "an experiment". 
David F. Carr
User Rank: Author
2/3/2014 | 4:06:17 PM
Re: Linux parallel
To what extent is the competition helping vs. hurting the advance of the core open source Hadoop technology?
D. Henschen
User Rank: Author
2/3/2014 | 3:47:23 PM
Re: Linux parallel
I do see most NoSQL, many NewSQL and some conventional relational database vendors following the open source model. In these cases it's one vendor trying to use open source to promote their own DBMS, so it doesn't really follow the Red Hat/SuSe model. In the Hadoop arena, Cloudera and Hortonworks have squared off as the primary promoters of the platform and they're both trying to lower the skills/barriers to entry with different flavors and approaches to the same platform. That's probably closer to what has happened with Red Hat and SuSe, though I'm no expert on Linux history and development.

 
Lorna Garey
User Rank: Author
2/3/2014 | 3:24:12 PM
Linux parallel
Doug, do you see enterprise adoption of open source databases following the same arc as Linux as a server OS? Essentially, early adopters were those with specialized expertise, then Red Hat and SuSe stepped in with services that lowered the skills barrier to entry?
mattbrandwein
User Rank: Apprentice
2/3/2014 | 2:35:14 PM
Re: Response by Hortonworks
Thanks for your comments, Shaun. As you know, Cloudera is just as deeply committed to the Apache open source community as Hortonworks. Apache Sentry (incubating) and Hue (recently adopted into HDP, if I'm not mistaken) are recent examples of contribution. Our engineers work daily side-by-side to improve the product for our customers. CDH, which includes Impala, Search, HBase, and the rest of the most popular and useful Hadoop ecosystem projects and forms the core of our enterprise data hub, remains 100% Apache-licensed open source so that customers both benefit from community innovation and avoid lock-in.

It's great to see the strides we are making together in the community. At the same time, our customers can appreciate the difference between roadmap and technology previews vs. delivered product. For example, with respect to security, you can see our comprehensive guide to Hadoop security here (http://vision.cloudera.com/securing-the-enterprise-data-hub/). On the other hand, it's telling that Hortonworks lists security as a Labs feature.

In any case, we look forward to continuing to work with you and others in the community to make the enterprise data hub vision a reality for customers.

 
D. Henschen
User Rank: Author
2/3/2014 | 1:17:35 PM
Response by Hortonworks
Here are at-length responses to Cloudera's assertions supplied by Shaun Connolly, VP of Corporate Strategy at Hortonworks.

While I appreciate Cloudera's data hub vision, their desire to compete with the likes of IBM will only expose the fact that they mostly have a spoke versus their one-hub-to-rule-them-all marketing aspirations. At Hortonworks, we've been pretty consistent that Hadoop has a clear role in a modern data architecture (http://hortonworks.com/hadoop-modern-data-architecture/) where integrating with existing data center technologies and enabling customers to leverage existing skills is a key part of our focus.

Since our model is 100% open source, we describe our roadmap for enterprise Hadoop publicly in the Labs section of our website: http://hortonworks.com/labs/. If you look at the Security for Enterprise page (http://hortonworks.com/labs/security/), you will see that there is no single technology that magically adds security to Hadoop. This page describes the Authentication, Authorization, Accounting/Audit, and Data Protection capabilities that exist today, as well as the ongoing security-related work that is happening in 2014.

As far as Cloudera's claim that Stinger has not shipped, I'd say that they are in denial about just how far Apache Hive has come. This is reflected in the fact that CDH 4.5 continues to ship with a version of Hive (version 0.10) that predates all of the Stinger work (Hive is currently at version 0.12 and headed to version 0.13 shortly). Again, if you look at our Labs page for Stinger, you will see that Apache Hive has had multiple generally available releases in 2013 and is working its way towards yet another release here in Q1-2014.
http://hortonworks.com/labs/stinger/

Moreover, our partner Microsoft provided a guest blog post covering some innovative technical details of the Stinger Initiative: http://hortonworks.com/blog/delivering-on-stinger-a-phase-3-progress-update/

Finally, as it relates to Dataset Management and Governance, again I recommend you take a look at our Labs page devoted to Data Management: http://hortonworks.com/labs/data-management/

There is much progress that has been made in Apache Falcon and other areas of the enterprise Hadoop platform that address this area. Last December, we released a technology preview of these capabilities as they come in for a landing over the course of Q1-2014.

Bottom-line: while Cloudera tries to differentiate with a proprietary approach, the open source community momentum and delivery of enterprise-relevant capabilities only speeds up further.

