Comments
Big Data Debate: Will HBase Dominate NoSQL?
vrodionov
User Rank: Apprentice
8/21/2013 | 6:48:49 PM
Cassandra's "flexible data placement (a.k.a. SSD support)" is not that good. You put the whole column family on SSD; eventually the CF will exceed the SSD's capacity, and then what? It is not hot-data-set caching per se.
mhausenblas
User Rank: Apprentice
8/13/2013 | 6:03:47 PM
Valid point, yes. The argument was along these lines: Facebook created Cassandra in the first place, then replaced it with something else (which happened to be HBase). Not the strongest argument, I admit; more an indicator.

However, as I said in the first paragraph: it's all relative, really. One size doesn't fit all in the data storage and processing world (a.k.a. polyglot persistence). In this context I'd encourage everyone who hasn't already done so to read Stonebraker's excellent piece (from 2005!): http://citeseerx.ist.psu.edu/v...
EricL755
User Rank: Apprentice
8/12/2013 | 10:48:03 PM
I am not sure how this holds up as a proof point: "An interesting proof point for the superiority of HBase is the fact that Facebook, the creator of Cassandra, replaced Cassandra with HBase for their internal use." Why does Facebook choosing it mean that it's superior?

This is an argument from authority. In other words, if most of what Facebook engineering does is right and they chose HBase, then it must be right. There is certainly no question that Facebook is full of brilliant engineers. But there are plenty of other companies that do amazing things with technology and have made the decision to go with Cassandra. You can't say that HBase is a good choice simply because Facebook uses it.
vrodionov
User Rank: Apprentice
8/12/2013 | 6:39:06 PM
"Compaction throttling to avoid spikes in application response time.- M7 does not have any Compactions - Done"

No compactions? Does M7 overwrite data in place?

The major issue with M3/5/7 is that it does not provide an easy migration/upgrade path from existing Hadoop/HBase to MapR's distribution. At least, this was the case in late 2011. Besides that, it's proprietary technology.
nbandugula
User Rank: Apprentice
8/9/2013 | 9:04:54 PM
To complete the story on MapR's innovation that Michael referred to, here are some things we have done with MapR M7 to make HBase applications enterprise-grade:

Following Ellis' lead:

a. Master-oriented design makes HBase operationally inflexible. - M7 does not have a region server architecture - Done

b. Failure means downtime. - M7 does not have single points of failure and recovers in less than a minute - Done

c. HDFS is designed for streaming access to large files. - M7 does not rely on HDFS - Done

- Mixing solid state and hard disks in a single cluster and pinning tables to workload-appropriate media. - M7 works with disparate hardware, including SSDs - Done

- Snapshots, incremental backups, and point-in-time recovery. - M7 provides all of these features - Done

- Compaction throttling to avoid spikes in application response time. - M7 does not have any Compactions - Done

- Dynamically routing requests to the best-performing replicas. - M7 delivers this functionality as well - Done

Plus, M7 is a complete distribution for Apache Hadoop that supports more than a dozen Apache projects and a wide variety of third-party tools, including tools for SQL queries.
RSCHUMACHER400
User Rank: Apprentice
8/7/2013 | 7:41:55 PM
Hi Doug, please see our customers page for details (http://www.datastax.com/custom...), but in brief, we do have customers that use more than just Cassandra (C*). On our customers page you'll find examples like MarkedUp (all 3), eBay (C* and Hadoop), Datafiniti (C* and Solr), HealthCare Anytime (all 3), Constant Contact (C* and Hadoop), SimpleReach (C* and Hadoop), Boxever (C* and Hadoop), and Skillpages (all 3).
vrodionov
User Rank: Apprentice
8/7/2013 | 12:11:13 AM
Mr. Ellis, everyone here understands that your analyses and opinions, as well as all the test results you are referring to, are highly biased in favor of Cassandra. I LMAO (yeah, I know some basic slang) when I read the PDF you posted a link to here. 90 ms read latency? Did the authors read data from another data center? In the case of HBase, when all the data fits in the block cache or OS page cache, the read latency is less than 1 ms (actually, it's 0.4-0.5 ms on average). We (the company I work for) have been routinely running different workloads on HBase in dev, staging and production for more than three years already, and the stability, performance and feature set of HBase get better with every new version. For me (and for many others), the major advantages of HBase are:

1. Tight integration with the Hadoop/HDFS stack. I think it's the major one, and it will eventually put HBase on top of the NoSQL crowd.

2. Extensibility. Coprocessors are a very good feature for anyone trying to implement something more complex than a simple K-V lookup.

3. Can I say that HBase is more SQL-friendly? Phoenix, Hive?

HBase (properly tuned and configured) is unbeatable in write-heavy workloads. We can get far more than 1M writes per second from a 20-node cluster (not from 200 nodes, as the Netflix guy did). Yes, the cluster and clients are tuned and use all the recommended performance tips. Complex? Maybe. But eventually everything will be available out of the box, without any additional tuning.
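To put the cluster-size comparison in per-node terms, here is a rough back-of-the-envelope sketch. It assumes writes are spread evenly across nodes (real clusters are rarely perfectly balanced) and treats 1M writes/sec as a floor, since the claim is "far more than" that.

```python
def per_node_writes_per_sec(total_writes_per_sec: float, nodes: int) -> float:
    """Average write rate each node must sustain, assuming an even spread."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return total_writes_per_sec / nodes


# The 1M writes/sec figure from the comment, on 20 nodes vs. 200 nodes.
twenty_nodes = per_node_writes_per_sec(1_000_000, 20)
two_hundred_nodes = per_node_writes_per_sec(1_000_000, 200)
print(f"20 nodes:  {twenty_nodes:,.0f} writes/sec per node")
print(f"200 nodes: {two_hundred_nodes:,.0f} writes/sec per node")
```

For the same aggregate figure, the 20-node cluster implies roughly ten times the per-node write rate (50,000 vs. 5,000 writes/sec per node).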

You are so proud of Cassandra's random-read "domination" (mostly due to the row cache in Cassandra and the lack thereof in HBase), but I would like to point out that Cassandra's caches (both key and row) are half-baked and the implementation is far from optimal (do you still keep keys in the Java heap?). Sorry, I am not following the latest advancements in Cassandra development. Moreover, the lack of a good block cache in Cassandra makes it less suitable for short scan operations (one of the reasons Facebook decided in favor of HBase). For me, personally, that's a deal breaker, because so many real customer workloads fall into the "short scan operation" category. Another deal breaker is the lack of real Hadoop integration.

Random-read performance in HBase (I do not think it's really worse than Cassandra's) can be improved by introducing a RowCache into HBase, and when that happens, I think we will have an indisputable winner, Mr. Ellis. It's doable, and it is going to happen pretty soon.
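To illustrate what a row cache means here, below is a minimal sketch of an LRU row cache in Python. It is not HBase's or Cassandra's actual implementation; the `RowCache` class, its `capacity` parameter, and the `load_row` callback are hypothetical names chosen for illustration. Whole rows are cached by row key, the least recently used row is evicted once capacity is exceeded, and a write invalidates the cached row so the next read reloads fresh data.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional

Row = Dict[str, bytes]  # hypothetical row shape: column qualifier -> value


class RowCache:
    """Hypothetical LRU cache of whole rows, keyed by row key."""

    def __init__(self, capacity: int, load_row: Callable[[bytes], Optional[Row]]):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.load_row = load_row  # called on a miss (stands in for reading store files)
        self._rows: "OrderedDict[bytes, Row]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, row_key: bytes) -> Optional[Row]:
        if row_key in self._rows:
            self._rows.move_to_end(row_key)  # mark as most recently used
            self.hits += 1
            return self._rows[row_key]
        self.misses += 1
        row = self.load_row(row_key)
        if row is not None:
            self._rows[row_key] = row
            if len(self._rows) > self.capacity:
                self._rows.popitem(last=False)  # evict the least recently used row
        return row

    def invalidate(self, row_key: bytes) -> None:
        """Drop a row after a write so the next read reloads it."""
        self._rows.pop(row_key, None)


# Usage with an in-memory dict standing in for the underlying store.
store = {b"r1": {"c": b"1"}, b"r2": {"c": b"2"}, b"r3": {"c": b"3"}}
cache = RowCache(capacity=2, load_row=store.get)
cache.get(b"r1")  # miss: loaded from the store
cache.get(b"r1")  # hit
cache.get(b"r2")  # miss
cache.get(b"r3")  # miss: evicts r1, the least recently used row
print(cache.hits, cache.misses)
```

Invalidating on write, rather than updating the cached row in place, keeps the sketch simple; a production cache would also have to deal with concurrent readers and writers and with memory accounting, which this sketch ignores.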
D. Henschen
User Rank: Author
8/6/2013 | 8:22:42 PM
"Dominant" doesn't discount the opportunity for diversity, though I'll admit it's a somewhat simplistic construction meant to spark debate. The question was NOT posed as an either/or. DataStax chose (for obvious reasons) to focus on HBase vs. Cassandra. I do think many people have big expectations for HBase because of its ties to Hadoop. Perhaps a bigger role will emerge if some of the flaws DataStax points to can be addressed.
EB Quinn
User Rank: Apprentice
8/6/2013 | 6:28:01 PM
A bit of a silly premise, and definitely not an either/or scenario: HBase will clearly be used when Hadoop is used, end of story. Cassandra isn't going to displace HBase, but will coexist with it to handle other, related use cases more elegantly. Plus, MongoDB will be used as a more modern alternative to MySQL, HANA will be used to fly through SAP analytics, and MarkLogic excels at content-oriented apps. And there are several dedicated cloud databases too. The NoSQL (Not Only SQL) movement gains strength from diversity, and has pushed Oracle, IBM and Microsoft to offer columnar options, for example. But at this point NONE of the NoSQL databases could be considered dominant, and despite the growing popularity of Hadoop, there's no way HBase is going to extend into a more general-purpose DB: it lacks the architectural chops (pointed out nicely by Mr. Ellis), and it lacks an expertise base with the chops.

When the day comes that there are more production Hadoop implementations than the combination of SAS, ODW, Teradata, IBM's many options, SAP BW and HANA, MicroStrategy, Tableau, etc., etc., etc., well, maybe we can talk about dominance down one DNA strain of the industry. That will take quite a while.
D. Henschen
User Rank: Author
8/6/2013 | 6:22:59 PM
This debate sparks many questions for me. Can open source purists, for example, detail how HBase will overcome the flaws that DataStax cites, some of which MapR has addressed, albeit with a proprietary, commercial approach? DataStax can certainly say it integrates Cassandra with Hadoop (providing the same shared-infrastructure advantages as the combination of HBase and Hadoop), but why do I hear little to nothing about customers relying on DataStax for their Hadoop deployments? Can you name names of customers that actually do it all (Cassandra, Hadoop and Solr) with DataStax's software? The focus is clearly on Cassandra.

Hortonworks and Cloudera, what's your take, as you clearly have a big stake in HBase's success?

