Comments
MongoDB, Cloudera Form Big Data Partnership
bigdatarelated,
User Rank: Apprentice
5/3/2014 | 12:54:39 PM
Re: Snapshotting into Hadoop
I bet it will be Impala. Anyway, I've added a link to this article on BigdataRelated.com
Andrew Binstock,
User Rank: Author
4/30/2014 | 7:56:21 PM
Re: Is Cloudera Selling HBase Short?
Tomer: Excellent points. The announcement is very short on specifics and doesn't answer how this can be done in real time, or even whether it can be.
D. Henschen,
User Rank: Author
4/30/2014 | 1:41:28 PM
Re: Snapshotting into Hadoop
Tomer, real-time movement of data from MongoDB into Hadoop is exactly what these partners were talking about with the new, deeper integration described above in the article. They said it will take snapshots of the data in MongoDB and replicate them in Hadoop using parallel processing. Execs didn't specify whether the access method would be HBase, but they did say the analysis could be done through a low-latency tool such as Spark or Impala. We'll learn more in June, when this deeper MongoDB/Hadoop integration is set to be introduced in beta form.
TomerS943,
User Rank: Apprentice
4/30/2014 | 12:37:00 PM
Re: Is Cloudera Selling HBase Short?
HBase is an important component in the Hadoop stack. Many of our customers use both HBase and MongoDB in their organizations. In fact, HBase will serve a key role in providing real-time integration between MongoDB and Hadoop. There's a need to move beyond batch exports from MongoDB to Hadoop, and instead adopt a real-time, log-based replication approach (similar to Golden Gate or Informatica IDR in the relational world). There are two ways that can work:
  • Replicate from Mongo to static files in Hadoop. The data will be streamed from Mongo into Parquet files, with some 'schema discovery' that then populates the Hive metastore with the columns discovered in the Mongo table. This will work for some use cases, but there are also some challenges. The main challenge is that a new form of compaction will need to be introduced, because updates and deletes in Mongo can't be performed on the Hadoop files directly. In addition, the data in Hadoop won't be available until the files are closed, and the schema in Hadoop/Hive will need to remain in sync, which could be a challenge over time.
  • Replicate from Mongo to HBase tables. The data will be streamed from Mongo into HBase tables, and the data can be queried directly from HBase. This approach is more real-time, and will be easier to manage. The HBase table will be a mirror of the Mongo table at all times, with no need to do extra 'compactions' on the Hadoop side.
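The difference between the two approaches can be sketched with a toy model in Python. This is only an illustration under stated assumptions, not the partners' integration or any vendor's implementation: the change-log format and the names `apply_to_mirror`, `append_to_files`, and `compact` are hypothetical, and plain dicts and lists stand in for HBase tables and Hadoop files.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical change-log entry: (operation, document id, document or None for deletes).
Change = Tuple[str, str, Optional[dict]]


def apply_to_mirror(mirror: Dict[str, dict], log: List[Change]) -> Dict[str, dict]:
    """Keyed-store approach: each change is applied in place, so the
    mirror always reflects the current state of the source."""
    for op, doc_id, doc in log:
        if op == "delete":
            mirror.pop(doc_id, None)
        else:  # "insert" or "update"
            mirror[doc_id] = doc
    return mirror


def append_to_files(files: List[List[Change]], log: List[Change]) -> None:
    """Static-file approach: existing files are immutable, so a batch of
    changes can only be written out as a new file."""
    files.append(list(log))


def compact(files: List[List[Change]]) -> Dict[str, dict]:
    """Replay every file in order to resolve updates and deletes into
    the current state. This is the extra step the file approach needs."""
    state: Dict[str, dict] = {}
    for batch in files:
        apply_to_mirror(state, batch)
    return state


log1: List[Change] = [("insert", "a", {"n": 1}), ("insert", "b", {"n": 2})]
log2: List[Change] = [("update", "a", {"n": 10}), ("delete", "b", None)]

mirror: Dict[str, dict] = {}
apply_to_mirror(mirror, log1)
apply_to_mirror(mirror, log2)

files: List[List[Change]] = []
append_to_files(files, log1)
append_to_files(files, log2)
```

In this sketch the mirror holds only the live document after both batches, while the files still carry all four raw change records until `compact(files)` is run to produce the same state.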

Tomer Shiran
VP Product Management, MapR
D. Henschen,
User Rank: Author
4/30/2014 | 10:17:45 AM
Is Cloudera Selling HBase Short?
The other thought I had about this partnership is that Cloudera is NOT being very ambitious about the use of HBase -- perhaps in deference to MongoDB. Maybe MapR or another distributor might suggest that you can do more with data on a single cluster?
RSCHUMACHER400,
User Rank: Apprentice
4/30/2014 | 9:59:08 AM
Re: Is the dividing line between NoSQL & Hadoop Use Cases That Clear?
I'd say it speaks more to the duplicate functional tech needs that exist in both the operational database and data warehouse worlds. For operational databases like Cassandra (and legacy RDBMSs like SQL Server, etc.), there will always be the need to analyze and search that data in the context of the online apps they serve, which is why we enable that in our platform. The same needs for analysis and search also exist in the data warehouse/lake worlds that Hadoop is now playing in. However, the use cases and apps that an operational DB and a data warehouse/lake serve are still quite different, which is why the divide between the two still exists in the NoSQL market just as it does in the traditional RDBMS world. In other words, in the same way an RDBMS guru doesn't use Teradata for online/transactional apps, none of our customers use our platform for a Hadoop data warehouse system.
D. Henschen,
User Rank: Author
4/30/2014 | 9:05:23 AM
Re: Is the dividing line between NoSQL & Hadoop Use Cases That Clear?
Nobody ever thought of DataStax as a Hadoop vendor, but you do support Hadoop (and search) on the same cluster. Does this speak to the gray areas between NoSQL and Hadoop roles? 
RSCHUMACHER400,
User Rank: Apprentice
4/30/2014 | 8:59:15 AM
Re: Is the dividing line between NoSQL & Hadoop Use Cases That Clear?
Just a quick clarifying note: DataStax is not a Hadoop vendor. Instead, we focus on serving the database requirements of modern online applications - those that are always on, distribute data around the globe, and need to scale without limits. These applications often need to run analytics and search operations on their online data, so we allow for that in our NoSQL platform by integrating analytics and search technologies that function across a distributed, shared-nothing architecture that can span multiple data centers and cloud availability zones. For more details on how this works, please see the following post: http://www.datastax.com/2013/06/why-hadoop-and-solr-in-datastax-enterprise
D. Henschen,
User Rank: Author
4/29/2014 | 10:50:38 AM
Is the dividing line between NoSQL & Hadoop Use Cases That Clear?
Is it quite as clear as "NoSQL is for this and Hadoop is for that"? MongoDB and Cloudera are thinking they can divide and conquer the market, but maybe practitioners and competitors have different ideas about overlaps between the two platforms? During my call with MongoDB and Cloudera, for example, Bukhan said HBase is better suited to high-scale applications, but then Asay of MongoDB tried to do some backpedaling and pointed out that MongoDB can handle petabyte-scale applications. This is one of the places where gray areas might emerge. DataStax (of Cassandra fame), Couchbase, and Basho (which supports Riak) might have a different take on the best uses of NoSQL vs. Hadoop. By all means share your perspectives here.

