MongoDB and Cloudera are leading vendors in the NoSQL and Hadoop markets, respectively, but both firms figure they could be even more successful if would-be customers weren't so confused about big data.
That's the gist of the reasoning behind a deeper alliance between the two companies announced Tuesday. As part of the partnership, MongoDB and Cloudera say they will co-market and co-sell their software as complementary big-data technologies. In case you couldn't guess, MongoDB will be pitched as an operational database for high-scale applications while Cloudera's Hadoop-based Enterprise Data Hub will be described as an analytical platform.
"I realized we needed to do something after I spoke at the Strata Conference last year on the topic of MongoDB and Hadoop working together," said Matt Asay, MongoDB's VP of marketing, business development, and corporate strategy, in a phone interview with InformationWeek. "Afterward I was blistered by people who said, 'I thought MongoDB and Hadoop were competitors.' "
You might think that anybody confused about the appropriate use of NoSQL and Hadoop simply needs to do more research, but there are gray areas between the two platforms, such as HBase, the NoSQL database that's part of Hadoop. HBase is suited to super-high-scale but rather simplistic use cases, while MongoDB supports much more complex data modeling, according to Yuri Bukhan, director of the ISV Alliances Program at Cloudera.
Bukhan cites online behavior analysis as common ground where HBase and MongoDB serve in distinct roles. "If you're looking at simple user clicks or sessions, HBase offers very fast random reads and random writes if you want to look up users on a particular key, but MongoDB provides a much richer model through which you could track user behavior all the way through an online application."
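To make Bukhan's distinction concrete, here is a minimal Python sketch that uses plain dictionaries rather than either product's API. The key format, field names, and helper functions are all hypothetical, chosen only to contrast a lookup by a single key with a nested record that follows a user through an application.

```python
# Hypothetical sample data; neither structure uses a real HBase or MongoDB API.

# Key-value style: one simple value per composite row key (user + timestamp).
clicks_by_key = {
    "user42:1000": "/home",
    "user42:1007": "/pricing",
}

# Document style: one nested record describing a whole session.
session_doc = {
    "user_id": "user42",
    "device": {"type": "mobile", "os": "iOS"},
    "events": [
        {"page": "/home", "seconds_on_page": 7},
        {"page": "/pricing", "seconds_on_page": 41},
        {"page": "/signup", "seconds_on_page": 12, "completed": False},
    ],
}


def lookup_click(store, user_id, timestamp):
    """Fetch a single click by its exact key; returns None if absent."""
    return store.get(f"{user_id}:{timestamp}")


def pages_visited(doc):
    """List the pages in the order the user visited them."""
    return [event["page"] for event in doc["events"]]


def abandoned_signup(doc):
    """True if the user reached the signup page but did not complete it."""
    return any(
        event["page"] == "/signup" and not event.get("completed", True)
        for event in doc["events"]
    )


print(lookup_click(clicks_by_key, "user42", 1007))
print(pages_visited(session_doc))
print(abandoned_signup(session_doc))
```

The first structure answers "what did this user click at this moment?" very cheaply, while the second can answer questions about the path through the application without stitching rows together.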
MongoDB and Cloudera already have bi-directional data connectors, but Asay and Bukhan said the two firms are preparing a deeper integration whereby the live, operational data within MongoDB can be snapshotted into Cloudera's data hub in parallel for analysis. This analysis can happen in near-real-time through the Shark framework or Impala, and the results can then be passed back to MongoDB to trigger the display of personalized content or a most-appropriate offer based on the analysis within Hadoop.
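The round trip described here (snapshot, analyze, write back) can be sketched in a few lines of Python. This is an illustration of the data flow only, not the vendors' integration: in-memory dictionaries stand in for both the operational database and the analytical platform, and every name, field, and the "most-viewed category" rule is an assumption made for the example.

```python
import copy
from collections import Counter

# Hypothetical stand-in for the live operational store.
operational_store = {
    "user42": {"views": ["shoes", "shoes", "hats"], "offer": None},
    "user77": {"views": ["books"], "offer": None},
}


def take_snapshot(store):
    """Copy the live data so analysis never touches the operational store."""
    return copy.deepcopy(store)


def analyze(snapshot):
    """Analytical step: find each user's most-viewed category."""
    return {
        user_id: Counter(doc["views"]).most_common(1)[0][0]
        for user_id, doc in snapshot.items()
        if doc["views"]
    }


def write_back(store, results):
    """Pass results back so the application can show a tailored offer."""
    for user_id, category in results.items():
        if user_id in store:
            store[user_id]["offer"] = f"discount on {category}"


snapshot = take_snapshot(operational_store)
results = analyze(snapshot)
write_back(operational_store, results)

print(operational_store["user42"]["offer"])
```

Keeping the snapshot separate from the live store mirrors the division of labor the two companies describe: analysis runs on a copy, and only the result travels back to the operational side.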
This integration, which is expected to be demoed at MongoDB World in New York in June, will run on YARN, the new resource management layer introduced with Hadoop 2.0. But there was no talk of running MongoDB and Cloudera on the same cluster of servers -- a leap ahead that would just confuse matters.
For now the MongoDB-Cloudera partnership is one of convenience, allowing two successful companies to paint a simple NoSQL-for-operational-database, Hadoop-for-analytics picture of the big-data market. Why Cloudera and not the entire Hadoop community?
"This is one of the fantastic things about open-source," says Asay. "This development is being done out in the open, and much of what we do will be available to all, so the other Hadoop vendors will be able to use it."
Other NoSQL vendors, such as DataStax, might not paint quite as clean a demarcation between NoSQL and Hadoop roles. DataStax's software distribution, for example, includes both the Cassandra NoSQL database and Hadoop, and they both run on the same cluster. What's more, DataStax and other high-scale database vendors have been busy adding to and touting the analytic query capabilities of their databases.
MongoDB and Cloudera are already jointly selling their software in the field, according to Asay, and they've "aligned" their salesforces to offer consistent messaging on the best uses of their respective products. Things might get messier down the road, but having recently landed massive venture capital infusions, MongoDB and Cloudera apparently feel confident (and flush) enough to divide and conquer the big-data market.
Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise.