MapR: One Of These Things Is Not Like The Others
MapR's guiding principles are practicality and performance, so it didn't think twice about chucking the Hadoop Distributed File System out of its Hadoop software distribution. HDFS had (and still has, MapR argues) reliability and availability flaws, so MapR replaced it with its own file system, which is accessed through the proven Network File System (NFS) interface. In the bargain, MapR claims to get "twice the speed with half the required hardware." The NFS choice also enabled MapR to support near-real-time data streaming using messaging software from Informatica. MapR competitors Cloudera and Hortonworks can't stream data because HDFS is an append-only file system.
MapR's latest quest for better performance (regardless of open source consequences) is the M7 software distribution, which the vendor says delivers high-performance Hadoop and HBase in one deployment. Many users have high hopes for HBase because it's the NoSQL database native to the Apache Hadoop platform (promising database access to all the data on Hadoop). But HBase is immature and still suffers from flaws, including instability and cumbersome administration.
M7 delivers twice the performance of HBase running on standard Hadoop architectures, says MapR, because the distribution does away with region servers, table splits and merges, and data compaction steps. MapR also uses its proprietary infrastructure to support snapshotting, high availability and system recovery for HBase.
If you're an open source purist swayed by arguments about system portability, MapR may not be the vendor for you. But we've talked to high-scale customers who have chosen MapR for better performance. Want to give it a try? MapR is available on both Amazon Web Services and Google Compute Engine.