MapR Hadoop Upgrade Runs HP Vertica
MapR's release of Hadoop 2.2 with YARN supports HP's popular, high-scale analytical database as a SQL-On-Hadoop option.
February 11, 2014
MapR says its NFS-based storage platform can run data-intensive applications and database management systems without significant rewrites.
MapR Technologies on Tuesday became the last of the big-three Hadoop players to release a software distribution incorporating YARN (following Cloudera and Hortonworks), but it's touting two important advantages in version 2.2 that it says were worth waiting for: broad compatibility with a wide array of data-analysis applications and backward compatibility with MapReduce 1.0.
As a proof point on compatibility with data-processing and data-analysis applications, the company announced a preview release of the HP Vertica Analytics platform on MapR, a new SQL-on-Hadoop option that offers the speed and SQL-compliance of an industry-leading database management system (DBMS).
"Organizations embracing Hadoop have been struggling to empower large groups of business analysts who require sophisticated SQL and BI tools to do their jobs, but feel hand-cuffed when using incomplete, SQL-like approaches," said John Schroeder, CEO and cofounder of MapR Technologies, in a statement. "Providing HP Vertica's high-performance, rich SQL, and built-in analytic functions on... Hadoop sets business analysts free to do faster, interactive analytics."
"Incomplete" and "SQL-like" are not-so-veiled references to Impala and Hive, the Apache projects championed by MapR rivals Cloudera and Hortonworks, respectively. This isn't the first time a DBMS has run on Hadoop. Pivotal put its Greenplum database engine on top of the Pivotal HD distribution with HAWQ, and Calpont has done much the same with its InfiniDB DBMS and the Hadoop Distributed File System (HDFS). But those efforts took reengineering of the DBMS, whereas MapR says it can work with unaltered Vertica and many other data platforms.
"YARN is great and it lets you do more than MapReduce with Hadoop, but when you combine that with our general-purpose storage platform, it greatly expands what's possible," Jack Norris, MapR's chief marketing officer, said in an interview with InformationWeek.
The "general-purpose" line is a reference to MapR's concurrent-read-write-capable data platform. MapR parted ways with the rest of the Hadoop community long ago by replacing HDFS with this Network File System (NFS) protocol-based system that can handle random, concurrent reads and writes. By comparison, HDFS operates "like a CD-ROM, not like enterprise storage," Norris asserted. And because many applications assume the use of such storage systems, MapR's latest release can support many applications without alteration, he said.
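To make the storage distinction concrete, here is a minimal Python sketch (not taken from MapR or HP documentation) of the seek-and-overwrite access pattern that applications written for conventional file systems rely on. A local temporary file stands in for a file on an NFS-mounted cluster path, which is an assumption for illustration, and the function names are hypothetical. HDFS follows a write-once, append-oriented model, so this kind of in-place update is not something it offers directly.

```python
import os
import tempfile


def overwrite_in_place(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes at `offset` without rewriting the rest of the file."""
    # "r+b" opens an existing file for reading and writing without truncating it.
    with open(path, "r+b") as f:
        f.seek(offset)   # jump to an arbitrary position (random access)
        f.write(data)    # replace the bytes at that position


def demo() -> bytes:
    """Create a small file, patch three bytes in the middle, return the result."""
    # Assumption: a local temp file stands in for a file on an NFS mount.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"AAAAAAAAAA")          # ten bytes of initial content
        overwrite_in_place(path, 3, b"xyz")  # patch bytes 3..5 in place
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

A database engine that updates pages inside existing files depends on this pattern, which is the kind of assumption Norris is referring to when he says such applications can run without alteration.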
"Whether it's a high-performance computing environment, pricing-optimization applications, operations research apps, or a widely used DBMS like HP Vertica, you can run it on our platform and take advantage of the distributed data environment," Norris said. The implementation for HP Vertica to run on MapR is expected to be generally available in March.
The second advantage MapR is touting with its latest release is backward compatibility with MapReduce 1.0 jobs. Companies with lots of legacy MapReduce processes running on other distributions face a tough choice on when and how to migrate to new Hadoop 2.0 clusters because jobs will have to be rewritten to work with YARN. MapR says its latest release is unique in running MapReduce 1.x and YARN on the same nodes simultaneously, providing an "easy and risk-free" upgrade path to new Hadoop clusters.
Long-time MapR customer comScore is "excited" about MapR's "seamless upgrade path," said CTO Michael Brown in a statement, because "we run more than 20,000 jobs each day on our production MapR cluster." Even if it's only hundreds or thousands of job types run repeatedly throughout the day by comScore, that scale of rewriting would obviously be daunting.
In a third announcement on Tuesday, the company introduced a free, laptop-deployable MapR Sandbox for Hadoop. The download lets users deploy a "complete and fully-configured virtual machine" of the MapR distribution for Hadoop within five minutes, according to the vendor. The Sandbox includes point-and-click tutorials for developers, analysts, and administrators, plus the ability to drag and drop files from any source into the system and start developing and testing applications.
MapR's departures from the standard Hadoop stack have led competitors to dismiss it as proprietary and prone to vendor lock-in (a criticism that Hortonworks also levels against Cloudera for its management software and other components). But MapR tunes them out and promises higher performance, mission-critical snapshotting, and high-availability features that it says HDFS cannot support.
MapR is third in a three-horse race among independent Hadoop distributors. What has yet to be seen is whether the gap behind leader Cloudera and fast-growing rival Hortonworks will open or close in the Hadoop 2.0 era.
About the Author(s)
Executive Editor, Enterprise Apps
Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of Transform Magazine, and Executive Editor at DM News. He has covered IT and data-driven marketing for more than 15 years.