2/11/2014
01:22 PM

MapR Hadoop Upgrade Runs HP Vertica

MapR's release of Hadoop 2.2 with YARN supports HP's popular, high-scale analytical database as a SQL-on-Hadoop option.


MapR Technologies on Tuesday became the last of the big-three Hadoop players to release a software distribution incorporating YARN (following Cloudera and Hortonworks), but it's touting two important advantages in version 2.2 that it says were worth waiting for: broad compatibility with a wide array of data-analysis applications and backward compatibility with MapReduce 1.0.

As a proof point on compatibility with data-processing and data-analysis applications, the company announced a preview release of the HP Vertica Analytics platform on MapR, a new SQL-on-Hadoop option that offers the speed and SQL-compliance of an industry-leading database management system (DBMS).

"Organizations embracing Hadoop have been struggling to empower large groups of business analysts who require sophisticated SQL and BI tools to do their jobs, but feel hand-cuffed when using incomplete, SQL-like approaches," said John Schroeder, CEO and cofounder of MapR Technologies, in a statement. "Providing HP Vertica's high-performance, rich SQL, and built-in analytic functions on... Hadoop sets business analysts free to do faster, interactive analytics." 


"Incomplete" and "SQL-like" are not-so-veiled references to Impala and Hive, the Apache projects championed by MapR rivals Cloudera and Hortonworks, respectively. This isn't the first time a DBMS has run on Hadoop. Pivotal put its Greenplum database engine on top of the Pivotal HD distribution with HAWQ, and Calpont has done much the same with its InfiniDB DBMS and the Hadoop Distributed File System (HDFS). But those efforts took reengineering of the DBMS, whereas MapR says it can work with unaltered Vertica and many other data platforms.

"YARN is great and it lets you do more than MapReduce with Hadoop, but when you combine that with our general-purpose storage platform, it greatly expands what's possible," Jack Norris, MapR's chief marketing officer, said in an interview with InformationWeek.  

The "general-purpose" line is a reference to MapR's concurrent-read-write-capable data platform. MapR parted ways with the rest of the Hadoop community long ago by replacing HDFS with this Network File System (NFS) protocol-based system that can handle random, concurrent reads and writes. By comparison, HDFS operates "like a CD-ROM, not like enterprise storage," Norris asserted. And because many applications assume the use of such storage systems, MapR's latest release can support many applications without alteration, he said.

"Whether it's a high-performance computing environment, pricing-optimization applications, operations research apps, or a widely used DBMS like HP Vertica, you can run it on our platform and take advantage of the distributed data environment," Norris said. The implementation for HP Vertica to run on MapR is expected to be generally available in March.

MapR says its NFS-based storage platform can run data-intensive applications and database management systems without significant rewrites.

The second advantage MapR is touting with its latest release is backward compatibility with MapReduce 1.0 jobs. Companies with lots of legacy MapReduce processes running on other distributions face a tough choice on when and how to migrate to new Hadoop 2.0 clusters because jobs will have to be rewritten to work with YARN. MapR says its latest release is unique in running MapReduce 1.x and YARN on the same nodes simultaneously, providing an "easy and risk-free" path to new Hadoop clusters.

Long-time MapR customer comScore is "excited" about MapR's "seamless upgrade path," said CTO Michael Brown in a statement, because "we run more than 20,000 jobs each day on our production MapR cluster." Even if those daily jobs represent only hundreds or thousands of distinct job types run repeatedly throughout the day, rewriting at that scale would obviously be daunting.

In a third announcement on Tuesday, the company introduced a free, laptop-deployable MapR Sandbox for Hadoop. The download lets users deploy a "complete and fully-configured virtual machine" of the MapR distribution for Hadoop within five minutes, according to the vendor. The Sandbox includes point-and-click tutorials for developers, analysts, and administrators, plus the ability to drag and drop files from any source into the system and start developing and testing applications.

MapR's departures from the standard Hadoop stack have led competitors to dismiss it as proprietary and prone to vendor lock-in (a criticism that Hortonworks also levels against Cloudera for its management software and other components). But MapR tunes them out and promises higher performance as well as mission-critical snapshotting and high-availability features that it says HDFS cannot support.

MapR is third in a three-horse race among independent Hadoop distributors. What has yet to be seen is whether the gap behind leader Cloudera and fast-growing rival Hortonworks will open or close in the Hadoop 2.0 era.


Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

Comments
D. Henschen (User Rank: Author), 2/11/2014 | 1:54:59 PM
The return of the NameNode controversy?
I thought that old criticism about Hadoop NameNode reliability, etc., had died down with changes made to core Apache Hadoop software, but MapR still insists that Hadoop can't be "mission critical" with HDFS as-is. "People assume that 'snapshots' are point-in-time consistent snapshots across volumes and clusters, but with open source HDFS, whenever the file is closed, that's the data that's contained in that snapshot," MapR's Jack Norris told me last week. "So that means there are all sorts of time stamps associated with a snapshot [if you use HDFS], and you can't recover an application that way."

With these vulnerabilities, talk of Hadoop as an "enterprise data hub," as espoused by Cloudera, is premature, says Norris, because you can't depend on the data that gives you higher-level analytical capabilities at the top of the stack. Cloudera and Hortonworks would obviously disagree with these assertions, but they're too busy throwing cold water on each other's strategies. Frankly, the more these three companies rail at each other, the less faith the whole world has in Hadoop, and the world isn't distinguishing among anybody's distribution.

 
Charlie Babcock (User Rank: Author), 2/11/2014 | 5:30:39 PM
HDFS 'like a CD-ROM'
HDFS operates "like a CD-ROM, not like enterprise storage," said Jack Norris, MapR CMO. That's sacrilege in the Hadoop community, and I wish him luck with it. Replacing HDFS with NFS, concocted by Sun Microsystems for its workstations in the mid-1980s, just might be a bit of genius. Then again, geniuses are usually found in the select minority.