Pivotal Brings In-Memory Analysis To Hadoop - InformationWeek

Pivotal Brings In-Memory Analysis To Hadoop

Pivotal takes on Cloudera and Hortonworks with GemFire XD, enhanced SQL querying, and new machine-learning options in Pivotal HD 2.0.

Pivotal, the EMC spin-off company pursuing modern application development in the context of cloud computing and big-data analysis, on Monday released Pivotal HD 2.0, an update of its Hadoop distribution incorporating an in-memory database and a battery of new analysis capabilities.

Pivotal HD 2.0 is the vendor's first distribution based on Apache Hadoop 2.2, the latest release of the open source platform incorporating YARN system resource management controls. The release also integrates and supports GraphLab, an Apache-licensed open source framework for derivatives monitoring, recommendations, and graph analytics.

The big news, however, is the addition of GemFire XD, an in-memory database designed to execute algorithms and analytics on data in real time. Blending elements of Pivotal's GemFire (in-memory object grid) and SQL Fire (in-memory database), GemFire XD puts a SQL-compliant, in-memory database on top of the Hadoop Distributed File System (HDFS), from which it can read data or write data with ultra-low latency.

GemFire XD could be used by a mobile network provider, for example, to determine the identity, location, device, and network of an incoming call in an instant and then apply complex algorithms or in-memory analytics to route the call in a way that makes the best use of available capacity. The database could also handle data-transformation tasks before writing the data to HDFS, removing the need for processing that might otherwise require separate ETL routines.
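To make the call-routing scenario concrete, here is a minimal sketch in plain Python. It does not use GemFire XD or any Pivotal API: ordinary dictionaries stand in for the in-memory store, and every name in it (`Route`, `pick_route`, `handle_call`, the subscriber fields, and the sample values) is a hypothetical chosen for illustration. It shows the two ideas in the paragraph above: choosing a route from in-memory capacity data, and cleaning the record before it is stored so that no later ETL pass is needed.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    capacity: int  # maximum concurrent calls (hypothetical unit)
    active: int    # calls currently carried

    @property
    def spare(self) -> int:
        return self.capacity - self.active


def pick_route(routes, network):
    """Return the route on `network` with the most spare capacity, or None."""
    candidates = [r for r in routes[network] if r.spare > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.spare)


def handle_call(call, subscribers, routes):
    """Look up the caller in memory, choose a route, and return a cleaned record."""
    sub = subscribers[call["msisdn"]]
    route = pick_route(routes, sub["network"])
    if route is not None:
        route.active += 1
    # Normalize fields here, in memory, instead of in a later ETL job.
    return {
        "caller": call["msisdn"],
        "device": sub["device"].lower(),
        "network": sub["network"],
        "cell": call["cell"].strip().upper(),
        "route": route.name if route else None,
    }


# Hypothetical in-memory state: one subscriber and two routes on one network.
subscribers = {"15550100": {"device": "Phone-A", "network": "lte"}}
routes = {"lte": [Route("lte-east", 100, 97), Route("lte-west", 100, 40)]}

record = handle_call({"msisdn": "15550100", "cell": " sf-012 "}, subscribers, routes)
print(record)
```

In a real deployment the lookup tables would live in a distributed in-memory database and the returned record would be persisted to HDFS; the sketch only illustrates the decision logic.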

The Hadoop community has lately been looking to Apache Spark as an open source option for in-memory and stream-processing capabilities, but Pivotal says the commercial GemFire XD has many advantages over that technology.

"We're excited about Spark and will support it, but it's generally used for [data] ingest or caching," said Michael Cucchi, Pivotal's senior director of product marketing, in an interview with InformationWeek. "GemFire XD is an ANSI-compliant SQL database with high-availability features, and it can run over wide-area networks, so you can have an instance in Europe and another in North America with replication."

In another database-derived advance in Pivotal HD 2.0, the company has enhanced its HAWQ SQL-on-Hadoop query engine, which is based on the Greenplum database. HAWQ can now apply the more than 50 in-database algorithms in the MADlib Machine Learning Library. What's more, the engine now supports automatic translation of R, Python, and Java-based queries and applications, so HAWQ can handle business logic and procedures not well handled in SQL.

Pivotal competitors such as Cloudera and Hortonworks slam HAWQ's commercial roots, but here, too, the vendor says its proprietary technology has advantages over Hive, Impala, and other open source SQL-on-Hadoop options.

"HAWQ takes advantage of Greenplum's 10 years of history as a massively parallel processing analytical query engine, so it's 100% SQL compliant, has broad support, and it's extremely high performance compared to [Hive, Impala,] and other options," said Cucchi.

Working to defuse another criticism of HAWQ, Pivotal announced that HD 2.0 introduces beta support for reading and writing Parquet files from HAWQ. This means the engine will soon support an open file type rather than only the Greenplum-specific formatting currently used by the database.

Matching Cloudera's "enterprise data hub" concept, Pivotal has developed a Business Data Lake architecture with HD 2.0 at the center of enterprise data management. But the company is still catching up in some regards in that its proprietary HAWQ and GemFire XD components can't, as yet, be managed by YARN. That's something Pivotal is working on, according to Cucchi, but for now companies will have to use the combination of Pivotal Command Center, Virtual Resource Planner tools, and YARN to separately manage the resources and workloads within a data lake environment.

Pivotal sees its biggest advantage as being its larger Pivotal One Platform, which combines its Spring Source application-development framework and Cloud Foundry platform-as-a-service capabilities as well as the company's data-management capabilities.

"We have hooks from our data-services capabilities so Spring Source developers can make calls from within their environment that will make the data products react," Cucchi explained. "Developers can also spin up hundreds of nodes of Hadoop [on our cloud platform] within minutes, and then with one click, they can attach data services directly to their applications."

That's a much broader play than Pivotal's key Hadoop-distributor competitors try to address, but the question is whether Pivotal can win in all three of the markets in which it competes: application development, cloud infrastructure, and high-scale data management. On that last front, Pivotal now has more than 100 customers running on its Hadoop distribution, with most using HAWQ, according to Cucchi, but he declined to cite recent customer wins.

Cloudera and Hortonworks are generally seen as the leaders of the fast-growing Hadoop market, with Pivotal ranking somewhere after MapR and in the same league as IBM (with BigInsights) in bringing the platform to enterprise customers.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

D. Henschen,
User Rank: Author
3/17/2014 | 6:01:28 PM
Re: Pivotal, if nothing else, is executing fast
If "stolen a march" means it has done a better job of establishing Cloud Foundry than VMware could have done, then that's a good thing. But I would say that the data-management team has not been ahead of competing Hadoop distributions. Hortonworks and Cloudera introduced their distributions around Apache Hadoop 2.2 late last year and early this year, respectively. There's also the issue of HAWQ and GemFire XD not being managed by YARN as yet. Still to come also are elements of object storage from GemFire that aren't yet a part of the (mostly SQL Fire-based) GemFire XD.

If the data-management competition is viewed to be IBM, Oracle, and Teradata, then, yes, you could say Pivotal is executing quickly. I haven't heard a thing about IBM's BigSQL. Oracle is deferring all things Hadoop to Cloudera, and Teradata is doing pretty much the same with Hortonworks.
Charlie Babcock,
User Rank: Author
3/17/2014 | 2:24:16 PM
Pivotal, if nothing else, is executing fast
I agree with what you say in your comment, Doug, but I'd have to add that, if nothing else, Pivotal has stolen a march for VMware by establishing Cloud Foundry so convincingly as a PaaS supplier. Now it's showing the value of its GemFire and Greenplum acquisitions working with Hadoop. Cloudera and Hortonworks have the Hadoop brain trusts. Pivotal is executing pretty fast for the strange mix of elements in its toolbox.
D. Henschen,
User Rank: Author
3/17/2014 | 1:41:41 PM
Which need is driving the selection of Pivotal?
The software-development, cloud-platform, and data-management aspects of the Pivotal One platform all have followers and legacy customer bases, but is the combination of all three really compelling and natural? Proponents would say it provides handoffs and integrations that break down barriers that companies would otherwise have to bridge on their own. Critics see Pivotal as a rag-tag collection of technologies that EMC wanted off its balance sheet. In my view, Pivotal is not addressing one clear arena, so it will be much harder for this company to take off the way VMware did.