News
3/17/2014
12:06 PM

Pivotal Brings In-Memory Analysis To Hadoop

Pivotal takes on Cloudera and Hortonworks with GemFire XD, enhanced SQL querying, and new machine-learning options in Pivotal HD 2.0.


Pivotal, the EMC spin-off company pursuing modern application development in the context of cloud computing and big-data analysis, on Monday released Pivotal HD 2.0, an update of its Hadoop distribution incorporating an in-memory database and a battery of new analysis capabilities.

Pivotal HD 2.0 is the vendor's first distribution based on Apache Hadoop 2.2, the latest release of the open source platform incorporating YARN system resource management controls. The release also integrates and supports GraphLab, an open source framework for derivatives monitoring, recommendations, and graph analytics.

The big news, however, is the addition of GemFire XD, an in-memory database designed to execute algorithms and analytics on data in real time. Blending elements of Pivotal's GemFire (in-memory object grid) and SQL Fire (in-memory database), GemFire XD puts a SQL-compliant, in-memory database on top of the Hadoop Distributed File System (HDFS), from which it can read data or write data with ultra-low latency.


GemFire XD could be used by a mobile network provider, for example, to determine the identity, location, device, and network of an incoming call within an instant and then apply complex algorithms or in-memory analytics to determine how to route the call, making the best use of available capacity. The database could also handle data-transformation tasks before writing the data to HDFS, avoiding processing that would otherwise be required by way of ETL routines.
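The call-routing scenario can be sketched in a few lines. This is only an illustration of the general pattern (look up the caller in an in-memory SQL store, then pick a route by spare capacity); it uses Python's standard-library sqlite3 module with a ":memory:" database as a stand-in, not GemFire XD, and every table, column, and function name here is a hypothetical chosen for the example.

```python
import sqlite3

# Stand-in only: sqlite3's ":memory:" database plays the role of an
# in-memory SQL store. All table and column names are hypothetical.
def build_store():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE subscribers (
            number TEXT PRIMARY KEY, device TEXT, region TEXT);
        CREATE TABLE routes (
            route_id TEXT PRIMARY KEY, region TEXT,
            capacity INTEGER, active_calls INTEGER);
    """)
    conn.executemany("INSERT INTO subscribers VALUES (?, ?, ?)", [
        ("555-0100", "smartphone", "east"),
        ("555-0101", "feature-phone", "west"),
    ])
    conn.executemany("INSERT INTO routes VALUES (?, ?, ?, ?)", [
        ("east-1", "east", 100, 90),
        ("east-2", "east", 100, 40),
        ("west-1", "west", 50, 10),
    ])
    return conn

def route_call(conn, number):
    """Return the route in the caller's region with the most spare
    capacity, or None if the caller is unknown or every route is full."""
    row = conn.execute(
        """
        SELECT r.route_id
        FROM subscribers AS s
        JOIN routes AS r ON r.region = s.region
        WHERE s.number = ? AND r.active_calls < r.capacity
        ORDER BY (r.capacity - r.active_calls) DESC, r.route_id
        LIMIT 1
        """,
        (number,),
    ).fetchone()
    if row is None:
        return None
    # Record the new call so later routing decisions see the updated load.
    conn.execute(
        "UPDATE routes SET active_calls = active_calls + 1 WHERE route_id = ?",
        (row[0],),
    )
    return row[0]

if __name__ == "__main__":
    store = build_store()
    print(route_call(store, "555-0100"))
```

Because the lookup and the capacity check happen in a single query against data held in memory, the routing decision needs no separate batch step, which is the property the scenario depends on.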

The Hadoop community has lately been looking to Apache Spark as an open source option for in-memory and stream-processing capabilities, but Pivotal says the commercial GemFire XD has many advantages over that technology.

"We're excited about Spark and will support it, but it's generally used for [data] ingest or caching," said Michael Cucchi, Pivotal's senior director of product marketing, in an interview with InformationWeek. "GemFire XD is an ANSI-compliant SQL database with high-availability features, and it can run over wide-area networks, so you can have an instance in Europe and another in North America with replication."

In another database-derived advance in Pivotal HD 2.0, the company has enhanced its HAWQ SQL-on-Hadoop query engine, which is based on the Greenplum database. HAWQ can now apply the more than 50 in-database algorithms in the MADlib Machine Learning Library. What's more, the engine now supports automatic translation of R, Python, and Java-based queries and applications, so HAWQ can handle business logic and procedures not well handled in SQL.

Pivotal competitors such as Cloudera and Hortonworks slam HAWQ's commercial roots, but here, too, the vendor says its proprietary technology has advantages over Hive, Impala, and other open source SQL-on-Hadoop options.

"HAWQ takes advantage of Greenplum's 10 years of history as a massively parallel processing analytical query engine, so it's 100% SQL compliant, has broad support, and it's extremely high performance compared to [Hive, Impala,] and other options," said Cucchi.

Working on defusing another criticism of HAWQ, Pivotal announced that HD 2.0 introduces beta support for reading and writing of Parquet files from HAWQ. This means the engine will soon support an open file type rather than the Greenplum-specific formatting currently used by the database.

Matching Cloudera's "enterprise data hub" concept, Pivotal has developed a Business Data Lake architecture with HD 2.0 at the center of enterprise data management. But the company is still catching up in some regards in that its proprietary HAWQ and GemFire XD components can't, as yet, be managed by YARN. That's something Pivotal is working on, according to Cucchi, but for now companies will have to use the combination of Pivotal Command Center, Virtual Resource Planner tools, and YARN to separately manage the resources and workloads within a data lake environment.

Pivotal sees its biggest advantage as being its larger Pivotal One Platform, which combines its Spring Source application-development framework and Cloud Foundry platform-as-a-service capabilities as well as the company's data-management capabilities.

"We have hooks from our data-services capabilities so Spring Source developers can make calls from within their environment that will make the data products react," Cucchi explained. "Developers can also spin up hundreds of nodes of Hadoop [on our cloud platform] within minutes, and then with one click, they can attach data services directly to their applications."

That's a much broader play than Pivotal's key Hadoop-distributor competitors try to address, but the question is whether Pivotal can win in all three of the markets in which it competes: application development, cloud infrastructure, and high-scale data management. On that last front, Pivotal now has more than 100 customers running on its Hadoop distribution, with most using HAWQ, according to Cucchi, but he declined to cite recent customer wins.

Cloudera and Hortonworks are generally seen as the leaders of the fast-growing Hadoop market, with Pivotal ranking somewhere after MapR and in the same league as IBM (with BigInsights) in bringing the platform to enterprise customers.


Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

Comments
D. Henschen,
User Rank: Author
3/17/2014 | 6:01:28 PM
Re: Pivotal, if nothing else, is executing fast
If "stolen a march" means it has done a better job of establishing Cloud Foundry than VMware could have done, then that's a good thing. But I would say that the data-management team has not been ahead of competing Hadoop distributions. Hortonworks and Cloudera introduced their distributions around Apache Hadoop 2.2 late last year and early this year, respectively. There's also the issue of HAWQ and GemFire XD not being managed by YARN as yet. Still to come also are elements of object storage from GemFire that aren't yet a part of the (mostly SQL Fire-based) GemFire XD.

If the data-management competition is viewed to be IBM, Oracle, and Teradata, then, yes, you could say Pivotal is executing quickly. I haven't heard a thing about IBM's BigSQL. Oracle is deferring all things Hadoop to Cloudera, and Teradata is doing pretty much the same with Hortonworks.
mikomatsumura,
User Rank: Apprentice
3/17/2014 | 3:44:31 PM
Re: Which need is driving the selection of Pivotal?
Feels like the Big Data use cases for In-Memory Data Grid may be taken care of by combinations of open source solutions like Spark and others. Lots of In-Memory Computing vendors trying to embrace Hadoop, but it feels like the real usage of IMDG is in transactional data processing as well as more real-time analytic processing. At least that's how it is for Hazelcast.

It's tempting for vendors to try to embrace Hadoop and Big Data, especially seeing how slow disk-based Hadoop is, but over time it seems like grid vendors like GemFire should stay in the transaction and transactional-analytics regions where they are strongest.
Charlie Babcock,
User Rank: Author
3/17/2014 | 2:24:16 PM
Pivotal, if nothing else, is executing fast
I agree with what you say in your comment, Doug, but I'd have to add that, if nothing else, Pivotal has stolen a march for VMware by establishing Cloud Foundry so convincingly as a PaaS supplier. Now it's showing the value of its GemFire and Greenplum acquisitions working with Hadoop. Cloudera and Hortonworks have the Hadoop brain trusts. Pivotal is executing pretty fast for the strange mix of elements in its toolbox.
D. Henschen,
User Rank: Author
3/17/2014 | 1:41:41 PM
Which need is driving the selection of Pivotal?
The software-development, cloud-platform, and data-management aspects of the Pivotal One platform all have followers and legacy customer bases, but is the combination of all three really compelling and natural? Proponents would say it provides handoffs and integrations that break down barriers that companies would otherwise have to bridge on their own. Critics see Pivotal as a rag-tag collection of technologies that EMC wanted off its balance sheet. In my view, Pivotal is not addressing one clear arena, so it will be much harder for this company to take off the way VMware did.