Pivotal Brings In-Memory Analysis To Hadoop

Pivotal takes on Cloudera and Hortonworks with GemFire XD, enhanced SQL querying, and new machine-learning options in Pivotal HD 2.0.

Doug Henschen, Executive Editor, Enterprise Apps

March 17, 2014

Pivotal, the EMC spin-off company pursuing modern application development in the context of cloud computing and big-data analysis, on Monday released Pivotal HD 2.0, an update of its Hadoop distribution incorporating an in-memory database and a battery of new analysis capabilities.

Pivotal HD 2.0 is the vendor's first distribution based on Apache Hadoop 2.2, the latest release of the open source platform incorporating YARN system resource management controls. The release also integrates and supports Apache GraphLab, an open source framework for derivatives monitoring, recommendations, and graph analytics.

The big news, however, is the addition of GemFire XD, an in-memory database designed to execute algorithms and analytics on data in real time. Blending elements of Pivotal's GemFire (in-memory object grid) and SQLFire (in-memory database), GemFire XD puts a SQL-compliant, in-memory database on top of the Hadoop Distributed File System (HDFS), which it can read from or write to with ultra-low latency.

GemFire XD could be used by a mobile network provider, for example, to determine the identity, location, device, and network of an incoming call in an instant and then apply complex algorithms or in-memory analytics to route the call in a way that makes the best use of available capacity. The database could also handle data-transformation tasks before writing the data to HDFS, eliminating processing that would otherwise have to be done through ETL routines.
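To make the call-routing scenario concrete, here is a minimal Python sketch of the general pattern: look up the caller in memory, choose the route with the most spare capacity, and shape the record before it would be persisted. This is not GemFire XD code and does not use any Pivotal API. The subscriber table, the route list, the capacity figures, and the names `Route` and `route_call` are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    capacity: int  # maximum concurrent calls (hypothetical unit)
    active: int    # calls currently carried on this route

    @property
    def spare(self) -> int:
        return self.capacity - self.active


# Hypothetical in-memory subscriber table, standing in for rows held in memory.
SUBSCRIBERS = {
    "+15550100": {"device": "handset-a", "network": "lte", "region": "east"},
    "+15550101": {"device": "handset-b", "network": "3g", "region": "west"},
}

# Hypothetical routes per region, with made-up capacity figures.
ROUTES = {
    "east": [Route("east-1", 100, 90), Route("east-2", 100, 40)],
    "west": [Route("west-1", 50, 50), Route("west-2", 50, 10)],
}


def route_call(caller: str) -> dict:
    """Look up the caller, pick the route with the most spare capacity in
    the caller's region, and return a flattened record for storage."""
    subscriber = SUBSCRIBERS[caller]
    candidates = [r for r in ROUTES[subscriber["region"]] if r.spare > 0]
    if not candidates:
        raise RuntimeError("no capacity available")
    best = max(candidates, key=lambda r: r.spare)
    best.active += 1  # the chosen route now carries one more call
    # Transformation step: shape the record before it would be written out.
    return {"caller": caller, **subscriber, "route": best.name}


if __name__ == "__main__":
    print(route_call("+15550100"))
```

In a real deployment the lookup and the capacity check would presumably be SQL queries against in-memory tables rather than Python dictionaries; the sketch only shows why keeping both data sets in memory allows the decision to be made per call.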

The Hadoop community has lately been looking to Apache Spark as an open source option for in-memory and stream-processing capabilities, but Pivotal says the commercial GemFire XD has many advantages over that technology.

"We're excited about Spark and will support it, but it's generally used for [data] ingest or caching," said Michael Cucchi, Pivotal's senior director of product marketing, in an interview with InformationWeek. "GemFire XD is an ANSI-compliant SQL database with high-availability features, and it can run over wide-area networks, so you can have an instance in Europe and another in North America with replication."

In another database-derived advance in Pivotal HD 2.0, the company has enhanced its HAWQ SQL-on-Hadoop query engine, which is based on the Greenplum database. HAWQ can now apply the more than 50 in-database algorithms in the MADlib Machine Learning Library. What's more, the engine now supports automatic translation of R, Python, and Java-based queries and applications, so HAWQ can handle business logic and procedures not well handled in SQL.

Pivotal competitors such as Cloudera and Hortonworks slam HAWQ's commercial roots, but here, too, the vendor says its proprietary technology has advantages over Hive, Impala, and other open source SQL-on-Hadoop options.

"HAWQ takes advantage of Greenplum's 10 years of history as a massively parallel processing analytical query engine, so it's 100% SQL compliant, has broad support, and it's extremely high performance compared to [Hive, Impala,] and other options," said Cucchi.

Working on defusing another criticism of HAWQ, Pivotal announced that HD 2.0 introduces beta support for reading and writing of Parquet files from HAWQ. This means the engine will soon support an open file type rather than the Greenplum-specific formatting currently used by the database.

Matching Cloudera's "enterprise data hub" concept, Pivotal has developed a Business Data Lake architecture with HD 2.0 at the center of enterprise data management. But the company is still catching up in some regards in that its proprietary HAWQ and GemFire XD components can't, as yet, be managed by YARN. That's something Pivotal is working on, according to Cucchi, but for now companies will have to use the combination of Pivotal Command Center, Virtual Resource Planner tools, and YARN to separately manage the resources and workloads within a data lake environment.

Pivotal sees its biggest advantage as being its larger Pivotal One Platform, which combines its Spring Source application-development framework and Cloud Foundry platform-as-a-service capabilities with the company's data-management capabilities.

"We have hooks from our data-services capabilities so Spring Source developers can make calls from within their environment that will make the data products react," Cucchi explained. "Developers can also spin up hundreds of nodes of Hadoop [on our cloud platform] within minutes, and then with one click, they can attach data services directly to their applications."

That's a much broader play than Pivotal's key Hadoop-distributor competitors try to address, but the question is whether Pivotal can win in all three of the markets in which it competes: application development, cloud infrastructure, and high-scale data management. On that last front, Pivotal now has more than 100 customers running on its Hadoop distribution, with most using HAWQ, according to Cucchi, but he declined to cite recent customer wins.

Cloudera and Hortonworks are generally seen as the leaders of the fast-growing Hadoop market, with Pivotal ranking somewhere after MapR and in the same league as IBM (with BigInsights) in bringing the platform to enterprise customers.

About the Author(s)

Doug Henschen

Executive Editor, Enterprise Apps

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of Transform Magazine, and Executive Editor at DM News. He has covered IT and data-driven marketing for more than 15 years.
