HP Taps Vertica For SQL On Hadoop - InformationWeek

11/17/2014
08:46 AM

HP Taps Vertica For SQL On Hadoop

HP brings fast, familiar SQL querying to Hadoop using its Vertica database. Here's how it stands out from other big data analysis options.

Add HP to the growing list of vendors offering SQL analysis options on top of the leading big data platform, as the company on Monday announced the general availability of HP Vertica for SQL on Hadoop. 

In the works for months and partially exposed this summer through an earlier Vertica release, HP Vertica for SQL on Hadoop promises what other tools in this class promise: fast and familiar SQL-based querying on top of the increasingly popular big data store. It stands apart in two ways, according to the company. First, it offers more complete SQL functionality than Hadoop-native options such as Cloudera's Impala project, Apache Drill, and IBM Big SQL, HP executives asserted. 

"We have a SQL query engine that's proven and that has a rich set of analytic capabilities," said Steve Sarsfield, product marketing manager in HP's big data business group, in a phone interview with InformationWeek.

SQL capabilities such as joins and merges are often lacking in "immature" Hadoop-native products, according to Sarsfield, and he added that HP customers report that they are "constantly running into bugs and stability issues with some of those products," though he declined to be specific about which products are buggy.

As for other relational databases that have been ported to run on top of Hadoop, such as Pivotal's HAWQ, based on the Greenplum database management system, or the Actian Analytics Platform SQL Hadoop Edition, based on Vectorwise, HP executives claimed that HP Vertica for SQL on Hadoop offers superior scalability and performance.

"Some of the customers that we've announced, like the Facebooks of the world, have done thorough evaluations of all the technologies available, and we get chosen by the largest and most demanding customers," said Jeff Healey, director of product marketing, HP Big Data platform.

HP claims more than 100 customers are working with Vertica 7.1, the summer release that first exposed SQL-on-Hadoop functionality. But only one customer, human resources firm Snagajob, was quoted in HP's press release about those capabilities. 

"With up to 25,000 job postings updates, over 400,000 active postings, and over one million unique visitors on our site every day, there is tremendous potential insight in all that data," said Robert Fehrmann, data architect at Snagajob, in the statement. "HP Vertica for SQL on Hadoop ... gives us an incredibly robust analytics tool to help understand and act on our information assets." 

HP's superiority claims aside, the attractions of Vertica for SQL on Hadoop include distribution-agnostic compatibility with Apache Hadoop, Cloudera, Hortonworks, and MapR deployments. The release also supports Hadoop-native file formats including Parquet and ORC. And HP says its per-node pricing model is "highly competitive," though it declined to release pricing details.

HP utilities let you manage Vertica's use of nodes, memory, and compute capacity, while the Hadoop cluster is managed with separate tools.

Whereas Hadoop-native SQL-on-Hadoop options like Hive, Impala, and Drill rely on Hadoop 2.0's YARN resource management and Hadoop-native security and data-governance systems, Vertica (like Pivotal HAWQ) does not run on YARN and has its own administrative and security controls. Thus, you'll have to take care that HP Vertica's (or HAWQ's) use of cluster compute and memory resources doesn't impinge on other workloads and service levels. [Author's note: This article was corrected to reflect that the Actian Analytics Platform SQL Hadoop Edition is certified to run on YARN.]

"We are aware that YARN is important and that we need to take a look at it in the future, but for now it's colocated with the Hadoop cluster and you use our utilities to set aside nodes, memory, and the resources you need for Vertica analytics," said Healey of HP. 

Microsoft, Oracle, and Teradata have all stopped short of porting their databases to run on top of Hadoop. Instead they've offered Microsoft Polybase, Oracle Big Data SQL, and Teradata Query Grid to blend analysis of Hadoop data with information in database deployments. 

With their SQL-on-Hadoop offerings, HP, Pivotal, Actian, and others are betting that the data lake/data hub concept of using Hadoop as the epicenter of data management will take hold. You could call that aggressive and forward-thinking, but then, HP, Pivotal, Actian, and others are market challengers with far fewer deployments to defend than incumbents such as Oracle, Microsoft, and Teradata. The bet on Hadoop is a bet that disruption will open up opportunities. 

But big data demands more than just SQL analysis, because it involves data that can't be organized into columns and rows. Pivotal, for example, is touting MADlib for machine learning and statistical analysis, while Actian recently added a graph analysis engine. Apache Spark, a fast-growing in-memory analysis engine that runs on top of Hadoop, supports machine learning, streaming analysis, graph analysis, and R analytics as well as SQL querying. 

HP executives said the Vertica community is experimenting with software that will run open source R analytics on the distributed database, but the vendor itself has no public roadmap to productize and support that software. As for working with unstructured data, executives said HP's Autonomy IDOL software offers options including text-, document-, sentiment-, and image-analysis capabilities, though it's not clear how Vertica and IDOL might work together. 

HP Vertica for SQL on Hadoop will clearly be of interest to any Vertica customer. But the real test of success will be its selection and use in place of Hadoop-native SQL-on-Hadoop options or Hadoop-to-database connections offered by the likes of Oracle, Microsoft, and Teradata.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

Comments
SteveS503,
User Rank: Apprentice
1/26/2017 | 11:48:40 AM
Re: Column-oriented database on top of Hadoop: How does that work for user?
ORC and Parquet files, popular file formats in the Hadoop world, are columnar. In large part, the Hadoop/Spark world has adopted columnar. Vertica can use its own format, ORC, or Parquet. You can even do JOINs between the formats with ease.

Deep analysis of LOTS of data is what it's best at doing. However, you can do single look-ups using the optimizations provided. So, for example, if you're looking for a single user's information in a sea of electric meter data, there are live aggregate projections and other features that optimize that query.
RobertsPaige,
User Rank: Apprentice
11/17/2014 | 5:58:20 PM
Excellent article, slight correction
Hi Doug,

Very nice article on the current SQL on Hadoop offerings. Good representation of what's available. I work at Actian, and just had one minor correction to make. The Actian Analytics Platform - Hadoop SQL Edition is not based on ParAccel. ParAccel has been renamed to Actian Matrix. It is an excellent distributed analytical database, but is not Hadoop-based. It includes on-demand integration with Hadoop for smooth interoperability, but is its own separate entity.

The Hadoop SQL Edition is based on Actian Vector, formerly Vectorwise, the current holder of several top TPC-H non-cluster benchmark query speed records. We ported the same technology onto Hadoop. We're looking forward to setting a few records in the cluster benchmarks. ;-)

Paige
Charlie Babcock,
User Rank: Author
11/17/2014 | 4:08:45 PM
Column-oriented database on top of Hadoop: How does that work for user?
Vertica, last I knew, was strictly a column-oriented SQL system. Aren't there both advantages and drawbacks to that, on top of Hadoop? If you're looking for big picture data, you can get it quickly via columnar access. If you're looking for details in the data, that's a different story.
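
To illustrate the tradeoff raised in this comment, here is a toy Python sketch of a column-oriented layout. It is not how Vertica, ORC, or Parquet are implemented; the table, the column names (`user_id`, `region`, `kwh`), and the helper functions are hypothetical and exist only for illustration. It shows that an aggregate reads a single column, while a single-record lookup has to visit every column to reassemble the row.

```python
from typing import Dict, List

# Hypothetical meter readings stored row by row (illustrative data only).
rows: List[Dict[str, object]] = [
    {"user_id": 1, "region": "east", "kwh": 12.5},
    {"user_id": 2, "region": "west", "kwh": 7.0},
    {"user_id": 3, "region": "east", "kwh": 9.5},
]


def to_columns(records: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Pivot row-oriented records into one list per column."""
    columns: Dict[str, List[object]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_kwh(columns: Dict[str, List[object]]) -> float:
    """Big-picture query: only the 'kwh' column is read."""
    return sum(columns["kwh"])


def lookup_user(columns: Dict[str, List[object]], user_id: int) -> Dict[str, object]:
    """Detail query: find the position in one column, then read
    every column at that position to rebuild the full row."""
    position = columns["user_id"].index(user_id)
    return {name: values[position] for name, values in columns.items()}


columns = to_columns(rows)
print(total_kwh(columns))
print(lookup_user(columns, 2))
```

In this sketch the aggregate touches one list no matter how many columns the table has, whereas the lookup cost grows with the number of columns. That is the general shape of the advantage and the drawback described above.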
D. Henschen,
User Rank: Author
11/17/2014 | 9:44:10 AM
IBM surprisingly quiet on big data access and analysis
IBM has offered IBM Big SQL as a native SQL-on-Hadoop option, but I haven't heard much about it other than IBM statements. They're also surprisingly quiet on options to synthesize/correlate Hadoop data with what's in DB2 and Netezza data warehouses. Anybody at IBM care to share highlights on SQL analysis of Hadoop data options other than Big SQL?