Add HP to the growing list of vendors offering SQL analysis options on top of the leading big data platform, as the company on Monday announced the general availability of HP Vertica for SQL on Hadoop.
In the works for months and partially exposed this summer through an earlier Vertica release, HP Vertica for SQL on Hadoop promises what other tools in this class promise: fast and familiar SQL-based querying on top of the increasingly popular big data store. It stands apart in two ways, according to the company. First, it offers more complete SQL functionality than Hadoop-native options such as Cloudera's Impala project, Apache Drill, and IBM Big SQL, HP executives asserted.
"We have a SQL query engine that's proven and that has a rich set of analytic capabilities," said Steve Sarsfield, product marketing manager in HP's big data business group, in a phone interview with InformationWeek.
SQL capabilities such as joins and merges are often lacking in "immature" Hadoop-native products, according to Sarsfield, and he added that HP customers report that they are "constantly running into bugs and stability issues with some of those products," though he declined to be specific about which products are buggy.
Second, HP executives claimed that HP Vertica for SQL on Hadoop offers better scalability and performance than other relational databases that have been ported to run on top of Hadoop, such as Pivotal's HAWQ (based on the Greenplum database management system) and the Actian Analytics Platform SQL Hadoop Edition (based on Vectorwise).
"Some of the customers that we've announced, like the Facebooks of the world, have done thorough evaluations of all the technologies available, and we get chosen by the largest and most demanding customers," said Jeff Healey, director of product marketing, HP Big Data platform.
HP claims more than 100 customers are working with Vertica 7.1, the summer release that first exposed SQL-on-Hadoop functionality. But only one customer, human resources firm Snagajob, was quoted in HP's press release about those capabilities.
"With up to 25,000 job postings updates, over 400,000 active postings, and over one million unique visitors on our site every day, there is tremendous potential insight in all that data," said Robert Fehrmann, data architect at Snagajob, in the statement. "HP Vertica for SQL on Hadoop ... gives us an incredibly robust analytics tool to help understand and act on our information assets."
HP's superiority claims aside, the attractions of Vertica for SQL on Hadoop include distribution-agnostic compatibility with Apache Hadoop, Cloudera, Hortonworks, and MapR deployments. The release also supports Hadoop-native file formats including Parquet and ORC. And HP says its per-node pricing model is "highly competitive," though it declined to release pricing details.
Where Hadoop-native SQL-on-Hadoop options like Hive, Impala, and Drill rely on Hadoop 2.0's YARN resource management and Hadoop-native security and data-governance systems, Vertica (like Pivotal HAWQ) does not run on YARN and has its own administrative and security controls. Thus, you'll have to take care that HP Vertica's (or HAWQ's) use of cluster compute and memory resources doesn't impinge on other workloads and service levels. [Author's note: This article was corrected to reflect that the Actian Analytics Platform SQL Hadoop Edition is certified to run on YARN.]
"We are aware that YARN is important and that we need to take a look at it in the future, but for now it's colocated with the Hadoop cluster and you use our utilities to set aside nodes, memory, and the resources you need for Vertica analytics," said Healey of HP.
Microsoft, Oracle, and Teradata have all stopped short of porting their databases to run on top of Hadoop. Instead they've offered Microsoft PolyBase, Oracle Big Data SQL, and Teradata QueryGrid to blend analysis of Hadoop data with information in database deployments.
With their SQL-on-Hadoop offerings, HP, Pivotal, Actian, and others are betting that the data lake/data hub concept of using Hadoop as the epicenter of data management will take hold. You could call that aggressive and forward-thinking, but then, HP, Pivotal, Actian, and others are market challengers with far fewer deployments to defend than incumbents such as Oracle, Microsoft, and Teradata. The bet on Hadoop is a bet that disruption will open up opportunities.
But big data demands more than just SQL analysis, because it involves data that can't be organized into columns and rows. Pivotal, for example, is touting MADlib for machine learning and statistical analysis, while Actian recently added a graph analysis engine. Apache Spark, a fast-growing in-memory analysis engine that runs on top of Hadoop, supports machine learning, streaming analysis, graph analysis, and R analytics as well as SQL querying.
HP executives said the Vertica community is experimenting with software that will run open source R analytics on the distributed database, but the vendor itself has no public roadmap to productize and support that software. As for working with unstructured data, executives said HP's Autonomy IDOL software offers options including text-, document-, sentiment-, and image-analysis capabilities, though it's not clear how Vertica and IDOL might work together.
HP Vertica for SQL on Hadoop will clearly be of interest to any Vertica customer. But the real test of success will be its selection and use in place of Hadoop-native SQL-on-Hadoop options or Hadoop-to-database connections offered by the likes of Oracle, Microsoft, and Teradata.