Actian and HP Vertica separately challenge Cloudera Impala and follow Pivotal in adapting their databases to run on the big data platform.
Actian on Tuesday joined the long list of companies that have introduced a way to support SQL access and querying on top of Hadoop. The announcement comes just a week after HP upgraded SQL-on-Hadoop functionality it introduced late last year through its Vertica database.
Actian and HP join Pivotal (with Greenplum-based HAWQ) and InfiniDB among companies extending existing relational database management systems to run on top of the Hadoop Distributed File System (HDFS). Actian said it's going after Hadoop market-share leader Cloudera and its Impala offering, which was introduced last year as a faster, more SQL-compliant alternative to Hive.
The Actian Analytics Platform Hadoop SQL Edition, due out by the end of this month, beats Impala with faster querying and ISO SQL-92 compliance, according to Actian CTO Mike Hoskins.
"We're offering full-functioning, SQL-complete functionality running natively on Hadoop, and we're also the highest-performing SQL database running on Hadoop," Hoskins told InformationWeek in a phone interview. "If you add those two together, we have an advantage that's hugely important for customers looking to empower their SQL users."
Actian internal research claims faster querying than Cloudera Impala.
Actian has acquired and consolidated into its Actian Analytics Platform technologies including the ParAccel and Vectorwise databases and Pervasive DataRush data-integration software. The new SQL-on-Hadoop option uses what's now called the Vector engine for parallelized querying on HDFS. Actian's testing shows its query performance will be as much as 30 times faster than Impala, Hoskins said.
HP introduced SQL-on-Hadoop capabilities on its columnar Vertica database late last year by eliminating its proprietary storage layer so it could work with Hadoop-native file formats including JSON, Parquet, Thrift, and others. In last week's release, dubbed Dragline, HP eliminated all separation between Hadoop and Vertica clusters.
"That means Vertica can coexist with the Hadoop cluster, and we can access and query against HDFS data leaving it where it is," said Eamon O'Neill, HP's Vertica product manager, in a phone interview with InformationWeek. Vertica is also capable of running SQL queries against semi-structured data, including clickstreams and Web session data, according to O'Neill.
Actian's architecture does not require a separate cluster, but it appears to be a step behind HP in that it has to load new data, or convert existing data inside Hadoop, into its proprietary database storage format to support SQL querying. Actian says support for Hadoop-native file formats is on the roadmap for a future release.
There's more to the Actian and HP announcements. Actian, for example, boasts 200 connectors to enterprise data systems and YARN-certified data processing and ETL on top of Hadoop. HP added to Vertica live aggregate lookups for customer-personalization analysis, sentiment analysis against short text streams such as tweets, and improved workload-management features. But the big news for both companies is clearly SQL-on-Hadoop support.
Despite the profusion of options for using SQL against big data, Hive remains the most widely used query tool with Hadoop. On that front, Hortonworks says the latest generation of Hive offers greatly improved performance. Nonetheless, Hive and Impala both fall short of relational databases in SQL functionality, according to Forrester analyst Mike Gualtieri.
"Vendors have obsessed about performance, but the question is, can you run the queries you need to run?" Gualtieri told InformationWeek. "Impala still has work to do, but Actian, Pivotal, and Vertica are far more likely to support the queries that companies already have in use."
Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...