Chalk up 2013 as the year of SQL-on-Hadoop announcements. It's also the year that big vendors including IBM, EMC and Teradata effectively acknowledged that Hadoop will become a fixture of the corporate information management landscape.
The latest SQL-on-Hadoop development, announced Monday, is Teradata Enterprise Access for Hadoop, which provides two connection options to the Apache Hadoop open source data processing framework.
The first connection point is Teradata Smart Loader for Hadoop, a new feature of the Teradata Studio graphical interface for database administration, data access and querying. The Smart Loader gives business analysts a point-and-click tool to browse and move data sets from Cloudera or Hortonworks Hadoop clusters over to Teradata databases.
Moving data was possible before with data connectors, but it was a technical exercise, not something business users could do with point-and-click ease. So the Smart Loader is an advance, but unlike Teradata SQL-H, the second connection option announced Monday, it's not a SQL-on-Hadoop approach.
With Teradata SQL-H, users and applications get standard SQL query access to data stored within Hadoop through the Teradata database. The key difference from the Smart Loader is that you don't have to move that data into the database. You're querying Hadoop data where it lives, and SQL-H also supports in-database analytic processing.
The payoff with SQL-H is blending the structured data in Teradata warehouses and marts with the multi-structured big data in Hadoop clusters. Marketers, for example, will be able to query structured customer data in Teradata as well as clickstreams and social data on Hadoop to develop a better understanding of customer behavior, customer experience or customer preferences and sentiments. Product managers might query test data from log files in Hadoop alongside warranty, component or supplier data in Teradata to get a better understanding of product defects and warranty claim patterns.
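To make the marketing example concrete, here is a minimal sketch of what a blended query looks like in principle. It does not use Teradata SQL-H or Hadoop at all: an in-memory SQLite database stands in for both systems, and the table names (customers, clickstream), columns and rows are invented purely for illustration. The only point is that once Hadoop data is exposed as a table, a single SQL join can combine it with warehouse data.

```python
import sqlite3


def blended_query():
    """Join a 'warehouse' table with a 'clickstream' table in one SQL query.

    Both tables are hypothetical stand-ins created in SQLite; in the
    scenario the article describes, the first would live in a Teradata
    warehouse and the second in a Hadoop cluster.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Stand-in for structured customer data in the warehouse.
    cur.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT)"
    )
    # Stand-in for clickstream data that would live in Hadoop.
    cur.execute("CREATE TABLE clickstream (customer_id INTEGER, page TEXT)")

    cur.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "gold"), (2, "silver"), (3, "gold")],
    )
    cur.executemany(
        "INSERT INTO clickstream VALUES (?, ?)",
        [(1, "/pricing"), (1, "/support"), (2, "/pricing"), (3, "/pricing")],
    )

    # One query blends both sources: page views per customer segment.
    rows = cur.execute(
        """
        SELECT c.segment, COUNT(*) AS page_views
        FROM customers AS c
        JOIN clickstream AS k ON k.customer_id = c.customer_id
        GROUP BY c.segment
        ORDER BY c.segment
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for segment, page_views in blended_query():
        print(segment, page_views)
```

In a real deployment the join would run through the Teradata database against data that stays in Hadoop; the actual SQL-H syntax for referencing Hadoop tables is not shown here and may differ.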
The limitation with SQL-H is that it relies on Hortonworks' Apache HCatalog data access project, so it works only with Hortonworks Hadoop deployments. It's not an uncommon limitation, as nearly every Hadoop software distributor seems to be coming up with its own answer for SQL querying on Hadoop.
IBM earlier this month announced the technology preview of IBM Big SQL, a query interface that the company plans to offer with its BigInsights Hadoop distribution. Last month EMC announced its Pivotal HD Hadoop distribution, which brings Greenplum database access and querying to data in the Hadoop Distributed File System (HDFS).
Teradata SQL-H will be released by the end of the second quarter. EMC's Pivotal HD release is set for the end of this month. It's unknown when IBM Big SQL will be available. The most anticipated SQL-on-Hadoop release, however, is Cloudera Impala. Impala was announced last fall and is currently in beta release.
Reliable statistics on Hadoop market share aren't available, but there's no doubt that Cloudera's distribution of Hadoop is the most widely deployed. That's why Impala is so anticipated. Competitors MapR and Hortonworks are also working on SQL access to Hadoop with their Drill and Stinger projects, respectively.
Teradata's Enterprise Access for Hadoop options are part of its larger Unified Data Architecture, which also includes the Teradata Aster database. Aster blends SQL and MapReduce-style querying so you can handle time-series analyses, graph analyses, collaborative filtering and other big-data style analyses as well as SQL queries. Where the combination of Teradata and SQL-H will let companies blend data from data warehouses and Hadoop for known analyses, Aster is billed as an exploratory data discovery platform for uncovering latent insights in big data.
The downside of Aster is that it's a separate database that might require data movement from both Teradata and Hadoop. But Teradata insists that Aster lets companies do complex, MapReduce-style, big-data analyses with far fewer expensive, hard-to-find data scientist types.
The theme in all the recent announcements from enterprise stalwarts EMC, IBM and Teradata -- as well as from Oracle and Microsoft -- is that Hadoop is here to stay, so they're finding ways to play well with it and make the most of the platform.