4/15/2013
12:14 PM

Teradata Joins SQL-On-Hadoop Bandwagon

Teradata announces standard SQL access to Hadoop data, following in the footsteps of IBM, EMC and Cloudera. Hadoop is the winner.

Chalk up 2013 as the year of SQL-on-Hadoop announcements. It's also the year that big vendors including IBM, EMC and Teradata effectively acknowledged that Hadoop will become a fixture of the corporate information management landscape.

The latest SQL-on-Hadoop development, announced Monday, is Teradata Enterprise Access for Hadoop, which provides two connection options to the Apache Hadoop open source data processing framework.

The first connection point is Teradata Smart Loader for Hadoop, a new feature of the Teradata Studio graphical interface for database administration, data access and querying. The Smart Loader gives business analysts a point-and-click tool to browse and move data sets from Cloudera or Hortonworks Hadoop clusters over to Teradata databases.

Moving data was possible before with data connectors, but it was a technical exercise, not something business users could do with point-and-click ease. So the Smart Loader is an advance, but unlike Teradata SQL-H, the second new connection option announced Monday, it's not a SQL-on-Hadoop approach.

With Teradata SQL-H, users and applications get standard SQL query access, through the Teradata database, to data stored within Hadoop. The key difference from the Smart Loader is that you don't have to move that data into the database. You're querying Hadoop data where it lives, and SQL-H also supports in-database analytic processing.

The payoff with SQL-H is blending the structured data in Teradata warehouses and marts with the multi-structured big data in Hadoop clusters. Marketers, for example, will be able to query structured customer data in Teradata as well as clickstreams and social data on Hadoop to develop a better understanding of customer behavior, customer experience or customer preferences and sentiments. Product managers might query test data from log files in Hadoop alongside warranty, component or supplier data in Teradata to get a better understanding of product defects and warranty claim patterns.
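To make the marketing example concrete, here is a minimal sketch of what a blended query looks like in principle. It is not Teradata SQL-H syntax and does not touch Teradata or Hadoop: Python's built-in sqlite3 module stands in for the database, and the table names (customers, clickstream), columns and sample rows are all hypothetical. The only point is that once clickstream data is exposed as a table, ordinary SQL can join it to structured customer records.

```python
import sqlite3


def build_demo_connection():
    """Create an in-memory database with two hypothetical tables.

    'customers' stands in for structured warehouse data; 'clickstream'
    stands in for Hadoop-resident data exposed as a queryable table.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name TEXT,
            segment TEXT
        );
        CREATE TABLE clickstream (
            customer_id INTEGER,
            page TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada", "gold"), (2, "Ben", "silver"), (3, "Cy", "gold")],
    )
    conn.executemany(
        "INSERT INTO clickstream VALUES (?, ?)",
        [(1, "/home"), (1, "/pricing"), (2, "/home")],
    )
    conn.commit()
    return conn


def blended_click_counts(conn):
    """Return (name, segment, page_views) per customer, most active first."""
    query = """
        SELECT c.name, c.segment, COUNT(k.page) AS page_views
        FROM customers AS c
        LEFT JOIN clickstream AS k ON k.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name, c.segment
        ORDER BY page_views DESC, c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in blended_click_counts(build_demo_connection()):
        print(row)
```

The LEFT JOIN keeps customers who have no clickstream rows, so a marketer would see inactive customers alongside active ones rather than losing them from the result.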

The limitation with SQL-H is that it relies on Hortonworks' Apache HCatalog data access project, so it works only with Hortonworks Hadoop deployments. It's not an uncommon limitation, as every Hadoop software distributor seems to be coming up with its own answer for SQL querying on Hadoop.

IBM earlier this month announced the technology preview of IBM Big SQL, a query interface that the company plans to offer with its BigInsights Hadoop distribution. Last month EMC announced its Pivotal HD Hadoop distribution, which brings Greenplum database access and querying to data in the Hadoop Distributed File System (HDFS).

Teradata SQL-H will be released by the end of the second quarter. EMC's Pivotal HD release is set for the end of this month. It's unknown when IBM Big SQL will be available. The most anticipated SQL-on-Hadoop release, however, is Cloudera Impala. Impala was announced last fall and is currently in beta release.

Reliable statistics on Hadoop market share aren't available, but there's no doubt that Cloudera's distribution of Hadoop is the most widely deployed. That's why Impala is so anticipated. Competitors MapR and Hortonworks are also working on SQL access to Hadoop with their Drill and Stinger projects, respectively.

Teradata's Enterprise Access for Hadoop options are part of its larger Unified Data Architecture, which also includes the Teradata Aster database. Aster blends SQL and MapReduce-style querying so you can handle time-series analyses, graph analyses, collaborative filtering and other big-data-style analyses as well as SQL queries. Whereas the combination of Teradata and SQL-H will let companies blend data from data warehouses and Hadoop for known analyses, Aster is billed as an exploratory data discovery platform for uncovering latent insights in big data.

The downside of Aster is that it's a separate database that might require data movement from both Teradata and Hadoop. But Teradata insists that Aster lets companies do complex, MapReduce-style, big-data analyses with far fewer expensive, hard-to-find data scientist types.

The theme in all the recent announcements from enterprise stalwarts EMC, IBM and Teradata -- as well as from Oracle and Microsoft -- is that Hadoop is here to stay, so they're finding ways to play well with it and make the most of the platform.
