Hortonworks took the stage at the Hadoop Summit in San Jose, Calif., on Wednesday and delivered the somewhat anticlimactic news that it's releasing a community preview of its Hortonworks Data Platform (HDP) 2.0. Everyone knew it was coming, and followers understood that the YARN resource management framework is its most important new component.
The less-expected announcement at the Summit was that Teradata will resell and support HDP 2.0 through three new Hadoop deployment options. The move is Teradata's acknowledgment of Hadoop's growing importance, and it will give the still-nascent platform a foot in the door at many of the world's largest and most data-intensive enterprises, which also happen to be Teradata customers.
"Teradata reselling software to run on general-purpose hardware is a first for the company, and it reflects the fact that Hadoop is now a core component of the data architecture at most of the organizations that Teradata and Hortonworks serve," Dave McJannet, Hortonworks' VP of marketing, said in an interview with InformationWeek.
Teradata is also responding to competitive threats from Pivotal, the recent EMC spinoff that has declared itself "all in" on Hadoop with its Pivotal HD distribution and HAWQ SQL-on-Hadoop approach. Then there are IBM and Oracle, the former of which is showing signs of getting more aggressive with its Hadoop offerings.
Until recently, Teradata offered Hadoop only as an option on its Teradata Aster Big Analytics Appliance, which lets customers mix Hadoop and Aster database nodes. At the Hadoop Summit, Teradata announced the Teradata Appliance for Hadoop, a Hadoop-only appliance that's set for release in the fourth quarter.
"This is for companies that want a system that's installed and ready to run very quickly and that is entirely backed by a single vendor," Teradata product and services marketing manager Chris Twogood told InformationWeek.
The second deployment option is Teradata Commodity Offering for Hadoop, which is a software distribution that customers can deploy on preconfigured racks of Dell servers specified by Teradata but purchased directly by customers. The third new deployment option is Teradata Software-Only for Hadoop, which is deployed on the customer's choice of hardware.
The new appliance and Dell-based commodity offerings include Viewpoint and SQL-H, Teradata software that improves system reliability, manageability and connectivity to Teradata and other enterprise systems, according to Twogood. Viewpoint is Teradata's systems management software, which has been extended to Hadoop to support cluster management and system health checks. SQL-H is Teradata's SQL-on-Hadoop analysis option, which was jointly developed with Hortonworks.
Why all the attention to Hadoop? Twogood wasn't ready to concede that the "gravity has shifted" to Hadoop and away from conventional data warehouses, as recently suggested by Cloudera. But he did acknowledge that Hadoop has become an "important component" in enterprise data architectures.
Teradata is also ramping up Hadoop consulting services and support options including training, but more on that later.
YARN Opens Doors
As the most conservative distributor of Hadoop software, Hortonworks can't help but be a little behind its competitors in introducing new features. Cloudera started shipping YARN last year (though it didn't recommend using it), while Hortonworks insisted that the software needed enterprise hardening. In fact, Hortonworks is so set on waiting for Apache-sanctioned software that HDP 2.0 is being introduced as a community preview, because the Apache Hadoop 2.0 release won't be ready until later this summer.
YARN (a tongue-in-cheek acronym for Yet Another Resource Negotiator) is a kind of large-scale, distributed operating system for big data applications. The new YARN architecture will move Hadoop beyond batch-oriented MapReduce processing to support a range of interactive, online and stream-processing options. Administrators will use YARN to assign cluster capacity based on the service-level requirements of each application.

Recently introduced Apache Storm software, for example, will run on YARN to support real-time event-stream processing applications (such as social sentiment analysis). Apache Giraph will run on YARN to support graph analysis for uncovering network relationships (much as Facebook Graph Search does on that social network). Spark is an option for high-speed, in-memory analytics on top of Hadoop. MPI (Message Passing Interface), a standard for parallel programming, supports compute-intensive applications such as risk assessment, pricing optimization and other advanced analytics.
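To make the idea of assigning cluster capacity per application more concrete, here is a minimal Python sketch under stated assumptions. It models a simplified, percentage-based scheme loosely inspired by the queues in YARN's CapacityScheduler: each class of workload gets a queue with a guaranteed share of cluster memory. The queue names, the 100,000 MB cluster size and the `Queue`/`allocate` helpers are hypothetical illustrations, not YARN's API; real YARN schedulers also account for CPU, elastic borrowing of idle capacity and preemption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Queue:
    """Hypothetical queue: a name plus its guaranteed share of the cluster."""
    name: str
    capacity_pct: float  # guaranteed share of cluster memory, in percent


def allocate(cluster_memory_mb: int, queues: list[Queue]) -> dict[str, int]:
    """Split cluster memory among queues according to their percentage shares.

    Simplified assumption: shares must sum to exactly 100 percent, and each
    queue's guarantee is a fixed slice (no borrowing of idle capacity).
    """
    total = sum(q.capacity_pct for q in queues)
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"queue capacities must sum to 100, got {total}")
    return {q.name: int(cluster_memory_mb * q.capacity_pct / 100) for q in queues}


if __name__ == "__main__":
    # Hypothetical workload mix: batch MapReduce, interactive SQL, streaming.
    shares = allocate(
        100_000,
        [Queue("batch", 50), Queue("interactive", 30), Queue("streaming", 20)],
    )
    print(shares)  # {'batch': 50000, 'interactive': 30000, 'streaming': 20000}
```

Rejecting shares that don't add up to 100 percent keeps the sketch honest about guarantees: an administrator cannot promise more capacity than the cluster actually has.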
"You'll also see us announce a bunch of developments around HBase reworked to run on YARN," said Arun Murthy, a co-founder of Hortonworks and chairman of the Apache committee overseeing the Hadoop 2.0 release. "That's another example of how YARN will become the absolute core compute framework for the Hadoop ecosystem."