Hortonworks wants to make it easier to build big data applications, so on Monday it announced that it will add software and support for the popular Cascading app-development framework to its Hadoop distribution.
Developed and supported by Concurrent, Cascading is a Java-based framework for app development popularized by Internet giants including eBay, LinkedIn, and Twitter, and increasingly used by more conventional enterprises to operationalize big data applications. Where data analysts tend to do interactive, ad-hoc analyses across Hadoop, the Cascading framework is geared to application developers who have to create repeatable big data systems that run day after day. For example, one of Concurrent's cable company customers is using Cascading to develop applications based on set-top-box data now analyzed on Hadoop.
"This company brings in 19 terabytes of set-top-box data per day, and they need to build applications that consume that data, process it, and deliver data products to different constituents including marketing and sales," said Gary Nakamura, Concurrent's CEO in a phone interview with InformationWeek.
Cascading shields developers from the complexities of Hadoop programming, and with recent updates it has been certified by Hortonworks to work with Hadoop 2.0 and its YARN resource management framework. Cascading will also make use of Apache Tez, a new execution framework that runs on YARN in Hadoop 2.0 and eliminates the intermediate writes and delays associated with first-generation MapReduce programming.
"We've gone a lot deeper with Hortonworks with this announcement so that the 6,500-plus deployments that we have of Cascading can migrate from using MapReduce to Apache Tez without any code changes," said Nakamura.
Concurrent's partnership with Hortonworks is non-exclusive, according to Nakamura, but he described Hortonworks as having "open arms to other technologies that help with the broader ecosystem and enterprise adoption of Hadoop." Nakamura didn't elaborate, but one reason for the tighter partnership with Hortonworks might be Cloudera's efforts to go beyond Hive with Impala, which provides an interactive SQL interface for Hadoop. Concurrent offers Cascading Lingual as a SQL-on-Hadoop interface for developers building analytic applications.
With this week's announcement, Hortonworks will ship the Concurrent SDK as part of its Hortonworks Data Platform distribution, and it will also offer first- and second-tier support for the software.