Pentaho, DataStax Build Ties To Ease NoSQL Data Movement

Open-source Kettle data integration software provides a drag-and-drop approach to getting information into and out of Cassandra.
Apache open-source software distributors Pentaho and DataStax announced Tuesday that they have integrated their software to simplify the task of getting data into and out of the Cassandra NoSQL database.

The key to the new partnership is Kettle, the Apache-licensed open-source data integration and extract, transform, load (ETL) software that Pentaho distributes. Kettle now has hooks for moving data into and out of Apache Cassandra, the fast-growing NoSQL database that DataStax distributes, so developers no longer have to write custom scripts and routines for that job. In the relational database world, moving data in and out of a database is already a routine task that requires no coding.
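To give a sense of the hand-written glue that Kettle's visual steps are meant to replace, here is a minimal Python sketch of an extract-transform-load job from a relational source into a Cassandra-style structure. It is illustrative only: the `orders` table, the `customer:`/`order:` key format, and the plain dict standing in for a Cassandra column family are all hypothetical, and a real job would write through a Cassandra client driver rather than into a dict.

```python
import sqlite3
from collections import defaultdict


def extract(conn):
    """Extract: read flat order rows from a relational source table."""
    return conn.execute(
        "SELECT customer_id, order_id, amount FROM orders ORDER BY order_id"
    ).fetchall()


def transform(rows):
    """Transform: regroup flat rows into one wide row per customer,
    with one column per order (column name = order id, value = amount)."""
    wide = defaultdict(dict)
    for customer_id, order_id, amount in rows:
        wide[f"customer:{customer_id}"][f"order:{order_id}"] = amount
    return wide


def load(wide_rows, column_family):
    """Load: merge the wide rows into a hypothetical in-memory column family."""
    for row_key, columns in wide_rows.items():
        column_family.setdefault(row_key, {}).update(columns)


# A throwaway relational database standing in for a legacy source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 100, 25.0), (2, 101, 40.0), (1, 102, 15.5)],
)

orders_cf = {}  # stand-in for a Cassandra column family
load(transform(extract(conn)), orders_cf)
print(orders_cf)
```

Each of the three functions corresponds to work that, per the partnership, can instead be configured as drag-and-drop steps in Kettle.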

"This lowers the barriers, in terms of the skill level required, for people to use Cassandra," said Richard Daley, co-founder and chief strategy officer at Pentaho, in an interview with InformationWeek.

Cassandra is used primarily as a high-scale transactional (OLTP) database. Like other NoSQL databases, Cassandra does not impose a predefined schema, so new data types can be added as needed. Kettle provides real-time feed and batch-oriented data integration and job orchestration capabilities that give developers ready-made tools for drawing data out of legacy systems and moving it into Cassandra.


"Until now, we've not had a very good answer for getting current source data into Cassandra," said Robin Schumacher, vice president of products at DataStax, in an interview with InformationWeek. "Kettle gives us a visual, drag-and-drop environment that lets people point-and-click their way through that process."

As a NoSQL product, Cassandra doesn't use tables or joins; instead it employs column families that might have hundreds or even thousands of component columns. When monitoring, reporting, or analytic needs arise, Kettle's ETL capabilities, backed by its workflow and scheduling tools, can pull structured information out of Cassandra for business intelligence (BI) use.
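One rough way to picture the column-family model is as a map from row keys to sorted maps of columns, where each row carries only the columns it needs. The Python sketch below is a toy, in-memory model built on that assumption; the `ColumnFamily` class and its methods are hypothetical and are not Cassandra's API. It shows two rows with different columns and a single wide row holding 1,440 time-stamped columns.

```python
class ColumnFamily:
    """Toy column family (hypothetical, not Cassandra's API): each row key
    maps to its own set of columns, and rows need not share columns."""

    def __init__(self):
        self._rows = {}  # row key -> {column name: value}

    def insert(self, row_key, column, value):
        # No schema check: any row can gain a new column at any time.
        self._rows.setdefault(row_key, {})[column] = value

    def get_slice(self, row_key, start, finish):
        # Return columns with start <= name <= finish, in column-name order,
        # mimicking a range read over one row's sorted columns.
        row = self._rows.get(row_key, {})
        return [(name, row[name]) for name in sorted(row) if start <= name <= finish]

    def column_count(self, row_key):
        return len(self._rows.get(row_key, {}))


users = ColumnFamily()
users.insert("alice", "email", "alice@example.com")
users.insert("bob", "email", "bob@example.com")
users.insert("bob", "twitter", "@bob")  # only bob has this column

readings = ColumnFamily()
# One wide row per sensor, with one column per minute of the day ("HH:MM").
for minute in range(1440):
    readings.insert("sensor-42", f"{minute // 60:02d}:{minute % 60:02d}", 20 + minute % 5)

print(users.get_slice("bob", "a", "z"))
print(readings.column_count("sensor-42"))
print(readings.get_slice("sensor-42", "09:00", "09:02"))
```

Reading a contiguous slice of columns from one row, rather than joining tables, is the access pattern this layout is meant to serve.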

"People often need to transform data from Cassandra and create materialized views and summarizations that are used in the relational world," Schumacher explained, adding that Kettle will also let users create these data sets in a visual manner.
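The kind of summarization Schumacher describes can be sketched as flattening wide rows and aggregating them into a relational table. The Python example below is a minimal illustration under stated assumptions: the `wide_rows` data, the `hourly_readings` table, and the `summarize_by_hour` helper are all hypothetical, and an in-memory SQLite database stands in for the relational reporting system.

```python
import sqlite3
from collections import defaultdict

# Hypothetical wide rows pulled from Cassandra: sensor id -> {"HH:MM": reading}.
wide_rows = {
    "sensor-1": {"09:00": 20.0, "09:30": 22.0, "10:15": 25.0},
    "sensor-2": {"09:10": 18.0, "09:50": 19.0},
}


def summarize_by_hour(rows):
    """Flatten each wide row and aggregate its columns into one relational
    row per (sensor, hour): count, min, max and mean of the readings."""
    buckets = defaultdict(list)
    for sensor, columns in rows.items():
        for ts, value in columns.items():
            buckets[(sensor, ts[:2])].append(value)  # "HH" prefix = hour
    return [
        (sensor, hour, len(vals), min(vals), max(vals), sum(vals) / len(vals))
        for (sensor, hour), vals in sorted(buckets.items())
    ]


summary = summarize_by_hour(wide_rows)

# Load the summary into a relational table that SQL-based BI tools can query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hourly_readings (sensor TEXT, hour TEXT, n INTEGER,"
    " min_value REAL, max_value REAL, avg_value REAL)"
)
conn.executemany("INSERT INTO hourly_readings VALUES (?, ?, ?, ?, ?, ?)", summary)
for row in conn.execute("SELECT * FROM hourly_readings ORDER BY sensor, hour"):
    print(row)
```

In a Kettle job, the flatten, group, and load stages would presumably be separate visual steps rather than hand-written functions.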

Pentaho and DataStax forged their partnership in hopes that open-source ties between Kettle and Cassandra will promote their respective commercial products. Pentaho's free Community Edition includes Kettle as well as basic BI reporting, charting, and query capabilities; Pentaho Enterprise adds advanced data visualization, data discovery, and dashboarding capabilities, as well as commercial support for all related open-source and commercial software.

The free DataStax Community Edition includes Apache Cassandra plus DataStax's OpsCenter system management and monitoring software. DataStax Enterprise adds commercial support, a more robust version of OpsCenter with deeper management capabilities, and the DataStax Enterprise Server, which supports both Cassandra and Apache Hadoop on a single platform. Users typically load their real-time, transactional data into the platform's Cassandra nodes, and that information is automatically replicated to separate Hadoop nodes used for analytics.

"Having Cassandra and Hadoop on a single platform eliminates workload competition because you don't have to move data between platforms," Schumacher explained.
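The workload separation described above can be pictured with a deliberately simplified toy model. In this Python sketch, which is an assumption-laden illustration and not how DataStax Enterprise actually places replicas, every write is copied to one node in a "realtime" group and one node in an "analytics" group. Reporting scans then touch only the analytics group. The `Node` and `Cluster` classes and the CRC32-based replica choice are hypothetical.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    workload: str  # "realtime" or "analytics"
    data: dict = field(default_factory=dict)
    reads_served: int = 0


class Cluster:
    """Toy two-group cluster (hypothetical, not DataStax's implementation).
    Every write is copied to one replica in each node group, so analytic
    scans are served by the analytics group alone."""

    def __init__(self, nodes):
        self.nodes = nodes

    def _group(self, workload):
        return [n for n in self.nodes if n.workload == workload]

    def write(self, key, value):
        # Pick one replica per group with a simple deterministic hash.
        for workload in ("realtime", "analytics"):
            group = self._group(workload)
            group[zlib.crc32(key.encode()) % len(group)].data[key] = value

    def read(self, key):
        # Application point reads are served by the real-time group.
        for node in self._group("realtime"):
            if key in node.data:
                node.reads_served += 1
                return node.data[key]
        raise KeyError(key)

    def scan_analytics(self):
        # Reporting scans touch only the analytics group.
        result = {}
        for node in self._group("analytics"):
            node.reads_served += 1
            result.update(node.data)
        return result


cluster = Cluster([Node("rt1", "realtime"), Node("rt2", "realtime"), Node("an1", "analytics")])
for i in range(10):
    cluster.write(f"user:{i}", i)

report = cluster.scan_analytics()
realtime_load = sum(n.reads_served for n in cluster.nodes if n.workload == "realtime")
print(len(report), realtime_load)
```

The full scan returns every record while the real-time nodes serve no reads, which is the sense in which the two workloads do not compete.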

The Pentaho-DataStax partnership is another step in the maturation of a leading NoSQL alternative. Cassandra, Hadoop, and other alternatives can't go mainstream until there are plenty of ready-made tools for basics like data movement.
