Pentaho, DataStax Build Ties To Ease NoSQL Data Movement - InformationWeek

News
2/28/2012
11:18 AM

Open-source Kettle data integration software provides a drag-and-drop approach to getting information into and out of Cassandra.

Apache open-source software distributors Pentaho and DataStax announced Tuesday that they have integrated their software to simplify the task of getting data into and out of the Cassandra NoSQL database.

The key to the new partnership is Kettle, the open-source data integration and extract, transform, load (ETL) software that Pentaho distributes. With hooks added to get data into and out of Apache Cassandra, the fast-growing NoSQL database that DataStax distributes, developers will no longer have to create custom scripts and routines to move data into and out of the database, which is a routine, no-coding-required task in the relational database world.

"This lowers the barriers, in terms of the skill level required, for people to use Cassandra," said Richard Daley, co-founder and chief strategy officer at Pentaho, in an interview with InformationWeek.

Cassandra is used primarily as a high-scale transactional (OLTP) database. Like other NoSQL databases, Cassandra does not impose a predefined schema, so new data types can be flexibly added at will. Kettle provides real-time feed and batch-oriented data integration and job orchestration capabilities that will give developers ready-made tools for drawing data out of legacy systems and moving it into Cassandra.

"Until now, we've not had a very good answer for getting current source data into Cassandra," said Robin Schumacher, vice president of products at DataStax, in an interview with InformationWeek. "Kettle gives us a visual, drag-and-drop environment that lets people point-and-click their way through that process."

As a NoSQL product, Cassandra doesn't use tables or joins; instead it employs column families that might have hundreds or even thousands of component columns. When monitoring, reporting, or analytic needs arise, Kettle ETL capabilities can pull structured information out of Cassandra (with supporting workflow and scheduling tools) to meet these business intelligence (BI) needs.
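To illustrate the kind of reshaping involved, here is a minimal sketch in plain Python, not Kettle's or Cassandra's actual API. It assumes a column family can be pictured as a mapping from row keys to sparse sets of columns; the sample rows, the column names, and the `flatten` helper are all hypothetical and exist only for this example.

```python
from typing import Any

# Hypothetical sample data: each row key carries its own set of columns,
# so rows in the same column family need not share a schema.
column_family: dict[str, dict[str, Any]] = {
    "user:1001": {"name": "Ada", "city": "London", "last_login": "2012-02-27"},
    "user:1002": {"name": "Grace", "plan": "enterprise"},
}


def flatten(rows: dict[str, dict[str, Any]], columns: list[str]) -> list[dict[str, Any]]:
    """Project sparse rows onto a fixed column list, filling gaps with None."""
    return [
        {"row_key": key, **{col: cols.get(col) for col in columns}}
        for key, cols in rows.items()
    ]


# Choose the columns a report needs; everything else is left behind.
table = flatten(column_family, ["name", "city", "plan"])
for record in table:
    print(record)
```

Each output record has the same fixed set of fields, which is the shape a relational reporting tool expects. A visual ETL tool lets users define this kind of projection without writing the script by hand.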

"People often need to transform data from Cassandra and create materialized views and summarizations that are used in the relational world," Schumacher explained, adding that Kettle will also let users create these data sets in a visual manner.

Pentaho and DataStax forged their partnership in hopes that open-source ties between Kettle and Cassandra will promote their respective commercial products. Pentaho's free Community Edition includes Kettle as well as basic BI reporting, charting, and query capabilities; Pentaho Enterprise adds advanced data visualization, data discovery, and dashboarding capabilities, as well as commercial support for all related open-source and commercial software.

The free DataStax Community Edition includes Apache Cassandra plus DataStax's OpsCenter system management and monitoring software. DataStax Enterprise adds commercial support, a more robust version of OpsCenter with deeper management capabilities, plus the DataStax Enterprise Server, which supports both Cassandra and Apache Hadoop on a single platform. Users typically bring their real-time, transactional data into the Cassandra nodes of the platform, and that information is automatically replicated on separate Hadoop nodes used for analytics.

"Having Cassandra and Hadoop on a single platform eliminates workload competition because you don't have to move data between platforms," Schumacher explained.

The Pentaho-DataStax partnership is another step in the maturation of a leading NoSQL alternative. Cassandra, Hadoop, and other alternatives can't go mainstream until there are plenty of ready-made tools for basics like data movement.
