News
4/4/2013 12:07 PM
Pentaho, Think Big Tackle Internet Of Things Analysis

Analytics trendsetter Think Big uses Pentaho open-source tools to create predictive models using Hadoop and NoSQL databases.

Storage vendors and network equipment manufacturers are the low-hanging fruit for a budding partnership between open-source data management and business intelligence software supplier Pentaho and boutique big-data consultancy Think Big Analytics. But that's just the beginning.

"Storage and networking opportunities are part of the larger megatrend, which is an explosion of data with the 'Internet of Things,'" Ron Bodkin, Think Big's CEO and a co-founder, told InformationWeek. "Hardware and software suppliers to IT departments as well as to industrial companies and to consumers now realize that they can tap their intelligent products to drive improved services and data-driven products."

The joint project that served as the prototype for the Pentaho-Think Big alliance announced Thursday got underway in early 2011 at Network Appliance. NetApp storage equipment phones home every day with data on hardware performance characteristics, but the company wasn't making the most of that information.

"Network Appliance wanted to be able to analyze all of that information on a daily basis and understand, based on the profile of disk performance, where there was potential for disk failures in the field so they could make preemptive service calls," said Eddie White, Pentaho's executive VP of business development.
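The predictive-maintenance idea White describes, scoring each drive's daily phone-home metrics against a failure model, can be sketched in a few lines. This is purely illustrative: the metric names, weights, and threshold below are invented for the sketch, and a production model would be trained on historical failure data rather than hand-fitted.

```python
import math

# Hypothetical weights for a hand-fitted logistic failure model.
# Feature names and coefficients are illustrative, not NetApp's.
WEIGHTS = {
    "reallocated_sectors": 0.08,  # remapped bad sectors to date
    "read_error_rate": 0.002,     # raw read errors in the last day
    "spin_retry_count": 0.5,      # failed spin-up attempts
}
BIAS = -4.0  # baseline: the overwhelming majority of drives are healthy


def failure_risk(metrics):
    """Return a 0..1 risk score for one drive's daily phone-home report."""
    z = BIAS + sum(WEIGHTS[k] * metrics.get(k, 0.0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def needs_service_call(metrics, threshold=0.5):
    """Flag a drive for a preemptive service call when risk crosses the bar."""
    return failure_risk(metrics) >= threshold
```

A healthy drive (all metrics near zero) scores close to the baseline, while a drive accumulating reallocated sectors and spin retries crosses the service threshold, which is the "preemptive service call" decision the quote describes.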

[ Learn about Pentaho's recent Hadoop deal with Intel. Read 6 Big Data Advances: Some Might Be Giants. ]

The engagement was led by Think Big, which helps companies figure out what they can do with big data by examining sources and available information, then designing the architecture and business logic behind predictive models and applications. At NetApp, developing the predictive app required a move off of legacy IBM DataStage ETL software and a shift of the big data volumes off of an Oracle database and into a new Cloudera Hadoop cluster.

"With the volume of data they were handling, an Oracle database was insufficient, and the IBM DataStage software was incapable of moving the data into the cluster," said White.

NetApp's predictive services app went into production last year, and Pentaho's ETL software is used to move day-to-day phone-home data off of Oracle and into Hadoop. Pentaho's reporting and data-visualization software is used for various supporting analyses. Pentaho's software is typically more affordable in big data settings than conventional commercial products, according to Think Big's Bodkin.
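The daily extract-and-land pattern described here, pulling one day's rows from a relational staging table and writing them into a date-partitioned target, can be sketched as follows. The table name, columns, and targets are assumptions for the sketch: sqlite3 and an in-memory buffer stand in for the Oracle source and the Hadoop sink a real pipeline would use.

```python
import csv
import io
import sqlite3

# Minimal daily batch-extract sketch. sqlite3 stands in for Oracle and a
# file-like sink stands in for an HDFS partition; names are illustrative.

def extract_daily(conn, day):
    """Pull one day's phone-home rows from the staging table."""
    cur = conn.execute(
        "SELECT serial, metric, value FROM phone_home WHERE day = ?", (day,)
    )
    return cur.fetchall()


def load_partition(rows, sink):
    """Write rows as CSV into the day's partition (sink is file-like)."""
    writer = csv.writer(sink)
    writer.writerow(["serial", "metric", "value"])
    writer.writerows(rows)
    return len(rows)


# Example run against an in-memory stand-in database:
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phone_home (day TEXT, serial TEXT, metric TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO phone_home VALUES (?, ?, ?, ?)",
    [
        ("2013-04-03", "D1", "read_err", 3.0),
        ("2013-04-03", "D2", "read_err", 0.0),
        ("2013-04-02", "D1", "read_err", 1.0),
    ],
)
sink = io.StringIO()
loaded = load_partition(extract_daily(conn, "2013-04-03"), sink)
```

The point of the pattern is that each day's data lands as an immutable partition, which is what makes downstream Hadoop jobs cheap to rerun.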

"A lot of technologies out there are priced for much smaller-scale environments," he said. "ETL for example, is often priced based on data volume or number of machines." By contrast, Pentaho has embedded licensing options with a revenue-share model for OEMs or core and node-based pricing for enterprise deployments.

Most of the analytic big-data apps that Think Big develops combine real-time and batch platforms, according to Bodkin, with Hadoop typically serving as a big data reservoir and NoSQL databases such as Cassandra and MongoDB handling the real-time side. The analysis often focuses on the data flowing back and forth among these environments.

Pentaho's BI and analytics tools are designed for relational environments, but Bodkin said the company has been at the forefront of integrating with Hadoop. "More sophisticated users are happy to write SQL, Pig and MapReduce code, but you have to broaden access to a greater range of users," Bodkin said, citing Pentaho's Instaview data visualization tool as a great way for business users to interact with big data.

What about all the SQL-on-Hadoop tools emerging that will enable users to explore data without moving data sets between the Hadoop and relational realms? There's excitement and plenty of beta testing of initiatives such as Cloudera's Impala project, said Bodkin, but those tools aren't in production yet.

The relationship between Think Big and Pentaho is non-exclusive, according to both parties, but White said the pipeline of big-data projects between the two companies just in storage and networking is big enough that Pentaho can't contemplate other new partnerships and initiatives for at least 12 months.
