"Storage and networking opportunities are part of the larger megatrend, which is an explosion of data with the 'Internet of Things,'" Ron Bodkin, Think Big's CEO and a co-founder, told InformationWeek. "Hardware and software suppliers to IT departments as well as to industrial companies and to consumers now realize that they can tap their intelligent products to drive improved services and data-driven products."
The prototype for the Pentaho-Think Big alliance announced Thursday was a joint project that got underway in early 2011 at Network Appliance. NetApp storage equipment sends data home every day reporting on hardware performance characteristics, but the company wasn't making the most of that information.
"Network Appliance wanted to be able to analyze all of that information on a daily basis and understand, based on the profile of disk performance, where there was potential for disk failures in the field so they could make preemptive service calls," said Eddie White, Pentaho's executive VP of business development.
The engagement was led by Think Big, which helps companies figure out what they can do with big data by looking at sources and available information, and then designing the architecture and business logic to come up with predictive models and applications. At NetApp, developing the predictive app required a move off legacy IBM DataStage ETL software and a shift of the big data volumes out of an Oracle database and into a new Cloudera Hadoop cluster.
"With the volume of data they were handling, an Oracle database was insufficient, and the IBM DataStage software was incapable of moving the data into the cluster," said White.
NetApp's predictive services app went into production last year, and Pentaho's ETL software is used to move day-to-day phone-home data out of Oracle and into Hadoop. Pentaho's reporting and data-visualization software is used for various supporting analyses. Pentaho's software is typically more affordable in big data settings than conventional commercial products, according to Think Big's Bodkin.
"A lot of technologies out there are priced for much smaller-scale environments," he said. "ETL, for example, is often priced based on data volume or number of machines." By contrast, Pentaho has embedded licensing options with a revenue-share model for OEMs or core and node-based pricing for enterprise deployments.
Most of the analytic big-data apps that Think Big develops combine real-time and batch platforms, according to Bodkin, with Hadoop typically serving as a big data reservoir and NoSQL databases such as Cassandra and MongoDB serving as the real-time platforms. The analysis often focuses on the data flowing back and forth among these environments.
Pentaho's BI and analytics tools are designed for relational environments, but Bodkin said the company has been at the forefront of integrating with Hadoop. "More sophisticated users are happy to write SQL, Pig and MapReduce code, but you have to broaden access to a greater range of users," Bodkin said, citing Pentaho's Instaview data visualization tool as a great way for business users to interact with big data.
What about all the SQL-on-Hadoop tools emerging that will enable users to explore data without moving data sets between the Hadoop and relational realms? There's excitement and plenty of beta testing of initiatives such as Cloudera's Impala project, said Bodkin, but those tools aren't in production yet.
The relationship between Think Big and Pentaho is non-exclusive, according to both parties, but White said the pipeline of big-data projects between the two companies just in storage and networking is big enough that Pentaho can't contemplate other new partnerships and initiatives for at least 12 months.