Think of it as a Business Objects Explorer or Tableau Software that runs on top of Apache Hadoop. That's what startup Platfora is promising now that its technology is exiting beta and entering general release.
Backed by top-tier venture capitalists and led by EMC Greenplum veteran Ben Werther, Platfora pitches its platform as a modern-day replacement for the data warehouse. What sets it apart is that it can handle the full scale and variety of data on Hadoop clusters without the delays associated with transforming data to fit a rigid relational database schema. Platfora's primary competition is the stop-gap approach in which companies pull boiled-down data sets and aggregations from Hadoop and analyze them on relational databases using conventional BI tools.
"We tend to see companies aggregating to data marts, but we're replacing that model and accelerating the ability to get at all the data rather than just a subset," said Werther, Platfora's founder and CEO.
Platfora's software creates a catalog that enumerates the data sets available on leading Hadoop platforms including Amazon Elastic MapReduce, Cloudera, Hortonworks and MapR (with Intel and EMC Greenplum distribution support coming). Data analysts use a shopping-cart-style interface to pick and choose the dimensions of data to explore. Behind the scenes, Platfora's software generates and executes the MapReduce jobs required to bring all the requested data into a "data lens."
Once a data lens is ready -- a process that takes a few hours, according to Platfora -- business users can slice and dice the data and explore data visualizations with sub-second response times because the data lens runs in memory. Adding new data types or changing the dimensions in a data lens takes minutes or hours, says Platfora, not the days or weeks it might take to rebuild a conventional database schema.
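To make the general pattern concrete, here is a minimal sketch of the idea of aggregating once in a batch step and then slicing the result in memory. It is not Platfora's implementation, which the company has not detailed here; the function names `build_lens` and `slice_lens`, the sample records and the field names are all hypothetical, and a plain Python loop stands in for the MapReduce jobs that would do this work on a real cluster.

```python
from collections import defaultdict


def build_lens(events, dimensions, measure):
    """Batch step (hypothetical): aggregate raw records once, keyed by the chosen dimensions."""
    lens = defaultdict(float)
    for event in events:
        key = tuple(event[d] for d in dimensions)
        lens[key] += event[measure]
    return dict(lens)


def slice_lens(lens, dimensions, group_by, **filters):
    """Interactive step (hypothetical): filter and regroup the small in-memory aggregate."""
    position = {d: i for i, d in enumerate(dimensions)}
    result = defaultdict(float)
    for key, total in lens.items():
        if all(key[position[d]] == value for d, value in filters.items()):
            result[key[position[group_by]]] += total
    return dict(result)


# Made-up sample records standing in for raw data stored on Hadoop.
events = [
    {"page": "sedan", "region": "west", "leads": 3},
    {"page": "sedan", "region": "east", "leads": 2},
    {"page": "truck", "region": "west", "leads": 5},
    {"page": "sedan", "region": "west", "leads": 1},
]

DIMENSIONS = ("page", "region")
lens = build_lens(events, DIMENSIONS, "leads")
by_page_west = slice_lens(lens, DIMENSIONS, "page", region="west")
print(by_page_west)
```

The slow pass over the raw records happens once in `build_lens`; each later question touches only the much smaller aggregate, which is why this style of design can answer quickly. Changing the dimensions means rerunning the batch step rather than redesigning a schema.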
Platfora has about 20 beta customers now moving into production, according to Werther. Auto trading site Edmunds.com is using Platfora to examine multi-structured data including clickstreams, car inventory data and sales-lead data to better understand how effectively the site channels leads to car dealers. Riot Games is studying interaction with online games such as League of Legends, which has more than 32 million users every month. Platfora has been able to keep up with fast-changing usage patterns and data structures as Riot adds new features to its games.
Opower, a utility analytics company that reads smart-meter data and helps customers reduce electricity consumption, is rolling out Platfora as a potential replacement for a current approach in which aggregates are drawn from Hadoop, batch-loaded onto a columnar database and analyzed with conventional BI tools. Opower would rather skip the process of moving data from Hadoop over to a SQL database, according to Drew Hylbert, Opower's director of infrastructure engineering.
"If you have multiple systems, you end up scaling one before the other and you get into coordination efforts, so I'm all for putting everything on the same data resources," Hylbert recently told InformationWeek (see What's On Your Big Data Analytics Wish List?), making the case for Hadoop as a single platform for data storage and analysis.
Platfora intends to work with SQL-on-Hadoop interfaces now in the works, including Cloudera Impala, Hortonworks Stinger, the MapR-led Apache Drill initiative and EMC Greenplum's Pivotal distribution, Werther said. "You see how broad the problem [of data analysis] is based on the interest in SQL interfaces on Hadoop, so we'll use those interfaces as accelerators when they're available."
Competitors to Platfora include Datameer, Karmasphere and Hadapt, which Werther dismissed as "first-generation" tools that "aren't geared to business users." That's not how Datameer describes its spreadsheet-style interface, which has won adoption precisely because of its familiar analysis approach. Karmasphere and Hadapt, meanwhile, have enviable partnerships and a big head start in establishing a presence in the data-analysis market growing up around Hadoop.
Companies want more than they're getting today from big data analytics. But small and big vendors are working to solve the key problems. Also in the new, all-digital Analytics Wish List issue of InformationWeek: Jay Parikh, Facebook's infrastructure VP, discusses the company's big data plans. (Free registration required.)