The promise of collecting vast amounts of data -- big data -- and storing it with technologies such as Hadoop is all about using the data to gain insights. So, unless you can pull out those insights and business intelligence, what's the point? Just because you can collect that data doesn't mean your business analysts can interact with it.
Most business intelligence tools cannot directly query Hadoop.
Startup company AtScale launched last year to solve exactly that problem -- connecting traditional business intelligence tools to Hadoop. And the company has just released a new benchmark for SQL-on-Hadoop, examining the performance of various Hadoop ecosystem engines when used for specific business intelligence tasks.
The benchmark measures performance on big data, performance speed on small data sets, and performance stability when used simultaneously by many users. It looks at how big data technologies Impala, Spark, and Hive perform on each of these tasks. The benchmark concludes that different tasks require different tools.
"One size does not fit all," AtScale co-founder and CEO Dave Mariani told InformationWeek in an interview. "Depending on the raw data size, query complexity, and the target number of end users, one engine can't accomplish it all. Each engine has its own sweet spot."
AtScale's new benchmark builds on a Hadoop Maturity survey the company conducted in conjunction with Cloudera, MapR, and Tableau last year, Bruno Aziza, AtScale CMO, told InformationWeek in an interview. The survey found that organizations are moving toward leveraging Hadoop for business intelligence and fewer are pursuing ETL (extract, transform, and load). It also found that organizations are looking to enable self-service. In other words, they want to give business analysts the tools they need to get the intelligence out of the big data themselves, without relying on help from developers or other gatekeepers.
That's what AtScale is aiming to deliver. And enterprise customers are responding. A year after its first release, the company counts Comcast, Aetna, and American Express among its customers.
"I wanted to create a data service where business analysts could self-serve the data, but I still had consistency and control over that data," Mariani said.
Mariani was pursuing that mission long before he founded AtScale in 2013.
Between 2009 and 2012 Mariani served in data leadership roles at Yahoo during the period when the search company was incubating Hadoop. At Yahoo, Mariani was in charge of delivering analytics to all of Yahoo's business users including those who managed the advertising business and web properties.
Yahoo co-founders Jerry Yang and David Filo's vision called for retaining all data forever, Mariani told InformationWeek. But that didn't mean it was possible to query that data. That's because Hadoop was built to handle any type of file or format, while BI tools work with relational data structures. And yet, if you could get these two different systems to talk to each other, it could expand the capabilities of business intelligence.
"It was obvious to me that Hadoop was transformational," said Mariani, "But I was never able to connect my business users into all that data we were landing there. I had to do all kinds of crazy things like move data out of Hadoop into expensive relational databases."
It was a time-consuming, roundabout way to solve the problem, and it often defeated the purpose of being able to collect all that data and store it cheaply.
"What I wanted to be able to do was to point those BI users directly to Hadoop and let them operate on the data as it was landing there," Mariani said. He never got there while he was at Yahoo. And he didn't get there after joining social media analytics company Klout as VP of engineering, either. But two years after joining Klout, Mariani was approached by Cloudera and asked to test a new technology they were incubating called Impala, an SQL query engine for Hadoop. A few weeks later, Mariani quit Klout and co-founded AtScale.
"Impala was the missing piece of the equation for me -- an interactive query engine on Hadoop," he said. Impala is just one of the big data technologies that became part of AtScale's efforts. AtScale leverages several big data engines to provide this kind of flexibility to customers including Impala, Spark, Hive, Tez, Presto, and Drill. Different technologies are better suited for particular situations and workloads. The AtScale software can determine the best engine for the customer's workload.
AtScale's software is installed on-premises on an edge node of the Hadoop cluster, providing a way for Hadoop and all the traditional business intelligence tools to talk to each other.
Supported BI platforms include Tableau, Excel, Qlik, BusinessObjects, and MicroStrategy, among others.
"We let any BI visualization tool work directly with the data in Hadoop. When data lands in Hadoop it stays there. We operate on it in place," Mariani said.
Jessica Davis has spent a career covering the intersection of business and technology at titles including IDG's Infoworld, Ziff Davis Enterprise's eWeek and Channel Insider, and Penton Technology's MSPmentor.