Business intelligence is the top use-case for IT organizations implementing Hadoop, according to a large industry survey last year. Now a new benchmark study shows which Hadoop ecosystem tools are best for particular types of BI queries.
The recently released study gives IT organizations perspective on a handful of SQL-on-Hadoop engines, such as Hive, Impala, Presto, and Spark SQL, and on how each engine performs for particular kinds of analytic jobs. The benchmark comes from AtScale, a company focused on helping organizations make business intelligence work on Hadoop.
"Different engines continue to perform well for different jobs," said Josh Klahr, VP of products at AtScale, in an interview with InformationWeek. "IT organizations should probably be wary about making a bet on just one engine -- like putting everything on Hive or on Impala."
The benchmark, released this week, is the second edition. It shows how the performance of each of these engines has improved since the previous report, released six months ago.
IT organizations will want to take note of the results. Business intelligence is the top use-case that enterprises plan for their Hadoop implementations, according to a 2015 Hadoop Maturity Survey of close to 2,100 business, IT, and C-suite executives conducted by AtScale, Tableau, and the three big Hadoop distributors -- Cloudera, Hortonworks, and MapR.
That survey found that ETL and data science workloads on Hadoop were decreasing, while business intelligence had gained momentum. In the 2015 results, 69% of organizations cited business intelligence as the top use-case, followed by data science at 56% and ETL at 51%. In 2014, the same survey showed business intelligence at 65%, data science at 62%, and ETL at 74%.
What's more, getting value out of your Hadoop projects may be directly linked to whether your IT organization has enabled business users to query the Hadoop data directly themselves, according to the 2015 Hadoop Maturity Survey. Providing this self-service access to business users unlocks value for organizations.
The survey showed that, of the companies that provided self-service options to users, 61% said they are gaining value from Hadoop. Among the companies that did not provide self-service options, 41% said they see tangible value and 59% said they don't.
How do you make sure you are adding the right engines to your Hadoop infrastructure to best enable the fast response time your business users expect on their queries? Here is a look at some notable findings from AtScale's tools benchmark test.