Splunk Spawns Hunk Hadoop Tool

Splunk brings its analytic know-how to the multi-structured data living in HDFS with Hunk, a new stand-alone system aimed at Hadoop users.

Doug Henschen, Executive Editor, Enterprise Apps

June 26, 2013

Hunk is the catchy code name for Splunk Analytics for Hadoop, a new beta product introduced by Splunk on Wednesday. As the formal name suggests, the product brings the company's machine data analytics capabilities to data residing in Hadoop.

Splunk made its name (and hundreds of millions of dollars in an IPO) gaining insight from machine data. It does so with an ad hoc query language expressly designed to make sense of the highly variable data streaming out of server log files, sensors and other machine data sources that bring complexity to data centers.

Splunk's existing customers -- more than 5,600 of them -- use the proprietary, high-scale back end of Splunk Enterprise to store data. But with many companies now dumping all their data into Hadoop clusters, it made sense to bring Splunk's analytic capabilities to Hadoop.

"Companies are trying to extract value from Hadoop, but the work is quite low-level and technical, and it takes lots of services and highly specialized resources to do the work," Sanjay Mehta, Splunk's VP of product marketing, told InformationWeek. "Hunk gives them an easy way to interact with and get value out of that data."

Splunk caught on as a tool for IT departments to track operational problems in high-scale systems such as e-commerce sites. But customers like Expedia, which initially used Splunk to keep Web sites up and running, are also answering business-relevant questions such as: How many inquiries and searches are we getting? Is our traffic coming from unpaid search, advertisements or keyword buys?

Hunk makes sense of massive data stores on Hadoop by first applying a Splunk Virtual Index that provides metadata. The index supports the same Splunk Search Processing Language used in the company's Splunk Enterprise product. Users can then explore the data, detect patterns and anomalies, and drill down into terabyte- and petabyte-scale Hadoop clusters.

Users can also uncover correlations with structured data by using Splunk DB Connect to link data in relational databases to an analysis. Hunk also has reporting, data visualization and dashboarding tools, so you can turn valuable correlations and reports into always-on, production analyses.

Hunk offers query acceleration, stored statistics, scheduling and access-control features that aren't built into Hadoop. And while it's certainly possible to code analyses from scratch in Hadoop, Splunk says Hunk offers a shortcut around the hard work of inventing and coding each and every inquiry.

"Splunk is a command-based search language with more than 100 technical commands, and it's designed explicitly for this kind of data," said Mehta. "Whether you're a data architect, a data scientist or a data analyst, we make it easier to analyze data without having to work at the low level of MapReduce and HDFS."

Hunk will be priced and packaged separately from the company's standard Splunk Enterprise product. Pricing has yet to be established, as the general release isn't expected until year end, but it's likely to be on a per-node basis.

Will companies still have a reason to buy Splunk Enterprise when they can use Hunk to exploit Hadoop as a general-purpose big-data repository? That decision will come down to an economic analysis of what it costs to do analyses with Splunk Enterprise versus what it costs to do them with Hunk on top of Hadoop, according to Gartner big data analyst Merv Adrian.

"These data streams aren't necessarily being put in HDFS today, and they may not go there in the future unless the customer has made a clear investment in Hadoop and they have the cluster set up and they want to throw these analyses in there, too," Adrian told InformationWeek.

With SQL options like Cloudera Impala and improvements to the Hive query interface also in the works, it's clear that the analytic possibilities on top of Hadoop are only going to get richer.

About the Author(s)

Doug Henschen

Executive Editor, Enterprise Apps

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of Transform Magazine, and Executive Editor at DM News. He has covered IT and data-driven marketing for more than 15 years.
