6/26/2013
10:22 AM

Splunk Spawns Hunk Hadoop Tool

Splunk brings its analytic know-how to the multi-structured data living in HDFS with Hunk, a new stand-alone system aimed at Hadoop users.

Hunk is the catchy code name for Splunk Analytics for Hadoop, a new beta product introduced by Splunk on Wednesday. As the formal name suggests, the product brings the company's machine data analytics capabilities to data residing in Hadoop.

Splunk made its name (and hundreds of millions of dollars in an IPO) gaining insight from machine data. It does so with an ad hoc query language expressly designed to make sense of the highly variable data streaming out of server log files, sensors and other machine data sources that bring complexity to data centers.

Splunk's existing customers -- more than 5,600 -- use the proprietary, high-scale back end of Splunk Enterprise to store data, but with many companies now dumping all their data into Hadoop clusters, it only made sense to bring Splunk's analytic capabilities to Hadoop.

"Companies are trying to extract value from Hadoop, but the work is quite low-level and technical, and it takes lots of services and highly specialized resources to do the work," Sanjay Mehta, Splunk's VP of product marketing, told InformationWeek. "Hunk gives them an easy way to interact with and get value out of that data."


Splunk caught on as a tool for IT departments to track operational problems in high-scale systems such as e-commerce sites. But customers like Expedia that initially used Splunk to keep Web sites up and running are also using it to answer business-relevant questions, such as how many inquiries and searches the site is getting and whether traffic is coming from unpaid search, advertisements or keyword buys.

Hunk makes sense of massive data stores on Hadoop by first applying a Splunk Virtual Index that provides metadata. The index supports the same Splunk Search Processing Language used in the company's Splunk Enterprise product. Users can then explore data, detect patterns and anomalies, and drill down on terabyte- and petabyte-scale Hadoop clusters.
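
To make the idea concrete, here is a minimal Python sketch of searching raw files through a metadata-only index. It is not Splunk's implementation, and the article does not describe Hunk's internals beyond saying that the Virtual Index provides metadata. Everything in it is an assumption made for illustration: the in-memory `FAKE_HDFS` dictionary stands in for files in HDFS, and the names `VIRTUAL_INDEXES`, `scan`, `extract_fields` and `count_by`, along with the sample log lines, are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stand-in for raw files in HDFS: path -> raw log lines.
FAKE_HDFS = {
    "/logs/web/2013-06-25.log": [
        "10.0.0.1 GET /search status=200",
        "10.0.0.2 GET /checkout status=500",
    ],
    "/logs/web/2013-06-26.log": [
        "10.0.0.3 GET /search status=200",
        "10.0.0.1 GET /search status=404",
    ],
}

# In this sketch a "virtual index" is only metadata: a name mapped to a
# path prefix. No data is copied or pre-indexed (an assumption, not a
# statement about how Hunk works).
VIRTUAL_INDEXES = {"web": "/logs/web/"}

# Matches key=value pairs such as "status=200".
FIELD_PATTERN = re.compile(r"(\w+)=(\S+)")


def scan(index_name):
    """Yield raw lines from every file under the index's path prefix."""
    prefix = VIRTUAL_INDEXES[index_name]
    for path in sorted(FAKE_HDFS):
        if path.startswith(prefix):
            yield from FAKE_HDFS[path]


def extract_fields(line):
    """Parse key=value pairs from a raw line at read time."""
    return dict(FIELD_PATTERN.findall(line))


def count_by(index_name, field, keyword=None):
    """Count events per value of `field`, optionally filtered by keyword."""
    counts = Counter()
    for line in scan(index_name):
        if keyword and keyword not in line:
            continue
        fields = extract_fields(line)
        if field in fields:
            counts[fields[field]] += 1
    return counts


if __name__ == "__main__":
    # Ad hoc question: which status codes do /search requests return?
    print(count_by("web", "status", keyword="/search"))
```

The point of the sketch is that the question is asked of the raw lines at query time, with the index supplying only where to look. A production system would also need distributed execution, which this toy example leaves out.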

Users can also uncover correlations with structured data using Splunk DB Connect to link data in relational databases to an analysis. Hunk also has reporting, data visualization and dashboarding tools, so you can turn valuable correlations and reports into always-on, production analyses.

Hunk offers query acceleration, stored statistics, scheduling and access-control features that aren't purpose-built in Hadoop. And while it's certainly possible to code analyses from scratch in Hadoop, Splunk says Hunk offers a shortcut around the hard work of inventing and coding each and every inquiry.

"Splunk is a command-based search language with more than 100 technical commands, and it's designed explicitly for this kind of data," said Mehta. "Whether you're a data architect, a data scientist or a data analyst, we make it easier to analyze data without having to work at the low level of MapReduce and HDFS."

Hunk will be priced and packaged separately from the company's standard Splunk Enterprise product. Pricing has yet to be established, as the general release isn't expected until year end, but it's likely to be on a per-node basis.

Will companies still have a reason to buy Splunk Enterprise when they can use Hunk to exploit Hadoop as a general-purpose big-data repository? That decision will come down to an economic analysis of what it costs to do analyses with Splunk Enterprise versus what it costs to do them with Hunk on top of Hadoop, according to Gartner big data analyst Merv Adrian.

"These data streams aren't necessarily being put in HDFS today, and they may not go there in the future unless the customer has made a clear investment in Hadoop and they have the cluster set up and they want to throw these analyses in there, too," Adrian told InformationWeek.

With SQL options like Cloudera Impala and improvements to the Hive query interface also in the works, it's clear that the analytic possibilities on top of Hadoop are only going to get richer.

