09:35 AM

Big Data: A Practical Definition

Today's hazy definitions don't clearly illustrate big data's benefits, says one Hortonworks exec. Here's a pragmatic alternative.

So what is big data, anyway? Well, there's the classic 3V model -- high volume, high velocity and high variety -- that's bandied about often. But that popular, if nebulous, definition doesn't really explain the pragmatic benefits provided by a big data platform.

David McJannet, VP of marketing for Hortonworks, believes that a more practical description is called for, one that explains the real-world benefits of big data.

"Big data isn't this nebulous thing," McJannet told InformationWeek in a phone interview. "Very pragmatically, it's about building net-new analytic applications based on new types of data that (an organization) wasn't previously tracking."

Hortonworks, of course, has a very pragmatic reason to teach the world about big data's advantages. As a major driver of the Hadoop ecosystem, the Palo Alto, Calif., enterprise software company has much to gain by persuading organizations to store and analyze massive amounts of data that they might otherwise ignore.

So here's an alternative (yet businesslike) definition: Big data is about "building new analytic applications based on new types of data, in order to better serve your customers and drive a better competitive advantage," said McJannet.

This simpler definition may help businesses move "beyond the more nebulous concept" of big data, he added.

Not all big data is alike, of course, so Hortonworks has subdivided these murky bits into five distinct data categories: social media; server logs; Web clickstream; machine/sensor; and geolocation.

But how can companies use this information?

Take social media data. Businesses are using Facebook, Twitter and similar social sites as "sentiment barometers," McJannet said. A movie producer, for instance, can use this data to track reactions to a new film that's coming out and optimize a marketing campaign based on social media users' comments.

Server logs can help system administrators, who mine this data in Hadoop to identify and react to important issues. McJannet offered this example: "If I track every single inbound request on my website and then overlay that by geography, I can determine better where my biggest customer hotspots are, and where I potentially have security issues," he said.
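
To make McJannet's server-log example concrete, here is a minimal Python sketch of counting inbound requests by geography. It assumes each log record has already been resolved to a region code; the record layout, the sample addresses and the function name requests_by_region are hypothetical and do not come from Hortonworks or Hadoop.

```python
from collections import Counter

def requests_by_region(records):
    """Count inbound requests per region.

    Each record is a hypothetical (ip_address, region_code) pair; a real
    pipeline would first map IP addresses to regions with a geolocation
    lookup, which is assumed to have happened already.
    """
    return Counter(region for _ip, region in records)

# Illustrative sample data using documentation-reserved IP ranges.
sample = [
    ("203.0.113.5", "US"),
    ("198.51.100.7", "DE"),
    ("203.0.113.9", "US"),
    ("198.51.100.8", "DE"),
    ("192.0.2.44", "BR"),
    ("203.0.113.5", "US"),
]

counts = requests_by_region(sample)
hotspots = counts.most_common(2)  # the two busiest regions
print(hotspots)
```

On a real site the same tally would run over far more records than one machine comfortably holds, which is where a distributed store such as Hadoop comes in.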

Clickstream data is an example of how Hadoop provides an affordable way to manage information that would overload a traditional data management system.

"If I could capture clickstream data of all the clicks on my website, obviously that would fill up my database pretty quickly. It's the sheer volume of clickstream data that's generated," said McJannet. "If I store that data in Hadoop ... I can use that information to build a really interesting analytic application."

Machines are a largely untapped source of big data as well.

"Machine (data) is absolutely one of the biggest generators of data, whether (from) air conditioning units, refrigerators, trucks or farm machinery," McJannet said. "It's exploding in terms of the kind of instrumentation that's out there."

With a few billion cellphones in use, mobile devices have enormous potential for data gathering.

"Every time I'm passed from cell tower to cell tower, there's some piece of data that's being generated. It could be used if someone wanted to build an analytic application," noted McJannet.

Geolocation data is fairly new, having been limited to select aerospace and military uses until a decade ago. Today it shows great promise in commercial applications.

A trucking company, for instance, could capture geolocation data every 10 to 60 seconds from every one of its vehicles on the road, and store what could amount to petabytes of data.
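
How large such a store grows depends on fleet size, sampling interval, record size and how long the data is kept. The back-of-the-envelope Python sketch below shows the arithmetic; the fleet size and bytes-per-record figures are assumptions chosen for illustration, not numbers from the article.

```python
def yearly_bytes(vehicles, interval_s, bytes_per_record):
    """Estimate raw bytes captured per year for a fleet.

    All inputs are assumptions supplied by the caller: the number of
    vehicles, seconds between readings, and the size of one reading.
    """
    records_per_day = 86_400 // interval_s  # seconds in a day / interval
    return vehicles * records_per_day * 365 * bytes_per_record

# Hypothetical fleet: 10,000 trucks, 200 bytes per reading.
for interval in (10, 60):
    total = yearly_bytes(vehicles=10_000, interval_s=interval, bytes_per_record=200)
    print(f"every {interval}s: {total / 1e12:.2f} TB per year")
```

Under these particular assumptions the raw position data comes to terabytes per year; larger fleets, richer per-reading payloads or multi-year retention push the total up from there.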

"Think about what applications you might be able to build if you track all the geo-data associated with your operations, and how might you be able to drive intelligence out of that," said McJannet.

User Rank: Author
12/30/2013 | 2:45:14 PM
When is a new definition not?
McJannet, Hortonworks VP of marketing, hasn't issued a practical definition of big data so much as described what is most easily talked about as big data: "building new analytic applications based on new types of data, in order to better serve your customers and drive a better competitive advantage." There are no defining elements named other than "the latest" effort yielding competitive advantage. Five years from now, this description will fit efforts that look nothing like big data as we know it today. Of course, we may have a new name for it, one that emphasizes real-time use over mere data size. But this "definition" will still fit.