Big Data: A Practical Definition - InformationWeek

Big Data: A Practical Definition

Today's hazy definitions don't clearly illustrate big data's benefits, says one Hortonworks exec. Here's a pragmatic alternative.

So what is big data, anyway? Well, there's the classic 3V model -- high volume, high velocity and high variety -- that's bandied about often. But that popular, if nebulous, definition doesn't really explain the pragmatic benefits provided by a big data platform.

David McJannet, VP of marketing for Hortonworks, believes that a more practical description is called for, one that explains the real-world benefits of big data.

"Big data isn't this nebulous thing," McJannet told InformationWeek in a phone interview. "Very pragmatically, it's about building net-new analytic applications based on new types of data that (an organization) wasn't previously tracking."

Hortonworks, of course, has a very pragmatic reason to teach the world about big data's advantages. As a major driver of the Hadoop ecosystem, the Palo Alto, Calif., enterprise software company has much to gain by persuading organizations to store and analyze massive amounts of data that they might otherwise ignore.

So here's an alternative (yet businesslike) definition: Big data is about "building new analytic applications based on new types of data, in order to better serve your customers and drive a better competitive advantage," said McJannet.

This simpler definition may help businesses move "beyond the more nebulous concept" of big data, he added.

Not all big data is alike, of course, so Hortonworks has subdivided these murky bits into five distinct data categories: social media; server logs; Web clickstream; machine/sensor; and geolocation.

But how can companies use this information?

Take social media data. Businesses are using Facebook, Twitter and similar social sites as "sentiment barometers," McJannet said. A movie producer, for instance, can use this data to track reactions to a new film that's coming out and optimize a marketing campaign based on social media users' comments.

Server logs can help system administrators, who mine this data in Hadoop to identify and react to important issues. McJannet offered this example: "If I track every single inbound request on my website and then overlay that by geography, I can determine better where my biggest customer hotspots are, and where I potentially have security issues," he said.
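As a rough illustration of McJannet's server-log example, the minimal Python sketch below tallies inbound requests by region to surface the busiest locations. Everything in it is an assumption made for illustration: the sample records, the region codes and the function names are hypothetical, and the step that maps an IP address to a region (a GeoIP lookup) is taken as already done. It is not Hortonworks' method, and a real deployment would run this kind of aggregation over far larger logs stored in Hadoop.

```python
from collections import Counter

# Hypothetical log records as (client_ip, region) pairs. In practice the
# region would come from a GeoIP lookup on each request, not shown here.
requests = [
    ("203.0.113.5", "US-CA"),
    ("203.0.113.9", "US-CA"),
    ("198.51.100.7", "DE"),
    ("203.0.113.5", "US-CA"),
    ("192.0.2.44", "BR"),
]


def requests_by_region(records):
    """Count inbound requests per region."""
    return Counter(region for _ip, region in records)


def hotspots(records, top_n=3):
    """Return the top_n regions by request volume, busiest first."""
    return requests_by_region(records).most_common(top_n)


if __name__ == "__main__":
    for region, count in hotspots(requests, top_n=2):
        print(region, count)
```

The same per-region counts could also serve as a starting point for the security angle McJannet mentions, for example by flagging regions whose traffic is unusually high, although what counts as unusual is left open here.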

Clickstream data is an example of how Hadoop provides an affordable way to manage information that would overload a traditional data management system.

"If I could capture clickstream data of all the clicks on my website, obviously that would fill up my database pretty quickly. It's the sheer volume of clickstream data that's generated," said McJannet. "If I store that data in Hadoop ... I can use that information to build a really interesting analytic application."

Machines are a largely untapped source of big data as well.

"Machine (data) is absolutely one of the biggest generators of data, whether (from) air conditioning units, refrigerators, trucks or farm machinery," McJannet said. "It's exploding in terms of the kind of instrumentation that's out there."

With a few billion cellphones in use, mobile devices have enormous potential for data gathering.

"Every time I'm passed from cell tower to cell tower, there's some piece of data that's being generated. It could be used if someone wanted to build an analytic application," noted McJannet.

Geolocation data is fairly new, having been limited to select aerospace and military uses until a decade ago. Today it shows great promise in commercial applications.

A trucking company, for instance, could capture geolocation data every 10 to 60 seconds from every one of its vehicles on the road, and store what could amount to petabytes of data.

"Think about what applications you might be able to build if you track all the geo-data associated with your operations, and how might you be able to drive intelligence out of that," said McJannet.

12/30/2013 | 2:45:14 PM
When is a new definition not?
McJannet, Hortonworks VP of marketing, hasn't issued a practical definition of big data so much as described what is most easily talked about as big data: "building new analytic applications based on new types of data, in order to better serve your customers and drive a better competitive advantage." There are no defining elements named other than "the latest" effort yielding competitive advantage. Five years from now, this description will fit efforts that look nothing like big data as we know it today. Of course we may have a new name for it, one that emphasizes real-time use over mere data size. But this "definition" will still fit.