Big Data: A Practical Definition - InformationWeek
09:35 AM

Big Data: A Practical Definition

Today's hazy definitions don't clearly illustrate big data's benefits, says one Hortonworks exec. Here's a pragmatic alternative.

So what is big data, anyway? Well, there's the classic 3V model -- high volume, high velocity and high variety -- that's bandied about often. But that popular, if nebulous, definition doesn't really explain the pragmatic benefits provided by a big data platform.

David McJannet, VP of marketing for Hortonworks, believes that a more practical description is called for, one that explains the real-world benefits of big data.

"Big data isn't this nebulous thing," McJannet told InformationWeek in a phone interview. "Very pragmatically, it's about building net-new analytic applications based on new types of data that (an organization) wasn't previously tracking."

Hortonworks, of course, has a very pragmatic reason to teach the world about big data's advantages. As a major driver of the Hadoop ecosystem, the Palo Alto, Calif., enterprise software company has much to gain by persuading organizations to store and analyze massive amounts of data that they might otherwise ignore.

So here's an alternative (yet businesslike) definition: Big data is about "building new analytic applications based on new types of data, in order to better serve your customers and drive a better competitive advantage," said McJannet.

This simpler definition may help businesses move "beyond the more nebulous concept" of big data, he added.

Not all big data is alike, of course, so Hortonworks has subdivided these murky bits into five distinct data categories: social media; server logs; Web clickstream; machine/sensor; and geolocation.

But how can companies use this information?

Take social media data. Businesses are using Facebook, Twitter and similar social sites as "sentiment barometers," McJannet said. A movie producer, for instance, can use this data to track reactions to a new film that's coming out and optimize a marketing campaign based on social media users' comments.

Server logs can help system administrators, who mine this data in Hadoop to identify and react to important issues. McJannet offered this example: "If I track every single inbound request on my website and then overlay that by geography, I can determine better where my biggest customer hotspots are, and where I potentially have security issues," he said.
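The overlay McJannet describes can be sketched in a few lines. This is a minimal in-memory illustration, not a Hadoop job and not Hortonworks' method: the sample records, the region codes and both function names are hypothetical, and it assumes each request's IP address has already been resolved to a region upstream.

```python
from collections import Counter

# Hypothetical, pre-parsed log records: (region, HTTP status).
# Resolving an IP address to a region is assumed to have happened upstream.
SAMPLE_REQUESTS = [
    ("US-CA", 200),
    ("US-CA", 200),
    ("DE", 200),
    ("US-CA", 403),
    ("BR", 401),
    ("BR", 401),
    ("DE", 200),
]


def requests_by_region(records):
    """Count all inbound requests per region (the 'customer hotspots')."""
    return Counter(region for region, _status in records)


def denied_by_region(records, denied_statuses=(401, 403)):
    """Count requests rejected with an auth-related status, per region."""
    return Counter(
        region for region, status in records if status in denied_statuses
    )


hotspots = requests_by_region(SAMPLE_REQUESTS)
denied = denied_by_region(SAMPLE_REQUESTS)

print("Busiest region:", hotspots.most_common(1))
print("Most denied requests:", denied.most_common(1))
```

A cluster of denied requests from one region is only a prompt for investigation; treating 401 and 403 responses as a security signal is an assumption made here for illustration.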

Clickstream data is an example of how Hadoop provides an affordable way to manage information that would overload a traditional data management system.

"If I could capture clickstream data of all the clicks on my website, obviously that would fill up my database pretty quickly. It's the sheer volume of clickstream data that's generated," said McJannet. "If I store that data in Hadoop ... I can use that information to build a really interesting analytic application."

Machines are a largely untapped source of big data as well.

"Machine (data) is absolutely one of the biggest generators of data, whether (from) air conditioning units, refrigerators, trucks or farm machinery," McJannet said. "It's exploding in terms of the kind of instrumentation that's out there."

With a few billion cellphones in use, mobile devices have enormous potential for data gathering.

"Every time I'm passed from cell tower to cell tower, there's some piece of data that's being generated. It could be used if someone wanted to build an analytic application," noted McJannet.

Geolocation data is fairly new, having been limited to select aerospace and military uses until a decade ago. Today it shows great promise in commercial applications.

A trucking company, for instance, could capture geolocation data every 10 to 60 seconds from every one of its vehicles on the road, and store what could amount to petabytes of data.
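A back-of-the-envelope calculation shows how quickly such readings add up. The sketch below is illustrative only: the fleet size, the bytes stored per reading and the round-the-clock operation are all assumptions, since the article gives none of these figures, and `yearly_bytes` is a hypothetical helper.

```python
SECONDS_PER_HOUR = 3600


def yearly_bytes(vehicles, interval_seconds, bytes_per_record,
                 hours_per_day=24, days=365):
    """Estimate raw storage for periodic location readings.

    Every argument is an assumption supplied by the caller.
    """
    readings_per_day = hours_per_day * SECONDS_PER_HOUR // interval_seconds
    return vehicles * readings_per_day * days * bytes_per_record


# Assumed inputs: 1,000 trucks, 100 bytes stored per reading.
every_10s = yearly_bytes(vehicles=1000, interval_seconds=10, bytes_per_record=100)
every_60s = yearly_bytes(vehicles=1000, interval_seconds=60, bytes_per_record=100)

print(f"10-second interval: {every_10s / 1e12:.2f} TB per year")
print(f"60-second interval: {every_60s / 1e12:.2f} TB per year")
```

Volume scales linearly with each input, so the total depends heavily on fleet size, on how much is recorded with each reading and on how many years of history are kept.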

"Think about what applications you might be able to build if you track all the geo-data associated with your operations, and how might you be able to drive intelligence out of that," said McJannet.

12/30/2013 | 2:45:14 PM
When is a new definition not?
McJannet, Hortonworks VP of marketing, hasn't issued a practical definition of big data so much as described what is most easily talked about as big data: "building new analytic applications based on new types of data, in order to better serve your customers and drive a better competitive advantage." There are no defining elements named other than "the latest" effort yielding competitive advantage. Five years from now, this description will fit efforts that look nothing like big data as we know it today. Of course we may have a new name for it, one that emphasizes real-time use over mere data size. But this "definition" will still fit.