Big Data: A Practical Definition - InformationWeek

8/26/2013
09:35 AM
Big Data: A Practical Definition

Today's hazy definitions don't clearly illustrate big data's benefits, says one Hortonworks exec. Here's a pragmatic alternative.

So what is big data, anyway? Well, there's the classic 3V model -- high volume, high velocity and high variety -- that's often bandied about. But that popular, if nebulous, definition doesn't really explain the pragmatic benefits provided by a big data platform.

David McJannet, VP of marketing for Hortonworks, believes that a more practical description is called for, one that explains the real-world benefits of big data.

"Big data isn't this nebulous thing," McJannet told InformationWeek in a phone interview. "Very pragmatically, it's about building net-new analytic applications based on new types of data that (an organization) wasn't previously tracking."

Hortonworks, of course, has a very pragmatic reason to teach the world about big data's advantages. As a major driver of the Hadoop ecosystem, the Palo Alto, Calif., enterprise software company has much to gain by persuading organizations to store and analyze massive amounts of data that they might otherwise ignore.

So here's an alternative (yet businesslike) definition: Big data is about "building new analytic applications based on new types of data, in order to better serve your customers and drive a better competitive advantage," said McJannet.

This simpler definition may help businesses move "beyond the more nebulous concept" of big data, he added.

Not all big data is alike, of course, so Hortonworks has subdivided these murky bits into five distinct data categories: social media; server logs; Web clickstream; machine/sensor; and geolocation.

But how can companies use this information?

Take social media data. Businesses are using Facebook, Twitter and similar social sites as "sentiment barometers," McJannet said. A movie producer, for instance, can use this data to track reactions to a new film that's coming out and optimize a marketing campaign based on social media users' comments.

Server logs can help system administrators, who mine this data in Hadoop to identify and react to important issues. McJannet offered this example: "If I track every single inbound request on my website and then overlay that by geography, I can determine better where my biggest customer hotspots are, and where I potentially have security issues," he said.
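
As a rough illustration of the idea McJannet describes, the sketch below tallies inbound requests by region and counts denied requests separately. Everything in it is an assumption made for the example: the three-field log format, the `REGION_BY_PREFIX` lookup table (a real deployment would use a GeoIP database) and the choice of 401/403 responses as a stand-in for possible security issues. It runs on a small in-memory sample rather than on Hadoop.

```python
from collections import Counter

# Hypothetical lookup table from IP prefix to region (documentation-only
# address ranges). A real system would query a GeoIP database instead.
REGION_BY_PREFIX = {
    "198.51.100.": "North America",
    "203.0.113.": "APAC",
    "192.0.2.": "Europe",
}

# Assumption: denied requests are a crude proxy for "potential security issues".
DENIED_STATUSES = {"401", "403"}


def region_for(ip):
    """Return the region for an IP address, or "unknown" if no prefix matches."""
    for prefix, region in REGION_BY_PREFIX.items():
        if ip.startswith(prefix):
            return region
    return "unknown"


def summarize(log_lines):
    """Count all requests and denied requests per region.

    Each line is assumed to look like "<ip> <path> <status>".
    Lines that do not have exactly three fields are skipped.
    """
    requests_by_region = Counter()
    denied_by_region = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        ip, _path, status = fields
        region = region_for(ip)
        requests_by_region[region] += 1
        if status in DENIED_STATUSES:
            denied_by_region[region] += 1
    return requests_by_region, denied_by_region


sample = [
    "198.51.100.7 /pricing 200",
    "198.51.100.9 /login 200",
    "203.0.113.4 /login 401",
    "203.0.113.4 /admin 403",
    "192.0.2.15 /home 200",
]
requests, denied = summarize(sample)
print("requests:", dict(requests))
print("denied:", dict(denied))
```

The regions with the most requests suggest where the customer hotspots are, while a region with an unusually high share of denied requests would be a candidate for a closer security review.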

Clickstream data is an example of how Hadoop provides an affordable way to manage information that would overload a traditional data management system.

"If I could capture clickstream data of all the clicks on my website, obviously that would fill up my database pretty quickly. It's the sheer volume of clickstream data that's generated," said McJannet. "If I store that data in Hadoop ... I can use that information to build a really interesting analytic application."

Machines are a largely untapped source of big data as well.

"Machine (data) is absolutely one of the biggest generators of data, whether (from) air conditioning units, refrigerators, trucks or farm machinery," McJannet said. "It's exploding in terms of the kind of instrumentation that's out there."

With a few billion cellphones in use, mobile devices have enormous potential for data gathering.

"Every time I'm passed from cell tower to cell tower, there's some piece of data that's being generated. It could be used if someone wanted to build an analytic application," noted McJannet.

Geolocation data is fairly new, having been limited to select aerospace and military uses until a decade ago. Today it shows great promise in commercial applications.

A trucking company, for instance, could capture geolocation data every 10 to 60 seconds from every one of its vehicles on the road, and store what could amount to petabytes of data.
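
To get a feel for how such volumes add up, here is a back-of-the-envelope sketch. The fleet size and record size are invented for illustration and do not come from Hortonworks; only the 10-to-60-second sampling interval is taken from the example above.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def yearly_bytes(vehicles, interval_seconds, bytes_per_record, days=365):
    """Estimate raw bytes collected when every vehicle reports at a fixed interval."""
    records_per_vehicle_per_day = SECONDS_PER_DAY / interval_seconds
    return vehicles * records_per_vehicle_per_day * days * bytes_per_record


# Hypothetical inputs: 50,000 vehicles, 1,000 bytes per location record.
for interval in (10, 60):
    terabytes = yearly_bytes(50_000, interval, 1_000) / 1e12
    print(f"one record every {interval}s: about {terabytes:.1f} TB per year")
```

The total scales linearly with fleet size, sampling rate, record size and retention period, so whether a given fleet actually reaches petabytes depends on those inputs.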

"Think about what applications you might be able to build if you track all the geo-data associated with your operations, and how might you be able to drive intelligence out of that," said McJannet.

Comments
cbabcock,
User Rank: Strategist
12/30/2013 | 2:45:14 PM
When is a new definition not?
McJannet, Hortonworks VP of marketing, hasn't issued a practical definition of big data so much as described what is most easily talked about as big data: "building new analytic applications based on new types of data, in order to better serve your customers and drive a better competitive advantage." There are no defining elements named other than "the latest" effort yielding competitive advantage. Five years from now, this description will fit efforts that look nothing like big data as we know it today. Of course we may have a new name for it, one that emphasizes real-time use over mere data size. But this "definition" will still fit.