Big Data: A Practical Definition
Today's hazy definitions don't clearly illustrate big data's benefits, says one Hortonworks exec. Here's a pragmatic alternative.
So what is big data, anyway? Well, there's the classic 3V model -- high volume, high velocity and high variety -- that's bandied about often. But that popular, if nebulous, definition doesn't really explain the pragmatic benefits provided by a big data platform.
David McJannet, VP of marketing for Hortonworks, believes that a more practical description is called for, one that explains the real-world benefits of big data.
"Big data isn't this nebulous thing," McJannet told InformationWeek in a phone interview. "Very pragmatically, it's about building net-new analytic applications based on new types of data that (an organization) wasn't previously tracking."
Hortonworks, of course, has a very pragmatic reason to teach the world about big data's advantages. As a major driver of the Hadoop ecosystem, the Palo Alto, Calif., enterprise software company has much to gain by persuading organizations to store and analyze massive amounts of data that they might otherwise ignore.
So here's an alternative (yet businesslike) definition: Big data is about "building new analytic applications based on new types of data, in order to better serve your customers and drive a better competitive advantage," said McJannet.
This simpler definition may help businesses move "beyond the more nebulous concept" of big data, he added.
Not all big data is alike, of course, so Hortonworks has subdivided these murky bits into five distinct data categories: social media, server logs, Web clickstream, machine/sensor and geolocation.
But how can companies use this information?
Take social media data. Businesses are using Facebook, Twitter and similar social sites as "sentiment barometers," McJannet said. A movie producer, for instance, can use this data to track reactions to a new film that's coming out and optimize a marketing campaign based on social media users' comments.
Server logs can help system administrators, who mine this data in Hadoop to identify and react to important issues. McJannet offered this example: "If I track every single inbound request on my website and then overlay that by geography, I can determine better where my biggest customer hotspots are, and where I potentially have security issues," he said.
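To make McJannet's server-log example concrete, here is a minimal Python sketch of the "overlay by geography" step. It is an illustration only, not Hortonworks' method or a Hadoop job: the function names, the region codes and the sample records are all hypothetical, and it assumes each request has already been tagged with a region by some upstream lookup.

```python
from collections import Counter

def requests_by_region(records):
    """Count inbound requests per region.

    Each record is an (ip_address, region) pair. The region tag is an
    assumption: it stands in for whatever geographic lookup runs upstream.
    """
    return Counter(region for _ip, region in records)

def top_hotspots(records, n=3):
    """Return the n regions with the most inbound requests."""
    return requests_by_region(records).most_common(n)

# Made-up sample records, for illustration only.
sample = [
    ("203.0.113.5", "US-CA"),
    ("203.0.113.9", "US-CA"),
    ("198.51.100.7", "DE"),
    ("192.0.2.44", "US-NY"),
    ("203.0.113.5", "US-CA"),
]

print(top_hotspots(sample, n=2))
```

The same grouping idea could also surface the security side of the example: a region or address with an unusually high request count is a candidate for closer inspection.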
Clickstream data is an example of how Hadoop provides an affordable way to manage information that would overload a traditional data management system.
"If I could capture clickstream data of all the clicks on my website, obviously that would fill up my database pretty quickly. It's the sheer volume of clickstream data that's generated," said McJannet. "If I store that data in Hadoop ... I can use that information to build a really interesting analytic application."
Machines are a largely untapped source of big data as well.
"Machine (data) is absolutely one of the biggest generators of data, whether (from) air conditioning units, refrigerators, trucks or farm machinery," McJannet said. "It's exploding in terms of the kind of instrumentation that's out there."
With a few billion cellphones in use, mobile devices have enormous potential for data gathering.
"Every time I'm passed from cell tower to cell tower, there's some piece of data that's being generated. It could be used if someone wanted to build an analytic application," noted McJannet.
Geolocation data is fairly new, having been limited to select aerospace and military uses until a decade ago. Today it shows great promise in commercial applications.
A trucking company, for instance, could capture geolocation data every 10 to 60 seconds from every one of its vehicles on the road, and store what could amount to petabytes of data.
"Think about what applications you might be able to build if you track all the geo-data associated with your operations, and how might you be able to drive intelligence out of that," said McJannet.