Why Big Is Bad When It Comes To Data - InformationWeek

12:30 PM
Patrick Houston

Why Big Is Bad When It Comes To Data

Calling it "big data" doesn't do it justice. Gushing data would be far more accurate.

Too bad the IT terms we coin stick like a price tag to a cheap trinket. Once they're on, you can't claw them off. Or when you do, they leave that ugly residue.

Take "big data." It's the catchphrase du jour. You hear it everywhere. The tech media, including InformationWeek, covers it thoroughly. Database and analytics vendors are glomming on to it for the cachet it gives their marketing efforts. I had to grin when SAS CEO Jim Goodnight, a wizened figure if ever there was one, properly scoffed in a recent interview with InformationWeek's Doug Henschen that "we're talking about big data now because everyone got tired of talking about the cloud."

There's nothing inherently wrong with being a new thing. Trouble is the term is just so imprecise. What's it say when the generally authoritative Wikipedia describes "big data" right off the bat as a "loosely defined term"?

Lately, my meanderings have taken me into a number of encounters with some of the best minds dealing with "big data," including researchers from Intel and MIT, hands-on executive managers at companies such as LinkedIn, eBay, and Adobe, and entrepreneurs such as Ash Damle of MEDgle.

And the more I bump into the topic of "big data," the more concerned I've become about the term itself. Reason: It falls far short of describing not only the phenomenon but also its applications, opportunities, and ramifications--for IT, for business, and for the way we live and work.

Unless you're a computer science PhD or a database professional, it's easy to take the term literally. And among those who do, don't forget, are the corporate execs and line-of-business managers with whom even those of you in the know must deal. To them, "big" is just about the amount. It's not difficult to imagine the petabytes piling up out there, given the contrail of information everyone exhausts as they move across the various fixed and mobile networks.

Of course, volume is the most immediate issue many of you face in dealing with your data. At a big data panel held at Google's Silicon Valley HQ last week, the participants addressed at length the costs of warehousing, and along two dimensions--size and duration. It's not just how much data you want to process and store but for how long. And they also raised the issue of diminishing returns. When do the costs of keeping and sifting over time outweigh practical benefit?
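
The diminishing-returns question can be made concrete with a back-of-the-envelope sketch in Python. Every name and number here is an assumption invented for illustration (the storage price, the starting value of the data, and the rate at which that value fades); none of it comes from the panel.

```python
# Illustrative only: all prices, values, and decay rates are invented assumptions.

def monthly_storage_cost(size_tb: float, price_per_tb_month: float) -> float:
    """Cost of keeping size_tb of data for one more month."""
    return size_tb * price_per_tb_month


def monthly_value(initial_value: float, decay: float, month: int) -> float:
    """Assumed benefit of the data in a given month, shrinking geometrically."""
    return initial_value * (decay ** month)


def break_even_month(size_tb, price_per_tb_month, initial_value, decay, horizon=120):
    """First month (counting from 0) in which keeping the data costs more
    than it returns. Returns None if that never happens within `horizon` months.
    """
    cost = monthly_storage_cost(size_tb, price_per_tb_month)
    for month in range(horizon):
        if monthly_value(initial_value, decay, month) < cost:
            return month
    return None


if __name__ == "__main__":
    # Hypothetical inputs: 500 TB at $20 per TB-month, data worth $40,000
    # in the first month and losing 10% of its value every month after.
    print(break_even_month(size_tb=500, price_per_tb_month=20.0,
                           initial_value=40_000.0, decay=0.9))
```

With these made-up inputs the break-even arrives at month 14. The point is not the number but the shape: size and duration multiply the cost, while the benefit, under this assumed decay, keeps shrinking.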

Even as quantity remains a crucial concern, the forefront of big data will be increasingly defined by two other "V" words--velocity and variety. Michael Stonebraker, an MIT electrical engineering and computer science professor specializing in database research, made the point well as he patiently tutored me on the cutting edge of big data in terms a non-expert could grasp.

Data isn't static, like the standing waters of a reservoir. It's increasingly dynamic, generated and collected in real time. Even transactional data is being captured at both ends--and at every point in between. Ergo, data gushes.

And it gushes from an expanding number of sources, including all the sensors monitoring more and more of what we do. One of my favorite examples comes from Eve M. Schooler, an Intel R&D principal, who pointed out that public utility smart meters in many municipalities now report energy usage every 15 minutes--frequently enough to discern any number of behavioral patterns, such as when you're home (or not), alone or with others. And that's just one silent stream.
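
A small Python sketch shows how little it takes to read behavior out of such a stream. The 15-minute interval is the one cited above; everything else (the function name, the baseline threshold, the sample day, and the arbitrary date) is a hypothetical stand-in, not a description of how any utility actually analyzes its data.

```python
from datetime import datetime, timedelta

INTERVAL_MINUTES = 15  # reporting interval cited in the article
READINGS_PER_DAY = 24 * 60 // INTERVAL_MINUTES  # 96 readings per meter per day


def occupied_windows(readings_kwh, start, baseline_kwh):
    """Return (start_time, end_time) spans where usage stays above baseline.

    readings_kwh: one value per 15-minute interval, in order.
    baseline_kwh: assumed always-on load (fridge, standby); a made-up threshold.
    """
    step = timedelta(minutes=INTERVAL_MINUTES)
    windows, span_start = [], None
    for i, kwh in enumerate(readings_kwh):
        t = start + i * step
        if kwh > baseline_kwh and span_start is None:
            span_start = t  # usage just rose above the baseline
        elif kwh <= baseline_kwh and span_start is not None:
            windows.append((span_start, t))  # usage fell back to baseline
            span_start = None
    if span_start is not None:
        # Still above baseline when the readings run out.
        windows.append((span_start, start + len(readings_kwh) * step))
    return windows


if __name__ == "__main__":
    # Invented day: quiet overnight, a morning bump, empty house, busy evening.
    day = [0.05] * 26 + [0.30] * 6 + [0.05] * 40 + [0.45] * 20 + [0.05] * 4
    assert len(day) == READINGS_PER_DAY
    for begin, end in occupied_windows(day, datetime(2012, 7, 16), 0.10):
        print(begin.strftime("%H:%M"), "-", end.strftime("%H:%M"))
```

On this invented day the sketch reports activity from 06:30 to 08:00 and from 18:00 to 23:00. A crude threshold on 96 numbers a day is already enough to guess when someone is home.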

Those three "V's"--volume, velocity, and variety--go back a ways, of course. Doug Laney, now a Gartner analyst, introduced them as far back as 2001. But it doesn't hurt to revive aged but still valid thinking, if only because "big data," properly defined, will present a multitude of challenges to many of you reading this, and soon enough.

One is analytics. MIT's Stonebraker contends that the "simple analytics" that data warehouses can apply to relational databases just aren't up to the complex, covariant calculations required to tap the probabilities and predictive insights--the real gold--within the gushing streams of unstructured data spouting up everywhere.

To make his point about the limitations of relational databases and the simple analytics applied to them, Stonebraker cites one pharmaceutical company trying to mine the data being captured by its 8,000 research scientists, each with an individual electronic Web notebook. Imagine the payoff, he suggests, in finding a groundbreaking new drug through probabilistic connections between the work of one researcher and that of another seemingly far removed in distance and subject matter. While there are informatics systems capable of integrating 10 data sources, there are none that can choke down thousands, Stonebraker said. "Hell will freeze over before you get it done."

Finally, describing data as nothing more than "big" makes it seem too benign. There ought to be an adjective that at least hints at the grave implications for privacy lurking ahead as companies, governments, and heaven-knows-who-else become ever more adept at collecting, storing, processing, analyzing, and visualizing data.

As you might expect, Stonebraker foresees momentous economic and social value. At the same time, he also sees the dark side. "Privacy is going to be a huge issue," he said. "And it's largely going to be a political issue, too."

So if "big" doesn't cut it as an appropriate modifier, then what does? Maybe we should start an effort to call it something else, before the term is popularized beyond any redemption whatsoever.

"Gush data" anyone?

Patrick Houston is the co-founder of MediaArchitechs. He is a former SVP for a new media startup, a GM at Yahoo, and editor-in-chief at CNET.com. He can be reached at [email protected]

Doug Laney
8/20/2012 | 5:07:32 AM
re: Why Big Is Bad When It Comes To Data
Thanks for the citation Patrick. Great piece. For your readers interested in the original piece first mentioning the "3Vs" from over 11 years ago I have been able to unearth and post it on my Gartner blog: http://blogs.gartner.com/doug-.... Enjoy. Note that Gartner has also updated its definition of Big Data to incorporate a few new critical concepts such as the need for new, innovative forms of processing, and representative use cases. --Doug Laney, VP Research, Gartner, @doug_laney
8/16/2012 | 8:57:36 PM
re: Why Big Is Bad When It Comes To Data
Actually, I think we almost called it "Cloud...the sequel". This article hits the nail on the head on so many levels. At SAS, we've seen that a lot of companies can't take full advantage of "big data" because, as you mention, the term has become all encompassing and so broad that no one knows what it is. We recently launched an informational page about "big data" that may shed some light on what the term now means in the analytics industry. The page provides information on "What big data actually means" with definitions, customer examples, etc. Really, we're trying to provide basic information relevant to those who are just starting to research big data. If you're interested, check it out at http://www.sas.com/big-data
7/30/2012 | 10:45:43 PM
re: Why Big Is Bad When It Comes To Data
Hello, Patrick. This is E.G.Nadhan, Distinguished Technologist, HP Enterprise Services.

Thank you for providing an insightful perspective on the challenges of using the term Big Data and the inherent risks involved with its multi-faceted interpretations.

I also like your suggestion about "gushing data" being a more representative term -- both from its underlying meaning as well as the way it sounds when you just say the word - gushing!

However, your article got me thinking about the reasons why the term has caught on and is spreading like wild fire. In this post, I have outlined my thoughts on the five different reasons why this is so - http://bit.ly/QrPn73.

Please check out my post http://bit.ly/QrPn73 at your convenience and let me know what you think.

Twitter: @NadhanAtHP
7/20/2012 | 5:58:44 PM
re: Why Big Is Bad When It Comes To Data
Good piece. I have seen a 4th "V" that adds to the description/discussion of Big Data (or maybe Swarming Data?): Value. Understanding when data will not be useful anymore and can be deleted will become more and more important. What to keep, for how long, and where to put it are topics I haven't really seen covered in the big data hype cycle.