Big Data. Big Decisions
InformationWeek
Special Coverage Series

Commentary

Patrick Houston




Why Big Is Bad When It Comes To Data

Calling it "big data" doesn't do it justice. Gushing data would be far more accurate.

Too bad the IT terms we coin stick like a price tag to a cheap trinket. Once they're on, you can't claw them off. Or when you do, they leave that ugly residue.

Take "big data." It's the catchphrase du jour. You hear it everywhere. The tech media, including InformationWeek, covers it thoroughly. Database and analytics vendors are glomming on to it for the cachet it gives their marketing efforts. I had to grin when SAS CEO Jim Goodnight, a wizened figure if ever there was one, properly scoffed in a recent interview with InformationWeek's Doug Henschen that "we're talking about big data now because everyone got tired of talking about the cloud."

There's nothing inherently wrong with being a new thing. Trouble is the term is just so imprecise. What's it say when the generally authoritative Wikipedia describes "big data" right off the bat as a "loosely defined term"?

Lately, my meanderings have taken me into a number of encounters with some of the best minds dealing with "big data," including researchers from Intel and MIT, hands-on executive managers at companies such as LinkedIn, eBay, and Adobe, and entrepreneurs such as Ash Damle of MEDgle.

And the more I bump into the topic of "big data," the more concerned I've become about the term itself. Reason: It falls far short of describing not only the phenomenon but also its applications, opportunities, and ramifications--for IT, for business, and for the way we live and work.


Unless you're a computer science PhD or a database professional, it's easy to take the term literally. And among those who do, don't forget, are the corporate execs and line-of-business managers with whom even those of you in the know must deal. To them, "big" is just about the amount. It's not difficult to imagine the petabytes piling up out there, given the contrail of information everyone exhausts as they move across the various fixed and mobile networks.

Of course, volume is the most immediate issue many of you face in dealing with your data. At a big data panel held at Google's Silicon Valley HQ last week, the participants addressed at length the costs of warehousing, and along two dimensions--size and duration. It's not just how much data you want to process and store but for how long. And they also raised the issue of diminishing returns. When do the costs of keeping and sifting over time outweigh practical benefit?

Even as quantity remains a crucial concern, the forefront of big data will be increasingly defined by two other "V" words--velocity and variety. That point was well made by Michael Stonebraker, an MIT electrical engineering and computer science professor specializing in database research, who patiently tutored me in the cutting edge of big data in terms a non-expert could grasp.

Data isn't static, like standing waters of a reservoir. It's increasingly dynamic, generated and collected in real time. Even transactional data is being captured at both ends--and at every point in between. Ergo, data gushes.

And it gushes from an expanding number of sources, including all the sensors monitoring more and more of what we do. One of my favorite examples comes from Eve M. Schooler, an Intel R&D principal, who pointed out that public utility smart meters in many municipalities now report energy usage every 15 minutes--frequently enough to discern any number of behavioral patterns, such as when you're home (or not), alone or with others. And that's just one silent stream.

Those three "V's"--volume, velocity, and variety--go back a ways, of course. Doug Laney, now a Gartner analyst, used them as far back as 2001, while at META Group, to describe the challenges of data growth. But it doesn't hurt to revive aged, but still valid, thinking if only because "big data," properly defined, will present a multitude of challenges to many of you reading this, and soon enough.

One is analytics. MIT's Stonebraker contends that the "simple analytics" that data warehouses can apply to relational databases just aren't up to the complex, covariant calculations required to tap the probabilities and predictive insights--the real gold--within the gushing streams of unstructured data spouting up everywhere.

To make his point about the limitations of relational databases and the simple analytics applied to them, Stonebraker cites one pharmaceutical company trying to mine the data being captured by its 8,000 research scientists, each with an individual electronic Web notebook. Imagine the payoff, he suggests, in finding a groundbreaking new drug out of probabilistic connections between one researcher's works seemingly so far from another's in distance and subject matter. While there are informatics systems capable of integrating 10 data sources, there are none that can choke down thousands, Stonebraker said. "Hell will freeze over before you get it done," he said.

Finally, describing data as nothing more than "big" makes it seem too benign. There ought to be an adjective that at least hints at the grave implications for privacy lurking ahead as companies, governments, and heaven-knows-who-else become ever more adept at collecting, storing, processing, analyzing, and visualizing data.

As you might expect, Stonebraker foresees momentous economic and social value. At the same time, he also sees the dark side. "Privacy is going to be a huge issue," he said. "And it's largely going to be a political issue, too."

So if "big" doesn't cut it as an appropriate modifier, then what does? Maybe we should start an effort to call it something else, before the term is popularized beyond any redemption whatsoever.

"Gush data" anyone?

Patrick Houston is the co-founder of MediaArchitechs. He is a former SVP for a new media startup, a GM at Yahoo, and editor-in-chief at CNET.com. He can be reached at patrick.houston@mediaarchitechs.com.






