Commentary
9/9/2013
11:58 AM
Seth Grimes

9 Truths Lead To Big Data's Future

Eight insights get to the heart of big data value and point to a future in which we synthesize and make sense of vast data stores.

Now and then I find myself thinking about the big principles of big data; that is, not about Hadoop vs. relational databases or Mahout vs. Weka, but rather about fundamental wisdom that frames our vision of "the new currency" of data. But maybe the new oil better describes data. Or perhaps we need a new metaphor to explain data's value.

Metaphors aren't factual or provable, but they do illuminate certain truths about topics of interest. They make complex concepts understandable, much like the following quotations I've collected, which you could say explain basic big-data principles. I'll offer eight truths about big data -- you've surely already bought into at least a few -- ordered roughly chronologically. Then I'll take a look ahead at a "future truth."

1. "Correlation is not causation."

We hear this over and over (or at least I do). I learned one version of the underlying fallacy, when I was in college studying philosophy, as post hoc ergo propter hoc, or "after this, therefore because of this."


You can read a smart take in the O'Reilly Radar blog, where in "The vanishing cost of guessing," Alistair Croll observes: "Overwhelming correlation is what big data does best... Parallel computing, advances in algorithms and the inexorable crawl of Moore's Law have dramatically reduced how much it costs to analyze a data set," creating a "data-driven society [that] is both smarter and dumber." Bottom line? Be smart and respect the difference between correlation and causation. Patterns are not conclusions.
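To make the point concrete, here is a minimal sketch in Python of how cheap analysis turns up strong correlations by chance alone. Everything in it is an assumption made for illustration: the series are independent random noise, and the names `pearson` and `strongest_chance_correlation` are hypothetical helpers, not part of any library. By construction no series causes any other, so any correlation the search finds is spurious.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(num_series, length, seed=0):
    """Largest |correlation| found among independent random series.

    Hypothetical helper: every series is pure Gaussian noise, so there is
    no causal link between any two of them.
    """
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(length)]
              for _ in range(num_series)]
    return max(abs(pearson(a, b)) for a, b in combinations(series, 2))


# Compare a small search (10 pairs) with a large one (19,900 pairs).
few = strongest_chance_correlation(num_series=5, length=20)
many = strongest_chance_correlation(num_series=200, length=20)
print(f"strongest correlation among   5 random series: {few:.2f}")
print(f"strongest correlation among 200 random series: {many:.2f}")
```

The more pairs you can afford to test, the stronger the best chance correlation tends to look, which is one way to read "both smarter and dumber."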

2. "All models are wrong, but some are useful."

Accidental statistician George E.P. Box wrote this in his 1987 textbook, Empirical Model-Building and Response Surfaces. Box developed his thoughts on modeling, which very much apply to big data, over the length of his career. See in particular the article "Science and Statistics," published in the Journal of the American Statistical Association in December 1976.

3. Big data knows (almost) all.

If you haven't already, it's time to accept Scott McNealy's 1999 statement, "You have zero privacy anyway... Get over it." McNealy, cofounder and then CEO of Sun Microsystems, was quoted in Wired magazine. Examples of big data's growing invasiveness are plentiful: analysts' ability to infer sex and sexual orientation from social postings and pregnancy from buying patterns; the ongoing expansion of vast, commercialized consumer-information stores held by Acxiom and the like; the rise of Palantir and Riot-ous information synthesis; the NSA Prism vacuum cleaner.

4. "80% of business-relevant information originates in unstructured form, primarily text (but also video, images, and audio)."

I wrote this in a 2008 article, although as I said then, this pseudo-data factoid dates back to at least the early 1990s. It's a factoid because it is far too broadly drawn to be precise; as far as I know, it's not derived from any form of systematic measurement ever performed. Still, per statistician Box, "80% unstructured" is a useful notion, even if not precisely correct. Whatever number works for you, text and content analytics belong in your toolkit.

5. "It's not information overload. It's filter failure."

Clay Shirky made this observation at the September 2008 Web 2.0 Expo in New York. Corollaries of Shirky's filter observation are truisms such as, "More data does not imply better insights," which happens to be one I made up. But don't overdo it; avoid what Eli Pariser terms "the filter bubble," an inability to see beyond what automation makes immediate.
