I've heard at least a couple of speakers suggest that "Big Data" is badly named. After all, it's the information you get from the data that matters. It should be "Big Analytics" or "Big Information," right? And is it just lots of data, or velocity of data, or type of data, or all of the above? Right now, it's pretty much anything that pushes the limits of "typical" tools (meaning mainly relational databases). The truth is that it means everything to everyone, enabling the use of the buzziest buzzwords!
Internet service providers (Google, Facebook, etc.) will tell you that Big Data is the enormous volume of unstructured data (and the customer information within it) that has driven the evolution of tools like Hadoop. But the actual bandwidth service providers are also swamped with well-defined, structured IP data records that need real-time analysis in order to manage data flow, track usage, and anticipate bottlenecks. This raises the question of whether "real-time" analytics is really "Big Analytics." Then there are the retailers who want to track your Web and in-store purchases while matching them to all of your personal data from Facebook et al.
But the moniker "Big Data" clearly puts the focus on the amount or type of data rather than on its analysis. It suggests more data is better. But is it? There have been suggestions in healthcare that we will gain significantly from just analyzing clinical data and outcomes (a.k.a. evidence-based medicine) long before the need for "Big Data" exists.
I strongly support the view that it's what you do with the data that matters. A measurable outcome (or some sort of ROI) would be ideal. That's what it's going to take to get more people serious about "Big Data."
Asking IT whether they have Big Data plans, rather than asking businesses what information could improve their business performance, is clearly putting the cart before the horse. Likewise, our focus on the size and type of the data, rather than on information and utility, suggests a fear of statistical analysis. It's ironic that Hollywood can make a great movie about baseball's use of sabermetrics, yet techies are more focused on unstructured data than on the correlations or models that can be produced from it. As a review of the movie says, "It's amazing that Moneyball makes baseball statistics seem fascinating."
However, it is coming full circle. I just got an invite to a "BIG DATA BUSINESS FORUM" at which the featured speaker is none other than Billy Beane, the General Manager of the Oakland A's and the subject of Moneyball.
I might actually make the trip to hear Billy, as he had a clear outcome in mind when he became an advocate for statistical analysis. By the way, the A's have a shot this year!
Yes, it's exciting that Hadoop provides a cost-effective, scale-out solution for amassing vast quantities of data. It will be even more exciting when sophisticated analytics and predictive modeling tools are easy to use and widely available in open source.
Here's your opportunity for a free Discover pass:
Head over to The Server Room Facebook Page and answer this simple question:
What's the next leap for big data?
Cloud, Small Business, Mobile, or Nowhere (Big Data Will Stay with the Big Enterprise)? Or something else? Share with us what you think!
You'll need to be a US citizen and over 18 years old, and you are responsible for your own travel and lodging costs should you be picked as one of the lucky respondents.
Pauline is a server industry veteran with more than 25 years of experience. She is currently General Manager for Enterprise Software Strategy at Intel. Previously, she was Sr. Vice President of Product Development and Product Management for Penguin Computing. Pauline has an MBA from Clark University, was part of Yale's Executive Management Program, and has a Bachelor of Science in Mathematics from the University of Pittsburgh.
Pauline Nist is a general manager in Intel's Datacenter and Connected Systems Group. You can reach her on Twitter @panist
The above insights were provided to InformationWeek by Intel Corporation as part of a sponsored content program. The information and opinions expressed in this content are those of Intel Corporation and its partners and not InformationWeek or its parent, UBM TechWeb.