The three V's -- volume, velocity and variety -- do a fine job of defining big data. Don't be misled by the "wanna-V's": variability, veracity, validity and value.
IBM sees veracity as a fourth big data V. (Like me, IBM doesn't advocate variability, validity, or value as big data essentials.) Regarding veracity, IBM asks, "How can you act upon information if you don't trust it?"
Yet facts, whether captured in natural language or in a structured database, are not always true. False or outdated data may nonetheless be useful, and so may non-factual, subjective data (feelings and opinions).
Consider two statements, one asserting a fact and the other containing one that is no longer true. Join me in concluding that data may contain value unlinked from veracity:
-- "The Iraqi regime... possesses and produces chemical and biological weapons." -- George W. Bush, October 7, 2002.
-- "I am glad that George Bush is President." -- Daniel Pinchbeck, writing ironically, June, 2003.
Veracity does matter. I'll cite an old Russian proverb: "Trust, but verify." That is, analyze your data -- evaluate it in context, taking into account provenance -- in order to understand it and use it appropriately.
3 V's Versus 'Wanna-V's'
My aim here is to differentiate the essence of big data, as defined by Doug Laney's original-and-still-valid 3 V's, from the derived qualities of the new V's proposed by various vendors, pundits and gurus. My hope is to maintain clarity and stave off the market-confusing fragmentation begotten by the wanna-V's.
On one side of the divide we have data capture and storage; on the other, business-goal oriented filtering, analysis and presentation. Databases and data streaming technologies answer the big data need; for the balance, the smart stuff, you need big data analytics.
Variability, veracity, validity and value aren't intrinsic, definitional big data properties. They are not absolutes. Rather, they reflect the uses you intend for your data and relate to your particular business needs.
You discover context-dependent variability, veracity, validity and value in your data via analyses that assess and reduce data and present insights in forms that facilitate business decision-making. This function -- analytics -- is the key to understanding big data.
Seth Grimes is the leading industry analyst covering text analytics and sentiment analysis. He founded Alta Plana Corporation, a Washington-based technology strategy consultancy, in 1997.