Commentary
1/5/2012
04:46 PM

Big Data Plans: You Need More Than One

Stop trying to make last year's solution fit next year's problems. Develop multiple plans for your big data.

Rarely does one catchphrase ignite an entire market, but in the world of IT, big data has done it. The term has a thousand definitions, though, which renders it effectively meaningless, so allow me to bring the hype back to earth.

Simply put, big data applies to any dataset that breaks the boundaries and conventional capabilities of IT. Big data's defining characteristic could be scale--capacity is the easiest thing to get your brain around. Sheer volume of content can blow up your data center's existing capabilities. Or it could be the number of transactions you need to process.

Big data is really a cause. A new approach to dealing with it is the effect, which is what's important. The effect will change everything.

History And Confusion

Big data is often equated with analytics, and while analytics is one use case, it's by no means the only one. It is, however, a good place to start to understand how we got here. In short, we start with the concept of "My Data"--the data generated by one person, for example.

My Enterprise Strategy Group colleague Julie Lockner created a Structured Data Reference Model that tracks the life of My Data, which makes it easier to understand how something small ends up so very large. In this model, data that's created lives within a transaction processing system. While this model may vary from organization to organization and application to application, generally speaking, four data lifecycles are initiated when data is created: transaction processing, reporting and analytics, backup or disaster recovery, and application testing and development.

Data, created once, is replicated to these four functions, just within the domain of the transaction processing system. The first level of analytics exists within the transaction processing system itself (completed transactions, failures, etc.). The data is then prepared, processed, transformed, and replicated outside of the transaction processing system to be housed inside a data warehousing system, where one may perform analytics on a group of My Data records, looking for sales based on geographies, for example. That data warehouse also will require data protection and disaster recovery functions, and other copies will be required for test/development.

Then, all the My Data objects are transformed, processed, and replicated to a "Big Analytics" system, where they're pored over for shopping cart dropout rates and other cause/effect scenarios. Again, copies of copies are used for test/development, backup, and DR.

Wow. It doesn't take long to see how one little transaction record can grow 100-fold. Sooner or later, that growth will break the capabilities of conventional IT.
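
To make the multiplication concrete, here is a rough back-of-the-envelope sketch in Python. The tier names follow the lifecycle described above, but the copy lists and the `versions_per_copy` figure are illustrative assumptions, not numbers taken from the reference model.

```python
# Hypothetical copy inventory for one "My Data" record.
# Tier names follow the article; the copies listed per tier are assumptions.
TIERS = {
    "transaction processing": ["primary", "reporting", "backup/DR", "test/dev"],
    "data warehouse": ["primary", "backup/DR", "test/dev"],
    "big analytics": ["primary", "backup/DR", "test/dev"],
}


def count_copies(tiers):
    """Count the distinct copies of a record across all tiers."""
    return sum(len(copies) for copies in tiers.values())


def footprint_multiplier(tiers, versions_per_copy=1):
    """Estimate how many times larger the stored footprint is than the record.

    versions_per_copy is an assumed average number of retained versions
    (backup generations, snapshots, refreshed test datasets) per copy.
    """
    return count_copies(tiers) * versions_per_copy


if __name__ == "__main__":
    print("distinct copies:", count_copies(TIERS))
    print("multiplier with 10 retained versions each:",
          footprint_multiplier(TIERS, versions_per_copy=10))
```

Under these assumptions, ten distinct copies with ten retained versions apiece already put the footprint at 100 times the original record. Real environments will differ, so plug in your own counts.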

To steal a line from Julie: "More than just data volume, smart big data strategies also consider the velocity, variety, and complexity of information." Data sources aren't just simple transaction processing systems. They come from social media, they include dozens of content types (video, audio, etc.), and they come from every known device on the planet.

No wonder the industry is so fired up about big data. The advances create new opportunities for your company to sell more stuff--and for companies to sell more to you. It also means new opportunities to screw up.

So what breaks when you cross the tipping point of big data? You first find that all the fundamentals break. For instance, you can't process all the data any longer, so you start to process only sub-groups, and then you hope the groups you chose are fair representations of the overall data pool (they aren't). You're using traditional structured database systems that no longer work because your datasets are 1,000 times bigger than the DBMS was ever designed to support. You can't inject your data into your analytics (or any other) system fast enough. You can't grow your storage infrastructure fast enough. You can't back up the data fast enough, so the concept of recovery is completely shot.
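
The sub-group problem is easy to demonstrate. The Python sketch below uses a made-up order log (the regions, order values, and the 500-record cutoff are all assumptions for illustration) and compares the average over the full dataset with the average over the slice a capacity-constrained system might actually process.

```python
from statistics import mean


def make_orders():
    """Build a hypothetical order log of (region, order_value) pairs.

    900 small orders from region "A" arrive first, followed by
    100 large orders from region "B". All values are invented.
    """
    return [("A", 10.0)] * 900 + [("B", 200.0)] * 100


def average_value(orders):
    """Average order value over whatever records we were able to process."""
    return mean(value for _, value in orders)


orders = make_orders()
full_average = average_value(orders)            # every record
subgroup_average = average_value(orders[:500])  # only the first 500 records

if __name__ == "__main__":
    print("full dataset average:", full_average)
    print("sub-group average:   ", subgroup_average)
```

Here the sub-group never sees region "B" at all, so its average (10.0) badly understates the true figure (29.0). Random sampling reduces this kind of bias, but only if the sample is actually drawn from the whole pool, which is exactly what you can no longer afford to read.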

So what do you do? You stop trying to make last year's solution fit next year's problems.

Tons of technologies are being developed to address these issues across the board. Most are simply Band-Aids. Others, like Hadoop, are more radical and will fundamentally change the way you do things (storage, in this case). Most need more time to develop into legitimate enterprise alternatives, but they're on the way.

Meanwhile, the next time someone asks, "What's your plan for big data?" respond, "Which one?" You're going to need a few.

Steve Duplessie is the founder and senior analyst at the Enterprise Strategy Group, a leading independent authority on enterprise storage, analytics, and a range of other business technology interests.

Comments
Trendwise,
User Rank: Apprentice
1/7/2012 | 1:39:14 PM
re: Big Data Plans: You Need More Than One
Nice article Steve. 2012 is going to be a big year for Big Data as it comes out of R&D mode to production mode.
-Trendwise Analytics
Doug Laney,
User Rank: Apprentice
1/6/2012 | 2:51:39 PM
re: Big Data Plans: You Need More Than One
Great post, Steve. Good to see others finally adopting the Volume-Variety-Velocity construct for big data that Gartner (then Meta Group) published over 10 years ago. For future reference, here's a copy of the original piece I wrote first suggesting the 3Vs, entitled "Three Dimensional Data Challenge: Controlling Volume, Velocity and Variety": https://www.sugarsync.com/pf/D... --Doug Laney, VP Research, Gartner