Big Data. Big Decisions
InformationWeek
Special Coverage Series

Commentary


Big Data Plans: You Need More Than One

Stop trying to make last year's solution fit next year's problems. Develop multiple plans for your big data.

Rarely does one catchphrase ignite an entire market, but in the world of IT, big data has done it. The term has a thousand definitions, though, which renders it effectively meaningless, so allow me to bring the hype back to earth.

Simply put, big data applies to any dataset that breaks the boundaries and conventional capabilities of IT. Big data's defining characteristic could be scale--capacity is the easiest thing to get your brain around. Sheer volume of content can blow up your data center's existing capabilities. It could also be the number of transactions you need to process.

Big data is really a cause. A new approach to dealing with it is the effect, which is what's important. The effect will change everything.

History And Confusion

Big data is often equated with analytics, and while analytics is one use case, it's by no means the only one. It is, however, a good place to start in understanding how we got here. In short, we start with the concept of "My Data"--the data generated by one person, for example.

My Enterprise Strategy Group colleague Julie Lockner created a Structured Data Reference Model that tracks the life of My Data, which makes it easier to understand how something small ends up so very large. In this model, data that's created lives within a transaction processing system. While this model may vary from organization to organization and application to application, generally speaking, four data lifecycles are initiated when data is created: transaction processing, reporting and analytics, backup or disaster recovery, and application testing and development.

Data, created once, is replicated to these four functions, just within the domain of the transaction processing system. The first level of analytics exists within the transaction processing system itself (completed transactions, failures, etc.). The data is then prepared, processed, transformed, and replicated outside of the transaction processing system to be housed inside a data warehousing system, where one may perform analytics on a group of My Data records, looking for sales based on geographies, for example. That data warehouse also will require data protection and disaster recovery functions, and other copies will be required for test/development.

Then, all the My Data objects are transformed, processed, and replicated to a "Big Analytics" system, where they're pored over for shopping cart dropout rates and other cause/effect scenarios. Again, copies of copies are used for test/development, backup, and DR.

Wow. It doesn't take long to see how one little transaction record can grow 100-fold. Sooner or later, that growth will break the capabilities of conventional IT.
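To make the multiplication concrete, here is a minimal sketch that counts the copies of a single record across the three tiers described above. The tier names follow the model; the per-tier counts (seven retained backups, one DR replica, three test/development copies) and the names `Tier` and `total_copies` are assumptions made up for illustration, not figures from the model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One system that holds My Data, plus the copies made around it."""
    name: str
    backup_versions: int   # retained backup generations (hypothetical)
    dr_replicas: int       # disaster recovery replicas (hypothetical)
    test_dev_copies: int   # copies handed to test/development (hypothetical)

    def copies(self) -> int:
        # The primary record plus every secondary copy kept for this tier.
        return 1 + self.backup_versions + self.dr_replicas + self.test_dev_copies


def total_copies(tiers) -> int:
    """Count every copy of a single record across all tiers."""
    return sum(tier.copies() for tier in tiers)


# The three tiers come from the model described above; the counts are made up.
TIERS = [
    Tier("transaction processing", backup_versions=7, dr_replicas=1, test_dev_copies=3),
    Tier("data warehouse", backup_versions=7, dr_replicas=1, test_dev_copies=3),
    Tier("big analytics", backup_versions=7, dr_replicas=1, test_dev_copies=3),
]

if __name__ == "__main__":
    for tier in TIERS:
        print(f"{tier.name}: {tier.copies()} copies")
    print(f"total: {total_copies(TIERS)} copies of one record")
```

Under these assumed counts, one record already exists 36 times. Longer backup retention or a few more test environments push the multiplier higher still.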

To steal a line from Julie: "More than just data volume, smart big data strategies also consider the velocity, variety, and complexity of information." Data sources aren't just simple transaction processing systems. They come from social media, they include dozens of content types (video, audio, etc.), and they come from every known device on the planet.

No wonder the industry is so fired up about big data. The advances create new opportunities for your company to sell more stuff--and for companies to sell more to you. They also create new opportunities to screw up.

So what breaks when you cross the tipping point of big data? You first find that all the fundamentals break. For instance, you can't process all the data any longer, so you start to process only sub-groups, and then you hope the groups you chose are fair representations of the overall data pool (they aren't). You're using traditional structured database systems that no longer work because your datasets are 1,000 times bigger than the DBMS was ever designed to support. You can't get your data into your analytics (or any other) system fast enough. You can't grow your storage infrastructure fast enough. You can't back up the data fast enough, so the concept of recovery is completely shot.
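The backup problem is simple arithmetic, and a rough sketch shows how quickly it gets out of hand. Every number here is hypothetical: a 0.5 TB dataset, the same dataset grown 1,000 times to 500 TB, a sustained backup rate of 1 GB per second, and an eight-hour nightly window. The function names `backup_hours` and `fits_window` are likewise invented for this example.

```python
def backup_hours(dataset_tb: float, throughput_gb_per_s: float) -> float:
    """Hours needed to copy dataset_tb terabytes at a sustained rate."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    # Decimal units: 1 TB = 1000 GB.
    seconds = dataset_tb * 1000 / throughput_gb_per_s
    return seconds / 3600


def fits_window(dataset_tb: float, throughput_gb_per_s: float,
                window_hours: float) -> bool:
    """True if a full backup finishes inside the backup window."""
    return backup_hours(dataset_tb, throughput_gb_per_s) <= window_hours


if __name__ == "__main__":
    window = 8.0  # hypothetical nightly backup window, in hours
    rate = 1.0    # hypothetical sustained throughput, in GB per second
    for size_tb in (0.5, 500.0):
        hours = backup_hours(size_tb, rate)
        verdict = "fits" if fits_window(size_tb, rate, window) else "does not fit"
        print(f"{size_tb} TB: {hours:.1f} hours, {verdict} in a {window:.0f}-hour window")
```

At these assumed rates the small dataset backs up in minutes, while the one that is 1,000 times larger needs almost six days for a single full pass.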

So what do you do? You stop trying to make last year's solution fit next year's problems.

Tons of technologies are being developed to address these issues across the board. Most are simply Band-Aids. Others, like Hadoop, are more radical and will fundamentally change the way you do things (storage, in this case). Most need more time to develop into legitimate enterprise alternatives, but they're on the way.

Meanwhile, the next time someone asks, "What's your plan for big data?" respond, "Which one?" You're going to need a few.

Steve Duplessie is the founder and senior analyst at the Enterprise Strategy Group, a leading independent authority on enterprise storage, analytics, and a range of other business technology interests.
