Research: The Big Data Management Challenge

Apr 12, 2012


Big Data's Origins

The volume of information coming into most companies has exploded in recent years, and many IT shops are dealing with extremely large data sets. If you don't have the right tools and architectures to deal with all that information, big data can be a big problem, according to our InformationWeek 2012 Big Data Survey of 231 business technology professionals.

Unique characteristics of some industries, such as finance, genomics and health science research, are driving an increased volume of data. But our survey found that the top five big data drivers are financial transactions, email, imaging data, Web logs, and Internet text and documents--all data sources common to every industry. It's clear that if big data isn’t a challenge for you now, it will be very soon.

The big data challenge is real, but only a third of the businesses we surveyed differentiate "big data" from traditional data and use distinct tools and management approaches to deal with the higher volume, complexity and dynamics of big data processing. Nearly 90% of respondents still use conventional databases as their primary means of handling data.

Any business creating large data sets must embrace big data management. Without the right tools and architectures, a company won't be able to effectively use the information it has collected. The two main benefits of big data management tools are the ability to standardize procedures and services, and the ability to organize data so that it can be searched, browsed, navigated and analyzed, our survey respondents say.

In this report, we take a look at what constitutes big data. It turns out to be about more than just size. You also have to look at the type of data involved--structured, unstructured or semistructured--as well as latency and complexity. Big data sets pose their own challenges: They're more difficult to search, store, share and analyze.

Businesses tend to be divided between those that need near-real-time processing of their big data and those that don't. We'll take a look at the management options available, including stream processing, which offers real-time distributed processing. We'll also look at batch processing, in which applications written for the open source Apache Hadoop MapReduce framework rapidly process large amounts of data in parallel on clusters of compute nodes.
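
To make the batch model concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases, written in plain Python. It is not the Hadoop API: the function names (map_phase, shuffle_phase, reduce_phase, run_job) and the word-count task are illustrative assumptions, and a real cluster would run the map and reduce steps in parallel across many nodes rather than in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit one (key, 1) pair per word in a single input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run the three phases in sequence (hypothetical driver, not Hadoop)."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "data sets grow"]
    print(run_job(sample))
```

Because each record is mapped independently and each key is reduced independently, both steps can be spread across compute nodes, which is what lets this style of job scale to very large inputs.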

It's complicated and expensive to use a traditional environment to store and process petabytes or more of data. Hadoop environments are just as complex, but they do offer some cost savings. We'll take a look at the economics behind big data processing and how cloud computing could play a role in keeping costs in line. We'll also look at the various products available, including converged offerings, to help you meet the big data management challenge. (R4030212)

Survey Name   InformationWeek 2012 Big Data Survey
Survey Date   December 2011
Region   North America
Number of Respondents    231 at organizations with 10 TB or more of data
Purpose   To examine the state of big data in the enterprise and the methods by which organizations are managing big data
