Big Data. Big Decisions
InformationWeek
Special Coverage Series


MapReduce, Hadoop: Young, But Worth A Look

Businesses should consider it for data management jobs that don't work well in relational databases.

When some IT pros encounter Big Data, they think of big-name IT vendors. Others think of Google. They reckon a company that does a fantastic job searching the Web must know something about managing lots of data.

It does. Google's 2004 paper on MapReduce, its programming model for processing big data sets, together with its earlier paper on the Google File System, inspired a new approach to big data computing. Among the first champions were developers, mostly from Yahoo, who came up with Hadoop.

Now an Apache open source framework, Hadoop includes the Hadoop Distributed File System (HDFS) and a MapReduce engine. Think of MapReduce and Hadoop as alternatives for distributed big data processing that may deliver speed, cost, and flexibility advantages over relying solely on massively parallel processing (MPP) or column-store databases.
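Here is a minimal single-process sketch of the map, shuffle, and reduce steps for a word count, the canonical MapReduce example. It does not use Hadoop's actual Java API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical. A real framework would run many mappers and reducers in parallel across a cluster and read its input from HDFS rather than from Python lists.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper (hypothetical name): emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, the step the framework performs
    # between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer (hypothetical name): combine all values that share one key.
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    # Chain the three phases; a cluster would distribute each one.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(["spam malware spam", "firewall spam"]))
    # {'firewall': 1, 'malware': 1, 'spam': 3}
```

The mapper sees only one record at a time, and the reducer sees only one key at a time. That independence is what allows a framework to spread the work across many machines.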

Barnes & Noble chose vendor Aster Data in part because it supports in-database MapReduce, which the bookseller thinks will help its data warehouse scale out and perform better. MapReduce lets researchers spot trends more quickly than massively parallel processing alone, says Marc Parrish, Barnes & Noble's VP of retention and loyalty marketing. E-book sales took off last year, and the old system began choking on the data. As a result, a report on e-book downloads was arriving later and later in the day. "When you're putting database table joins on joins on joins, it's much more efficient to move that query into a MapReduce environment," Parrish says.
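Parrish's point about joins can be illustrated with a reduce-side join, one common way to express a join in MapReduce. The sketch below is a minimal, single-process illustration and does not reflect Aster Data's in-database MapReduce interface. The `customers` and `downloads` tables, their field names, and all function names are hypothetical. The join works in three steps:

* Each mapper tags rows with their source table and keys them on the join column.
* The shuffle brings rows with matching keys together.
* The reducer pairs those rows up, which gives inner-join semantics.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, str]
Tagged = Tuple[str, Row]  # (source table name, row)


def map_customers(rows: Iterable[Row]) -> Iterator[Tuple[str, Tagged]]:
    # Key each (hypothetical) customer row on the join column and tag its source.
    for row in rows:
        yield row["customer_id"], ("customers", row)


def map_downloads(rows: Iterable[Row]) -> Iterator[Tuple[str, Tagged]]:
    # Same join key, different tag, so the reducer can tell the tables apart.
    for row in rows:
        yield row["customer_id"], ("downloads", row)


def reduce_join(values: List[Tagged]) -> Iterator[Row]:
    # All rows sharing one customer_id arrive together; pair every
    # customer row with every download row (inner-join semantics).
    customers = [row for tag, row in values if tag == "customers"]
    downloads = [row for tag, row in values if tag == "downloads"]
    for c in customers:
        for d in downloads:
            yield {**c, **d}


def reduce_side_join(customers: Iterable[Row], downloads: Iterable[Row]) -> List[Row]:
    # Shuffle: group the output of both mappers by join key.
    groups: Dict[str, List[Tagged]] = defaultdict(list)
    for key, value in list(map_customers(customers)) + list(map_downloads(downloads)):
        groups[key].append(value)
    joined: List[Row] = []
    for key in sorted(groups):
        joined.extend(reduce_join(groups[key]))
    return joined


if __name__ == "__main__":
    customers = [
        {"customer_id": "c1", "name": "Ann"},
        {"customer_id": "c2", "name": "Bo"},
    ]
    downloads = [
        {"customer_id": "c1", "title": "Book A"},
        {"customer_id": "c1", "title": "Book B"},
        {"customer_id": "c3", "title": "Book C"},
    ]
    for row in reduce_side_join(customers, downloads):
        print(row)
```

Rows with no match on the other side (customer `c2`, and the download for `c3`) drop out, as they would in a SQL inner join. Further joins could be chained as additional jobs over this output.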

Security software maker McAfee is using Hadoop in part because it can handle functions that don't work well in relational databases. Text analysis, for example, may involve sparse data in which not all columns appear consistently. McAfee also used Hadoop to gain enterprise data warehouse benefits at high scale when it consolidated its data warehouses. McAfee previously had a separate data warehouse for each type of threat it studied: spam, malware, and firewall attacks. Bringing that data together lets McAfee see correlations and explicit connections between different types of threats and perpetrators, says Sven Krasser, McAfee's senior director of data mining research.

Not Easy To Use

The downside of MapReduce and Hadoop (and many emerging NoSQL platforms) is that they're immature, especially compared with SQL, which is now pushing 40 years old. The tools and interfaces are very much version 1.0, at best. McAfee is using Datameer's tool for Hadoop search and is testing its tool for spreadsheet-style reporting and trend analysis; both are in beta.

Another drawback: Most data warehousing and analytics professionals aren't used to Hadoop's development environments, such as Java, Python, and Perl, and may lack the technical depth needed.

Digital marketing firm Adknowledge turned to Hadoop several years ago when its first-generation Netezza deployment reached its scalability limits. The company, which uses predictive analytics to optimize online marketing, built an on-premises Hadoop deployment and later tapped Hadoop instances in the cloud, on Amazon EC2.

To consolidate, Adknowledge completed a 100-TB Greenplum data warehouse deployment in February. It chose Greenplum in part because it integrates with Hadoop, but it's now curtailing its Hadoop use. Greenplum lets it give a broader group of people access to the data, whereas in Hadoop, people "may have to write code to process the data," says Matt Hoggatt, Adknowledge's director of software development.









