MapReduce, Hadoop: Young, But Worth A Look - InformationWeek

8/5/2010
02:35 PM

MapReduce, Hadoop: Young, But Worth A Look

Businesses should consider it for data management jobs that don't work well in relational databases.

When some IT pros encounter Big Data, they think of big-name IT vendors. Others think of Google. They reckon a company that does a fantastic job searching the Web must know something about managing lots of data.

It does. In 2004 Google published a paper on MapReduce, its programming model for processing big data sets. Together with Google's earlier paper on the Google File System, it has inspired a new approach to big data computing. Among the first champions were developers, mostly from Yahoo, who came up with Hadoop.

Now an Apache open source framework, Hadoop includes the Hadoop Distributed File System and a MapReduce engine. Think of MapReduce and Hadoop as alternatives for distributed big data processing that may deliver speed, cost, and flexibility advantages over relying only on massively parallel processing or column-store databases.
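For readers new to the programming model, a word count is the usual illustration. The sketch below is a single-process Python approximation of the map, shuffle, and reduce steps, not Hadoop's actual API. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are made up for this example. In a real cluster, the map and reduce calls would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    docs = ["big data big cluster", "data warehouse"]
    print(word_count(docs))
```

Because each map call sees only one document and each reduce call sees only one key, the work can be split across machines without coordination beyond the shuffle.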

Barnes & Noble chose vendor Aster Data in part because it supports in-database MapReduce, which the bookseller thinks will help its data warehouse scale out and perform better. MapReduce lets researchers see trends more quickly than massively parallel processing alone, says Marc Parrish, Barnes & Noble's VP of retention and loyalty marketing. With the old system, for example, a report on e-book downloads was delivered later and later in the day as e-book sales took off last year, because the system was choking on the data. "When you're putting database table joins on joins on joins, it's much more efficient to move that query into a MapReduce environment," Parrish says.
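To make the idea of moving a join into MapReduce concrete, here is a minimal sketch of a reduce-side join in plain Python. It is not Barnes & Noble's or Aster Data's implementation. The two tables, their columns, and the function names are all hypothetical. Each mapper tags its rows with the table they came from, and the reducer pairs up the rows that share a key.

```python
from collections import defaultdict


def map_customers(rows):
    """Map step for the (hypothetical) customer table: key by customer id."""
    for customer_id, name in rows:
        yield customer_id, ("customer", name)


def map_downloads(rows):
    """Map step for the (hypothetical) download table: key by customer id."""
    for customer_id, title in rows:
        yield customer_id, ("download", title)


def reduce_join(tagged_values):
    """Reduce step: pair every customer row with every download row for one key."""
    names = [value for tag, value in tagged_values if tag == "customer"]
    titles = [value for tag, value in tagged_values if tag == "download"]
    return [(name, title) for name in names for title in titles]


def join(customers, downloads):
    """Inner join of the two tables on customer id, done MapReduce-style."""
    groups = defaultdict(list)  # stands in for the shuffle between map and reduce
    emitted = list(map_customers(customers)) + list(map_downloads(downloads))
    for key, tagged in emitted:
        groups[key].append(tagged)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(groups[key]))
    return joined


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    customers = [(1, "Ann"), (2, "Bob")]
    downloads = [(1, "Book A"), (1, "Book B"), (3, "Book C")]
    print(join(customers, downloads))
```

Keys that appear in only one table produce no output, so this behaves like an inner join. Chaining several such jobs is one way a multi-table join can be spread across a cluster.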

Security software maker McAfee is using Hadoop in part because it can handle functions that just don't work well in relational databases. Text analysis, for example, may involve sparse data in which not all columns appear consistently. McAfee also used Hadoop to get the scale of a large enterprise data warehouse when it consolidated its data warehouses. The company previously had a separate data warehouse for each type of threat it studied: spam, malware, and firewall attacks. Bringing that data together lets McAfee see correlations and explicit connections between different types of threats and perpetrators, says Sven Krasser, McAfee's senior director of data mining research.

Not Easy To Use

The downside of MapReduce and Hadoop (and many emerging NoSQL platforms) is that they're immature, especially compared with SQL, which is now pushing 40 years old. The tools and interfaces are very version 1.0--at best. McAfee is using Datameer's tool for Hadoop search and is testing its tool for spreadsheet-style reporting and trend analysis, and both are in beta.

Another drawback: Most data warehousing and analytics professionals aren't used to the development environments these platforms rely on, such as Java, Python, and Perl, and may lack the technical depth needed.

Digital marketing firm Adknowledge turned to Hadoop several years ago when its first-generation Netezza deployment reached its scalability limits. The company, which uses predictive analytics to optimize online marketing, built an on-premises Hadoop deployment and later tapped Hadoop instances in the cloud, on Amazon EC2.

To consolidate, Adknowledge completed a 100-TB Greenplum data warehouse deployment in February. It chose Greenplum in part because it integrates with Hadoop, but now it's curtailing Hadoop use because Greenplum lets it give data access to a broader group of people. In Hadoop, people "may have to write code to process the data," says Matt Hoggatt, Adknowledge's director of software development.
