8/5/2010
02:35 PM

MapReduce, Hadoop: Young, But Worth A Look

Businesses should consider it for data management jobs that don't work well in relational databases.

When some IT pros encounter Big Data, they think of big-name IT vendors. Others think of Google. They reckon a company that does a fantastic job searching the Web must know something about managing lots of data.

It does. Google issued a white paper in 2004 on MapReduce, its programming model for processing big data sets. That paper, along with Google's work on the Google File System, has inspired a new approach to big data computing. Among the first champions were developers, mostly from Yahoo, who came up with Hadoop.

Now an Apache open source framework, Hadoop includes the Hadoop Distributed File System and a MapReduce engine. Think of MapReduce and Hadoop as alternatives for distributed big data processing that may deliver speed, cost, and flexibility advantages over just using massively parallel processing or column-store database options.
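For readers new to the programming model, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python, using the classic word-count example. It runs in a single process and doesn't use Hadoop's API; the function names (map_phase, shuffle, reduce_phase, run_word_count) are made up for illustration. In a real deployment the framework distributes the map and reduce calls across many machines and handles the grouping step itself.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def run_word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input, for illustration only.
    print(run_word_count(["big data big jobs", "data jobs"]))
```

Because each map call sees only one record and each reduce call sees only one key, the work can be split across as many nodes as the data requires.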

Barnes & Noble chose vendor Aster Data in part because it supports in-database MapReduce, which the bookseller thinks will help its data warehouse scale out and perform better. MapReduce lets researchers see trends more quickly than by only using massively parallel processing, says Marc Parrish, Barnes & Noble's VP of retention and loyalty marketing. With the old system, for example, a report on e-book downloads was getting delivered later and later in the day as e-book sales took off last year and the system was choking on the data. "When you're putting database table joins on joins on joins, it's much more efficient to move that query into a MapReduce environment," Parrish says.
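To show what moving a join into MapReduce can look like, here is a rough sketch of a generic reduce-side join in plain Python. It isn't Aster Data's in-database MapReduce interface or Barnes & Noble's actual query; the customer and download records and all function names are hypothetical. Each mapper tags its rows with their source, the grouping step brings rows with the same key together, and the reducer pairs them up.

```python
from collections import defaultdict


def map_customers(rows):
    """Tag each (customer_id, name) row with its source table."""
    for customer_id, name in rows:
        yield customer_id, ("customer", name)


def map_downloads(rows):
    """Tag each (customer_id, title) row with its source table."""
    for customer_id, title in rows:
        yield customer_id, ("download", title)


def reduce_join(tagged_values):
    """Pair every customer name with every download sharing the same key."""
    names = [value for tag, value in tagged_values if tag == "customer"]
    titles = [value for tag, value in tagged_values if tag == "download"]
    return [(name, title) for name in names for title in titles]


def run_join(customers, downloads):
    """Inner join of the two inputs on customer_id, in one process."""
    grouped = defaultdict(list)
    tagged = list(map_customers(customers)) + list(map_downloads(downloads))
    for key, value in tagged:
        grouped[key].append(value)  # stands in for the framework's shuffle
    joined = []
    for key in sorted(grouped):
        joined.extend(reduce_join(grouped[key]))
    return joined


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    customers = [(1, "Ann"), (2, "Bob")]
    downloads = [(1, "Dune"), (1, "Emma"), (3, "Ulysses")]
    print(run_join(customers, downloads))
```

Each key is joined independently of the others, so the reducers can run in parallel instead of funneling every join through one query plan.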

Security software maker McAfee is using Hadoop in part because it can handle functions that just don't work well in relational databases. Text analysis, for example, may involve sparse data in which not all columns appear consistently. McAfee also counted on Hadoop's ability to work at enterprise data warehouse scale when it consolidated data warehouses. McAfee previously had a separate data warehouse for each type of threat it studied--spam, malware, firewall attacks. Bringing that data together lets McAfee see correlations and explicit connections between different types of threats and perpetrators, says Sven Krasser, McAfee's senior director of data mining research.

Not Easy To Use

The downside of MapReduce and Hadoop (and many emerging NoSQL platforms) is that they're immature, especially compared with SQL, which is now pushing 40 years old. The tools and interfaces are very version 1.0--at best. McAfee is using Datameer's tool for Hadoop search and is testing its tool for spreadsheet-style reporting and trend analysis, and both are in beta.

Another drawback: Most data warehousing and analytics professionals aren't used to the development environments that MapReduce and Hadoop rely on--like Java, Python, and Perl--and may lack the technical depth needed.

Digital marketing firm Adknowledge turned to Hadoop several years ago when its first-generation Netezza deployment reached its scalability limits. The company, which uses predictive analytics to optimize online marketing, built an on-premises Hadoop deployment and later tapped Hadoop instances in the cloud, on Amazon EC2.

To consolidate, Adknowledge completed a 100-TB Greenplum data warehouse deployment in February. It chose Greenplum in part because it integrates with Hadoop, but now it's curtailing Hadoop use because it can give a broader group of people access to data in Greenplum. In Hadoop, people "may have to write code to process the data," says Matt Hoggatt, Adknowledge's director of software development.
