
Hadoop Spurs Big Data Revolution

Open source data processing platform has won over Web giants for its low cost, scalability, and flexibility. Now Hadoop will make its way into more enterprises.

R&D Roots At AOL

AOL has been using Hadoop for more than three years, starting in its R&D unit, to make sense of the navigation patterns of the more than 180 million unique site visitors per month across MapQuest, the Huffington Post, and dozens of other sites it owns.

AOL starts by gathering as much information as possible about visitors' activities. That's where Hadoop's low cost and scalability come in. "When you do the math, the cost per node of commodity systems versus commercial systems makes the choice very obvious," says Bao Nguyen, AOL's technical director of R&D for large-scale analytics. "The cost per node is orders of magnitude higher for the commercial systems."

AOL's R&D unit has a 300-node Hadoop deployment of mixed vintage and capacity in Mountain View, Calif. That system can store more than 500 TB of clickstream data, which captures billions of events per day. An event can be someone clicking on an email promotion or banner ad, doing a search, reading an article, visiting a site, or clicking on a particular product on an e-commerce page. Events can also include time stamps added to the history and profile of a particular visitor (identified by a cookie ID number rather than by personally identifiable information).

This clickstream data is highly structured, but it's so massive and varied that it would be next to impossible to handle all the extract, transform, and load work that would be required to move it into a conventional relational database. AOL uses Hadoop's MapReduce processes to filter and correlate data, distributing text extraction, correlation, and calculation steps across hundreds of compute nodes.
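
To make the filter-and-correlate pattern concrete, here is a minimal single-process sketch of one MapReduce pass over clickstream lines. The tab-separated log format and the function names (`map_events`, `shuffle`, `reduce_visitor`, `run_job`) are illustrative assumptions, not AOL's code. On a real cluster, Hadoop runs many mappers and reducers in parallel across nodes and performs the shuffle itself; this sketch only mimics the data flow.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw log format (an assumption, not AOL's schema):
#   "<cookie_id>\t<event_type>\t<target>"   e.g. "c42\tclick\tbanner:autos"

def map_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Map: parse each raw line, drop malformed ones, emit (cookie_id, event)."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # filter step: skip records that don't parse
        cookie_id, event_type, target = parts
        yield cookie_id, f"{event_type}:{target}"

def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group values by key; Hadoop does this between the map and reduce phases."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_visitor(cookie_id: str, events: List[str]) -> Tuple[str, Dict[str, int]]:
    """Reduce: correlate one visitor's events into a count per distinct event."""
    counts: Dict[str, int] = defaultdict(int)
    for event in events:
        counts[event] += 1
    return cookie_id, dict(counts)

def run_job(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Run map -> shuffle -> reduce in a single process (no cluster involved)."""
    groups = shuffle(map_events(lines))
    return dict(reduce_visitor(k, v) for k, v in groups.items())
```

For example, feeding it one well-formed line for visitor `c1` plus the malformed line `"bad line"` yields `{"c1": {"click:banner:autos": 1}}`; the malformed record is filtered out in the map step.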

With MapReduce job after MapReduce job, AOL refines massive amounts of raw data into thousands of categories, such as automobiles, news, finance, and sports. Next, it identifies features and attributes of the visitors to each category, determining whether they're car buyers, mortgage prospects, male heads of household, or teenagers, for example.
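
Chaining jobs means the output of one pass becomes the input of the next. The sketch below assumes a tiny hypothetical keyword taxonomy and hypothetical threshold rules (`CATEGORY_KEYWORDS`, `FEATURE_RULES`); AOL's actual categories number in the thousands, and the article doesn't describe its feature logic. Each job's map and reduce steps are collapsed into plain loops for readability.

```python
from collections import Counter, defaultdict

# Hypothetical keyword-to-category taxonomy (illustration only).
CATEGORY_KEYWORDS = {
    "autos": ("sedan", "suv", "dealer"),
    "finance": ("mortgage", "stocks", "loan"),
    "sports": ("nba", "score", "playoffs"),
}

# Hypothetical feature rules: feature -> (category, minimum event count).
FEATURE_RULES = {"car_buyer": ("autos", 3), "mortgage_prospect": ("finance", 2)}

def categorize(text):
    """Return every category whose keywords appear in the event text."""
    text = text.lower()
    return [cat for cat, words in CATEGORY_KEYWORDS.items() if any(w in text for w in words)]

def job1_category_counts(events):
    """Job 1: (cookie_id, text) events -> per-visitor category counts."""
    per_visitor = defaultdict(Counter)
    for cookie_id, text in events:            # map: event -> categories
        for cat in categorize(text):
            per_visitor[cookie_id][cat] += 1  # reduce: sum per (visitor, category)
    return per_visitor

def job2_features(category_counts):
    """Job 2: consume job 1's output and tag each visitor with features."""
    features = {}
    for cookie_id, counts in category_counts.items():
        # Counter returns 0 for categories the visitor never touched.
        features[cookie_id] = sorted(
            name for name, (cat, min_events) in FEATURE_RULES.items()
            if counts[cat] >= min_events
        )
    return features
```

A visitor with three auto-related events would come out of job 2 tagged `["car_buyer"]`, while one with a single sports event gets no features under these made-up thresholds.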

AOL then feeds the final refined feature sets into other proprietary analytic applications (many built on conventional relational platforms) that address the business priority of delivering the right ad banners and email campaigns to the right people at the right time.

When online behavior shows that a visitor is interested in cars, Hadoop helps AOL figure that out and deliver a relevant ad. Hadoop is a batch-oriented platform, so it might take a day or two for such indicators to emerge. But profiles have a way of building over time and providing rich, multi-attribute targeting possibilities.
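
Because each batch run produces only that period's counts, a profile that builds over time can be modeled as a running merge of successive batch outputs. This is an assumed design for illustration, not AOL's documented method; `merge_daily_batch` and its optional `decay` weight, which discounts older history, are hypothetical.

```python
from collections import Counter

def merge_daily_batch(profiles, daily_counts, decay=1.0):
    """Fold one batch's output into cumulative visitor profiles.

    profiles, daily_counts: dict of cookie_id -> Counter of category counts.
    decay: hypothetical weight applied to existing history (1.0 = no decay).
    """
    merged = {}
    for cookie_id in profiles.keys() | daily_counts.keys():
        old = profiles.get(cookie_id, Counter())
        new = daily_counts.get(cookie_id, Counter())
        # Scale older history, then add the new batch's counts on top.
        combined = Counter({cat: n * decay for cat, n in old.items()})
        combined.update(new)
        merged[cookie_id] = combined
    return merged
```

Running it once per batch lets a visitor's profile accumulate categories across days, which is what makes multi-attribute targeting possible even though each individual run is a slow batch job.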

The success of the R&D Hadoop deployment led AOL to deploy an even larger, 700-node production system in April at its Dulles, Va., headquarters. The R&D unit now does more exploratory and ad hoc analyses, while the petabyte-scale production deployment does proven analyses, such as routine customer segmentation and online behavioral analysis. For example, an ad-targeting model running on the production deployment correlates data on the online and offline buying behavior of customers of large retailers that have both physical and online stores. AOL uses this anonymized data to build customer profiles and predictive models that let it aim online advertising at its 180 million unique online visitors per month.
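
Correlating online and offline purchases is, in MapReduce terms, a join. One common approach is a reduce-side join: records from both sources are keyed by a shared customer identifier, grouped by that key, and paired within each group. The sketch assumes a hypothetical pseudonymous key (a salted SHA-256 hash produced by `anon_key`) stands in for customer identity; the article says only that the data is anonymized, not how.

```python
import hashlib
from collections import defaultdict

def anon_key(identifier: str, salt: str) -> str:
    """Hypothetical pseudonymous key: salted SHA-256 of a customer identifier."""
    return hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()

def reduce_side_join(online, offline):
    """Group (key, record) pairs from both sources by key; keep keys seen in both."""
    # Tag each record with its source so the reducer can tell them apart.
    groups = defaultdict(lambda: {"online": [], "offline": []})
    for key, record in online:
        groups[key]["online"].append(record)
    for key, record in offline:
        groups[key]["offline"].append(record)
    joined = {}
    for key, g in groups.items():
        if g["online"] and g["offline"]:  # inner join: customer active in both channels
            joined[key] = {"online": g["online"], "offline": g["offline"]}
    return joined
```

The inner-join choice keeps only customers who appear in both channels, which is the population a cross-channel model would need; a left join would instead keep every online visitor.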

D. Henschen
User Rank: Author
6/15/2012 | 10:10:09 PM
re: Hadoop Spurs Big Data Revolution
That's per core, but these stats have all been surpassed with the latest hardware.
User Rank: Apprentice
12/6/2011 | 4:08:09 AM
re: Hadoop Spurs Big Data Revolution
Reading through the whole document, I see only one mention of Yahoo and no mention of Yahoo as the originator of Hadoop. It sometimes appears that the press is intent on highlighting all of Yahoo's weaknesses and none of its strengths. Perhaps you think this information is already well known, but the pie chart showing that 74% have "no current or planned use" would suggest otherwise. For those who wish to read more meaty detail, see
User Rank: Apprentice
12/4/2011 | 8:59:29 PM
re: Hadoop Spurs Big Data Revolution
Matspca - we're working on establishing a benchmark for Hadoop. If you'd like to participate, please let me know at
User Rank: Apprentice
11/30/2011 | 11:01:45 PM
re: Hadoop Spurs Big Data Revolution
Not everyone believes in the Hype of Hadoop. See The big organizations mentioned here can afford to use non-optimal solutions. I have seen no benchmark showing Hadoop beating, say, Oracle. My own NoSQL database beats Hadoop by a large margin using a $330 PC versus the $1 million (or so) used by Hadoop for the same benchmark. See

I will continue following the Hype of Hadoop and if there really is some substance behind it then I look forward to a .NET version of the distribution mechanism.
User Rank: Apprentice
11/10/2011 | 8:48:32 PM
re: Hadoop Spurs Big Data Revolution
128 MB of RAM for 16 cores? That has to be a typo.