
Hadoop's Flexibility Wins Over Online Data Provider

Rapleaf replaced a relational database workflow with Hadoop and can now make quick database changes.

Scalability, flexibility, and low cost are the praises we hear repeatedly from Hadoop adopters, and Rapleaf is no exception. The company, which supplies businesses with data about their online customers, chose Hadoop nearly four years ago to replace a processing workflow built on a MySQL relational database, and it's finding advantages in being able to quickly add new data types to meet changing business needs.

Rapleaf provides data that companies can add to their own customer data to improve online personalization and targeting. Like many data providers, Rapleaf trades in demographic information, such as age, income, gender, education, and marital status, as well as psychographic information, such as hobbies, activities, and interests. It partners with email service providers, such as Constant Contact and ExactTarget, to help companies running digital marketing campaigns.

Rapleaf gets its data from many sources, but the total amount it processes is modest: tens of terabytes, compared with the hundreds of terabytes or petabytes that some Hadoop users process. In addition, where many Hadoop users churn through ever-changing data, such as clickstreams that constantly reveal people's latest online activities, Rapleaf's deployment reprocesses a fairly stable core of information, as the universe of households and Internet users doesn't change dramatically.

What does change is the set of attributes Rapleaf must provide from its stockpile of data, as marketers seek new ways to target consumers. That's where Hadoop's ability to tap new data sources and mix data types comes in. If Rapleaf used a processing system based on a traditional relational database, it would have to use a predefined schema or data model. A database about people, for instance, would require a table with specific columns for attributes such as age, gender, and income level. If the company wanted to add new data containing attributes that weren't originally included, such as a Twitter handle or Facebook name, IT would face the time-consuming task of adding new columns to the table. The larger the table, the bigger the problem.

"Just adding one column to a large table within a relational database can easily take hours, days, or worse, and that's totally unacceptable," says Jeremy Lizt, Rapleaf's VP of engineering.

Using Hadoop, Rapleaf doesn't need to create a new column; it simply tweaks what it calls its "people profile," and new attributes can be extracted in the next round of data processing, adding new sources as necessary to derive the additional information required. Thanks to the platform's scalability and use of highly distributed MapReduce processing, that processing can happen within minutes.
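To make the idea concrete, here is a minimal sketch in plain Python of how a profile definition can change between processing runs without altering any stored table. It is an illustration under stated assumptions, not Rapleaf's code: the record fields, the `build_profiles` helper, and the sample values are all hypothetical, and an in-memory dictionary stands in for Hadoop's shuffle phase.

```python
from collections import defaultdict

# Hypothetical raw records from different sources; field names are illustrative.
RAW_RECORDS = [
    {"person_id": "p1", "source": "survey", "age": 34, "gender": "F"},
    {"person_id": "p1", "source": "social", "twitter_handle": "@example_p1"},
    {"person_id": "p2", "source": "survey", "age": 51, "income_band": "50-75k"},
]


def map_record(record, wanted):
    """Map step: emit (person_id, (attribute, value)) for each wanted attribute present."""
    for name in wanted:
        if name in record:
            yield record["person_id"], (name, record[name])


def reduce_profile(pairs):
    """Reduce step: merge the attribute pairs for one person into a profile dict."""
    profile = {}
    for name, value in pairs:
        profile[name] = value
    return profile


def build_profiles(records, wanted):
    """Run the map and reduce steps in memory over all records."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for record in records:
        for person_id, pair in map_record(record, wanted):
            grouped[person_id].append(pair)
    return {pid: reduce_profile(pairs) for pid, pairs in grouped.items()}


# First run: the profile is defined with two attributes.
profiles_v1 = build_profiles(RAW_RECORDS, ["age", "gender"])

# Later run: the profile definition gains an attribute. The raw records are
# simply reprocessed; nothing resembling an ALTER TABLE is needed.
profiles_v2 = build_profiles(RAW_RECORDS, ["age", "gender", "twitter_handle"])
```

In this sketch the only thing that changes between runs is the list of wanted attributes. People whose records lack the new attribute simply get a profile without it, which a fixed-column table would have to represent with an added, mostly empty column.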

In the early days of its deployment, the young Hadoop platform that Rapleaf was using had few tools or industry best practices to draw on. Hadoop has since matured, and Lizt says enterprise support provider Cloudera helped Rapleaf deal with bug fixes.

"Hadoop is a lot more stable today than it was when we started, and it's obviously going to continue to evolve because it just makes so much sense for anybody who needs to do large-scale data processing," Lizt says.

Read the mainbar:
Hadoop Spurs Big Data Revolution
