Hadoop's Flexibility Wins Over Online Data Provider
Rapleaf replaced its relational database workflow with Hadoop and can now make quick database changes.
Scalability, flexibility, and low cost are praises we hear repeatedly from Hadoop adopters, and Rapleaf is no exception. The company, which supplies businesses with data about their online customers, chose Hadoop to replace a MySQL-based relational database processing workflow nearly four years ago, and it's finding advantages in being able to quickly add new data types to meet changing business needs.
Rapleaf provides data that companies can append to their own customer data to improve online personalization and targeting. Like many data providers, it trades in demographic information, such as age, income, gender, education, and marital status, as well as psychographic information, such as hobbies, activities, and interests. It partners with email service providers such as Constant Contact and ExactTarget to help companies running digital marketing campaigns.
Rapleaf gets its data from many sources, but the total amount it processes is modest: tens of terabytes, compared with the hundreds of terabytes and petabytes that some Hadoop users process. In addition, where many Hadoop users churn through ever-changing data, such as clickstreams that constantly reveal people's latest online activities, Rapleaf's deployment reprocesses a fairly stable core of information, as the universe of households and Internet users doesn't change dramatically.
What does change are the attributes Rapleaf must provide from its stockpile of data, as marketers seek new ways to target consumers. That's where Hadoop's ability to tap new data sources and mix data types comes in. If Rapleaf used a processing system based on a traditional relational database, it would have to use a predefined schema or data model. A database about people, for instance, would require a table with specific columns for attributes such as age, gender, and income level. If it wanted to add new data containing attributes that weren't originally included, such as a Twitter handle or Facebook name, IT would face the time-consuming task of adding new columns to the table. The larger the table, the bigger the problem.
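To make the contrast concrete, here's a minimal Java sketch; it is not Rapleaf's actual data model, and the attribute names are hypothetical. A fixed-schema record has to change shape to accommodate a new attribute, while a schemaless key-value profile simply gains another entry that the next processing pass can pick up:

    import java.util.HashMap;
    import java.util.Map;

    public class ProfileSchemaContrast {

        // Fixed-schema view of a person: adding a new attribute such as a
        // Twitter handle means changing this structure (or, in a relational
        // database, altering a large table).
        static class FixedProfile {
            int age;
            String gender;
            String incomeBracket;
        }

        public static void main(String[] args) {
            // Schemaless view of the same person: a new attribute is just
            // another key-value pair, picked up on the next processing run.
            Map<String, String> flexibleProfile = new HashMap<>();
            flexibleProfile.put("age", "34");
            flexibleProfile.put("gender", "F");
            flexibleProfile.put("income_bracket", "50-75k");
            flexibleProfile.put("twitter_handle", "@example"); // no schema change needed

            System.out.println(flexibleProfile);
        }
    }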
"Just adding one column to a large table within a relational database can easily take hours, days, or worse, and that's totally unacceptable," says Jeremy Lizt, Rapleaf's VP of engineering.
Using Hadoop, Rapleaf doesn't need to create a new column; it simply tweaks what it calls its "people profile," and new attributes can be extracted in the next round of data processing, with new sources added as necessary to derive the additional information required. Thanks to the platform's scalability and its highly distributed MapReduce processing, that work can happen within minutes.
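Rapleaf hasn't published its code, but a short Hadoop MapReduce mapper in Java illustrates the general pattern under assumed inputs: each record is a pipe-delimited profile line, and the mapper pulls out a newly requested attribute (a hypothetical twitter_handle field) without any schema migration:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper: extracts a newly added "twitter_handle" attribute
    // from pipe-delimited profile records, keyed by a record identifier.
    public class NewAttributeMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Assumed input format, one profile per line:
            // id=abc123|age=34|gender=F|twitter_handle=@example
            String id = null;
            String twitterHandle = null;

            for (String field : line.toString().split("\\|")) {
                String[] kv = field.split("=", 2);
                if (kv.length != 2) {
                    continue;
                }
                if (kv[0].equals("id")) {
                    id = kv[1];
                } else if (kv[0].equals("twitter_handle")) {
                    twitterHandle = kv[1];
                }
            }

            // Older records that lack the new attribute are simply skipped;
            // no table alteration or backfill step is required.
            if (id != null && twitterHandle != null) {
                context.write(new Text(id), new Text(twitterHandle));
            }
        }
    }

The driver and reducer are omitted; the point is that the new attribute is handled entirely in the processing code, so older records that lack it are passed over rather than forcing a schema change.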
In the early days of its deployment, Hadoop was still a young platform, and Rapleaf had few tools or industry best practices to draw on. Hadoop has since matured, and Lizt says enterprise support provider Cloudera helped Rapleaf deal with bug fixes.
"Hadoop is a lot more stable today than it was when we started, and it's obviously going to continue to evolve because it just makes so much sense for anybody who needs to do large-scale data processing," Lizt says.
Read the mainbar: Hadoop Spurs Big Data Revolution