Hadoop's Flexibility Wins Over Online Data Provider
Rapleaf replaced a relational database workflow with Hadoop and can now make quick database changes.
Scalability, flexibility, and low cost are the praises we hear repeatedly from Hadoop adopters, and Rapleaf is no exception. The company, which supplies businesses with data about their online customers, chose Hadoop nearly four years ago to replace a MySQL relational database processing workflow, and it's finding advantages in being able to quickly add new data types to meet changing business needs.
Rapleaf provides data that companies can add to their own customer data in order to do better online personalization and targeting. Like many data providers, Rapleaf trades in demographic information, such as age, income, gender, education, and marital status, as well as psychographic information, such as hobbies, activities, and interests. It partners with email service providers, such as Constant Contact and ExactTarget, to help companies running digital marketing campaigns.
Rapleaf gets its data from many sources, but the total amount it processes is modest: tens of terabytes, compared with the hundreds of terabytes and petabytes that some Hadoop users process. In addition, where many Hadoop users churn through ever-changing data, such as clickstreams that constantly reveal people's latest online activities, Rapleaf's deployment reprocesses a fairly stable core of information, as the universe of households and Internet users doesn't change dramatically.
What does change is the set of attributes Rapleaf must provide from its stockpile of data, as marketers seek new ways to target consumers. That's where Hadoop's ability to tap new data sources and mix data types comes in. If Rapleaf used a processing system based on a traditional relational database, it would have to use a predefined schema or data model. A database about people, for instance, would require a table with specific columns for attributes such as age, gender, and income level. If Rapleaf wanted to add new data containing attributes that weren't originally included, such as Twitter handle or Facebook name, IT would face the time-consuming task of adding new columns to the table. The larger the table, the bigger the problem.
"Just adding one column to a large table within a relational database can easily take hours, days, or worse, and that's totally unacceptable," says Jeremy Lizt, Rapleaf's VP of engineering.
Using Hadoop, Rapleaf doesn't need to create a new column; it simply tweaks what it calls its "people profile," and new attributes can be extracted in the next round of data processing, adding new sources as necessary to derive the additional information required. Thanks to the platform's scalability and use of highly distributed MapReduce processing, that processing can happen within minutes.
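To make the contrast with a fixed table concrete, here is a minimal sketch in plain Python of a map-and-reduce pass over schemaless records. It is not Rapleaf's code and does not use the Hadoop API; the record fields, the sample values, and the function names (`map_record`, `reduce_profiles`, `build_profiles`) are all hypothetical. The point it illustrates is that requesting a new attribute means changing the list of wanted fields and reprocessing, rather than altering a table.

```python
from collections import defaultdict

# Hypothetical raw records from different sources; each may carry different fields.
RAW_RECORDS = [
    {"person_id": "p1", "source": "survey", "age": 34, "gender": "F"},
    {"person_id": "p1", "source": "social", "twitter_handle": "@example_p1"},
    {"person_id": "p2", "source": "survey", "age": 51, "income_level": "high"},
]


def map_record(record, wanted):
    """Map step: emit (person_id, attributes) for the wanted fields this record has."""
    attrs = {key: record[key] for key in wanted if key in record}
    if attrs:
        yield record["person_id"], attrs


def reduce_profiles(pairs):
    """Reduce step: merge every attribute dict that shares a person_id."""
    profiles = defaultdict(dict)
    for person_id, attrs in pairs:
        profiles[person_id].update(attrs)
    return dict(profiles)


def build_profiles(records, wanted):
    """Run the map step over all records, then reduce the emitted pairs."""
    pairs = (pair for rec in records for pair in map_record(rec, wanted))
    return reduce_profiles(pairs)


# First processing round: the original attribute list.
v1 = build_profiles(RAW_RECORDS, ["age", "gender", "income_level"])

# Next round: a new attribute is requested. Only the list changes;
# people without that attribute simply lack the key.
v2 = build_profiles(
    RAW_RECORDS, ["age", "gender", "income_level", "twitter_handle"]
)
```

In a real Hadoop job the map and reduce steps would run in parallel across a cluster, which is where the speed described above comes from; this single-process version only shows the shape of the data flow.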
In the early days of its deployment, the young Hadoop platform that Rapleaf was using didn't have a lot of tools and industry best practices to draw from. Hadoop has since matured, and Lizt says enterprise support provider Cloudera helped Rapleaf deal with bug fixes.
"Hadoop is a lot more stable today than it was when we started, and it's obviously going to continue to evolve because it just makes so much sense for anybody who needs to do large-scale data processing," Lizt says.