Facebook Exec: Databases, Hadoop Belong Together - InformationWeek


11:37 AM
Doug Henschen

Facebook Exec: Databases, Hadoop Belong Together

Facebook's Ken Rudin says relational databases and Hadoop are complementary. Hortonworks and Teradata execs add guidelines on when to choose which.

It's not either-or, it's both. This was a big theme at this week's Strata/Hadoop World conference in New York as executives from Facebook, Hortonworks and Teradata all discussed how relational database management systems (RDBMS) and Hadoop fit together.

Facebook's Ken Rudin, head of analytics, knows the needs of ordinary enterprises all too well, having worked at both Salesforce.com and Siebel before founding SaaS-based BI vendor LucidEra. (After LucidEra was sold, Rudin ventured into big data analytics, first at Zynga and now at Facebook.) So it wasn't shocking to hear Rudin extol the virtues of relational databases, even though Facebook is a big Hadoop user.

"We keep the low-level, most granular, detailed data in Hadoop at Facebook, but move the transformed and aggregated data into relational databases because slicing and dicing is faster and easier," said Rudin said during a keynote presentation.

In a follow-up interview, Rudin told me that Hadoop is essentially the discovery platform where data scientist types have the flexibility to explore data without the constraints of a predefined data model. There they can make novel discoveries about trends, behavior patterns and relationships.


"We'll find out what's important through ad hoc analysis on Hadoop, and once that's known, we'll aggregate the data across the dimensions that we need and put that into a relational environment," Rudin explained. "With a relational system I can get answers in seconds instead of tens of minutes."

There are some analyses that have to stay in the big data realm of Hadoop, including graph analyses and optimizations involving complicated computations that just don't take to SQL. Facebook has its own graph analysis technology for finding relationships among people at scale, and this is the stuff that really made Facebook tick as a social network.

In an example of optimization, Facebook has to figure out which items, out of thousands of possibilities, to put in user news feeds. It's not unlike the advertising optimization work that Internet giants routinely handle in Hadoop.

"It's not a metrics and dimensions problem, it's a long, linear equation, and we need to process it for one million people at a time," Rudin explained.

So when do you choose Hadoop and when do you choose relational? That's a topic Stephen Brobst, CTO at Teradata, and Ari Zilka, CTO at Hortonworks, took up in a discussion of the best uses of relational databases and Hadoop. To my great surprise, Hortonworks exec Zilka made the case for relational databases while Brobst made the case for Hadoop.

Zilka was a cofounder of the Terracotta in-memory database, so he's no stranger to the relational world, but hearing a relational database vendor exec like Brobst make a strong case for Hadoop was refreshing.

"The kinds of problems you're trying to solve [with Hadoop] are not about generating a report," Brobst explained. "Hadoop is for much more sophisticated uses like analyzing text on Web pages or analyzing relationships, and you use techniques like machine learning, scoring and building search indexes that solve very different problems."

Relational databases have effectively joined the big data world, Zilka argued, by way of massively parallel processing. MPP is the architecture behind relational products including Teradata, Pivotal Greenplum, IBM PureData System for Analytics (formerly Netezza), Actian's ParAccel, HP Vertica, Microsoft SQL Server PDW and others.

"There's nothing about relational that is too old or too stodgy or too small to handle the data volume of even the largest transactional data sets," Zilka argued.

Still need help understanding which platform to choose? Zilka and Brobst ended with a nice list of attributes to consider:

-- Stable schema = RDBMS; evolving schema = Hadoop
-- Structured data = RDBMS; variably structured data = Hadoop
-- ANSI SQL = RDBMS; flexible programming = Hadoop
-- Cleaned data = RDBMS; raw data = Hadoop
-- Updates/deletes = RDBMS; ingest = Hadoop
-- Core data = RDBMS; all data = Hadoop
-- Complex joins = RDBMS; complex processing = Hadoop
-- Efficient use of CPU/IO = RDBMS; low-cost storage = Hadoop.
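
For readers who want to apply the checklist above to a specific workload, one rough way is to record which side of each pair applies and count the results. The sketch below does that; the attribute keys and the unweighted tally are assumptions made for illustration, not something Zilka or Brobst proposed.

```python
# Each entry pairs the trait that favors an RDBMS with the trait that favors
# Hadoop, taken from the list above. The dictionary keys are hypothetical labels.
CHECKLIST = {
    "schema": ("stable", "evolving"),
    "data_shape": ("structured", "variably structured"),
    "access": ("ANSI SQL", "flexible programming"),
    "data_quality": ("cleaned", "raw"),
    "write_pattern": ("updates/deletes", "ingest"),
    "scope": ("core data", "all data"),
    "workload": ("complex joins", "complex processing"),
    "priority": ("efficient CPU/IO", "low-cost storage"),
}


def tally(answers):
    """Count how many of the given answers point to each platform."""
    votes = {"RDBMS": 0, "Hadoop": 0}
    for attribute, answer in answers.items():
        rdbms_trait, hadoop_trait = CHECKLIST[attribute]
        if answer == rdbms_trait:
            votes["RDBMS"] += 1
        elif answer == hadoop_trait:
            votes["Hadoop"] += 1
        else:
            raise ValueError(f"unknown answer for {attribute!r}: {answer!r}")
    return votes


# Hypothetical workload: raw clickstream data with a changing schema that
# still needs complex joins.
print(tally({"schema": "evolving", "data_quality": "raw", "workload": "complex joins"}))
```

A split result is not a failure of the checklist. It matches the article's theme that many organizations end up running both platforms side by side.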

Rudin, Zilka and Brobst all supported the notion that Hadoop is more closely aligned with what you might call bigger, more exploratory questions. You don't even attempt to bring uniform structure to the data, as you would with a relational database, because you don't even know what questions you want to ask yet.

"In the big data world, all data has value, you just haven't found it yet," Brobst explained. "If you use a different economic model, leverage the open source characteristics of Hadoop, and leverage commodity storage and servers, you can store multiple orders of magnitude more data. That data lake allows us to explore the data and discover where that value is."

So there you have it: Hadoop and RDBMS are destined to live together. There may not be peace and harmony in the early stages, as teams compete for budgets and workloads. But if the interests of the business are to be best served, live together they will.

D. Henschen
11/1/2013 | 2:11:04 PM
re: Facebook Exec: Databases, Hadoop Belong Together
A lot of relational database fans are Tweeting with satisfaction that this article offers evidence that Hadoop will not replace RDBMS. I don't think knowledgeable Hadoop fans ever made that claim. Cloudera's "center is shifting" argument, for example, never asserted that RDBMSs would go away. That company's latest "Enterprise Data Hub" spin (which echoes the Sears/Metascale vision) sees Hadoop handling all the raw data at high scale. RDBMS becomes a "specialized" warehouse/mart environment for fast analysis of refined, structured data. In other words, think marts and focused operational data warehouses.

The one RDBMS concept that goes away if the Enterprise Data Hub vision takes hold is the all-encompassing enterprise data warehouse (EDW). EDWs mostly fell short of that "enterprisewide" vision, despite costly and time-consuming effort. Keeping up with variable, ever-changing data is something that the RDBMS just doesn't do well. And trying to do it at high scale with an RDBMS is an expensive proposition.
David F. Carr
10/31/2013 | 4:14:45 PM
re: Facebook Exec: Databases, Hadoop Belong Together
Love the summary of the trade-offs. I don't know that I've seen that expressed this clearly anywhere else.