Facebook Exec: Databases, Hadoop Belong Together - InformationWeek
11:37 AM
Doug Henschen

Facebook Exec: Databases, Hadoop Belong Together

Facebook's Ken Rudin says relational databases and Hadoop are complementary. Hortonworks and Teradata execs add guidelines on when to choose which.

It's not either-or, it's both. This was a big theme at this week's Strata/Hadoop World conference in New York as executives from Facebook, Hortonworks and Teradata all discussed how relational database management systems (RDBMS) and Hadoop fit together.

Facebook's Ken Rudin, head of analytics, knows the needs of ordinary enterprises all too well, having worked at both Salesforce.com and Siebel before founding SaaS-based BI vendor LucidEra. (After LucidEra was sold, Rudin ventured into big data analytics, first at Zynga and now at Facebook.) So it wasn't shocking to hear Rudin extol the virtues of relational databases, even though Facebook is a big Hadoop user.

"We keep the low-level, most granular, detailed data in Hadoop at Facebook, but move the transformed and aggregated data into relational databases because slicing and dicing is faster and easier," Rudin said during a keynote presentation.

In a follow-up interview, Rudin told me that Hadoop is essentially the discovery platform where data scientist types have the flexibility to explore data without the constraints of a predefined data model. There they can make novel discoveries about trends, behavior patterns and relationships.

"We'll find out what's important through ad hoc analysis on Hadoop, and once that's known, we'll aggregate the data across the dimensions that we need and put that into a relational environment," Rudin explained. "With a relational system I can get answers in seconds instead of tens of minutes."
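
The pattern Rudin describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Facebook's pipeline: the event records, the field names, the `action_summary` table and the helper functions are all invented, and an in-memory SQLite database stands in for the relational environment.

```python
import sqlite3
from collections import Counter

# Hypothetical granular events of the kind that might be kept in Hadoop.
# Every field name and value here is invented for illustration.
RAW_EVENTS = [
    {"user": "u1", "country": "US", "device": "mobile", "action": "like"},
    {"user": "u2", "country": "US", "device": "web", "action": "like"},
    {"user": "u3", "country": "BR", "device": "mobile", "action": "share"},
    {"user": "u1", "country": "US", "device": "mobile", "action": "like"},
]


def aggregate(events, dimensions):
    """Count events for each combination of the chosen dimensions."""
    return Counter(tuple(event[d] for d in dimensions) for event in events)


def load_summary(conn, counts):
    """Store the aggregated counts in a small relational summary table."""
    conn.execute(
        "CREATE TABLE action_summary "
        "(country TEXT, device TEXT, action TEXT, events INTEGER)"
    )
    conn.executemany(
        "INSERT INTO action_summary VALUES (?, ?, ?, ?)",
        [key + (n,) for key, n in counts.items()],
    )
    conn.commit()


def events_by_country(conn):
    """Slice the summary by one dimension with plain SQL."""
    return conn.execute(
        "SELECT country, SUM(events) FROM action_summary "
        "GROUP BY country ORDER BY country"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_summary(conn, aggregate(RAW_EVENTS, ("country", "device", "action")))
print(events_by_country(conn))
```

Once the summary table exists, further slicing needs only a new SQL query against a small table rather than another pass over the raw events.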

There are some analyses that have to stay in the big data realm of Hadoop, including graph analyses and optimizations involving complicated computations that just don't take to SQL. Facebook has its own graph analysis technology for finding relationships among people at scale, and this is the stuff that really made Facebook tick as a social network.

In an example of optimization, Facebook has to figure out which items, out of thousands of possibilities, to put in user news feeds. It's not unlike the advertising optimization work that Internet giants routinely handle in Hadoop.

"It's not a metrics and dimensions problem, it's a long, linear equation, and we need to process it for one million people at a time," Rudin explained.
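
To make the contrast with a metrics-and-dimensions query concrete, here is a toy sketch of ranking candidate items with a linear scoring equation. It is not Facebook's method: the feature names, the weights and the `score` and `top_items` helpers are assumptions invented for illustration, and a real system would involve far more terms and run in parallel across many users.

```python
import heapq

# Invented weights for a toy linear equation; real models have many more terms.
WEIGHTS = {"affinity": 0.5, "recency": 0.3, "popularity": 0.2}


def score(features):
    """Weighted sum of an item's features; missing features count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())


def top_items(candidates, k):
    """Return the ids of the k highest-scoring (item_id, features) pairs."""
    best = heapq.nlargest(k, candidates, key=lambda pair: score(pair[1]))
    return [item_id for item_id, _ in best]


candidates = [
    ("a", {"affinity": 1.0}),
    ("b", {"recency": 1.0}),
    ("c", {"affinity": 0.2}),
]
print(top_items(candidates, 2))
```

The work here is per-user, per-item arithmetic rather than grouping and joining rows, which is the kind of computation the article says does not take to SQL.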

So when do you choose Hadoop and when do you choose relational? That's a topic Stephen Brobst, CTO at Teradata, and Ari Zilka, CTO at Hortonworks, took up in a discussion of the best uses of relational databases and Hadoop. To my great surprise, Hortonworks exec Zilka made the case for relational databases while Brobst made the case for Hadoop.

Zilka was a cofounder of the Terracotta in-memory database, so he's no stranger to the relational world, but hearing a relational database vendor exec like Brobst make a strong case for Hadoop was refreshing.

"The kinds of problems you're trying to solve [with Hadoop] are not about generating a report," Brobst explained. "Hadoop is for much more sophisticated uses like analyzing text on Web pages or analyzing relationships, and you use techniques like machine learning, scoring and building search indexes that solve very different problems."

Relational databases have effectively joined the big data world, Zilka argued, by way of massively parallel processing. MPP is the architecture behind relational products including Teradata, Pivotal Greenplum, IBM PureData System for Analytics (formerly Netezza), Actian's ParAccel, HP Vertica, Microsoft SQL Server PDW and others.

"There's nothing about relational that is too old or too stodgy or too small to handle the data volume of even the largest transactional data sets," Zilka argued.

Still need help understanding which platform to choose? Zilka and Brobst ended with a nice list of attributes to consider:

-- Stable schema = RDBMS; evolving schema = Hadoop
-- Structured data = RDBMS; variably structured data = Hadoop
-- ANSI SQL = RDBMS; flexible programming = Hadoop
-- Cleaned data = RDBMS; raw data = Hadoop
-- Updates/deletes = RDBMS; ingest = Hadoop
-- Core data = RDBMS; all data = Hadoop
-- Complex joins = RDBMS; complex processing = Hadoop
-- Efficient use of CPU/IO = RDBMS; low-cost storage = Hadoop.
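
The attribute list above can be sketched as a lookup table with a simple tally. The speakers offered the attributes as points to consider, not a scoring method, so the vote counting is an assumption added here for illustration; the dictionary keys and the `tally` function are likewise hypothetical.

```python
# Each pair paraphrases one line of the list: (RDBMS trait, Hadoop trait).
# The attribute keys are invented labels for illustration.
GUIDELINES = {
    "schema": ("stable", "evolving"),
    "data_structure": ("structured", "variably structured"),
    "interface": ("ANSI SQL", "flexible programming"),
    "data_quality": ("cleaned", "raw"),
    "write_pattern": ("updates/deletes", "ingest"),
    "scope": ("core data", "all data"),
    "workload": ("complex joins", "complex processing"),
    "priority": ("efficient CPU/IO", "low-cost storage"),
}


def tally(workload):
    """Count how many described attributes point to each platform."""
    votes = {"RDBMS": 0, "Hadoop": 0}
    for attribute, value in workload.items():
        rdbms_trait, hadoop_trait = GUIDELINES[attribute]
        if value == rdbms_trait:
            votes["RDBMS"] += 1
        elif value == hadoop_trait:
            votes["Hadoop"] += 1
        else:
            raise ValueError(f"unknown value {value!r} for {attribute!r}")
    return votes


print(tally({"schema": "evolving", "data_quality": "raw", "workload": "complex joins"}))
```

A split result like this one is consistent with the article's theme: many workloads have traits that point to both platforms.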

Rudin, Zilka and Brobst all supported the notion that Hadoop is more closely aligned with what you might call bigger, more exploratory questions. You don't even attempt to bring uniform structure to the data, as you would with a relational database, because you don't even know what questions you want to ask yet.

"In the big data world, all data has value, you just haven't found it yet," Brobst explained. "If you use a different economic model, leverage the open source characteristics of Hadoop, and leverage commodity storage and servers, you can store multiple orders of magnitude more data. That data lake allows us to explore the data and discover where that value is."

So there you have it: Hadoop and RDBMS are destined to live together. There may not be peace and harmony in the early stages, as teams compete for budgets and workloads. But if the interests of the business are to be best served, live together they will.

D. Henschen,
User Rank: Author
11/1/2013 | 2:11:04 PM
re: Facebook Exec: Databases, Hadoop Belong Together
A lot of relational database fans are Tweeting with satisfaction that this article offers evidence that Hadoop will not replace RDBMS. I don't think knowledgeable Hadoop fans ever made that claim. Cloudera's "center is shifting" argument, for example, never asserted that RDBMSs would go away. That company's latest "Enterprise Data Hub" spin (which echoes the Sears/Metascale vision) sees Hadoop handling all the raw data at high scale. RDBMS becomes a "specialized" warehouse/mart environment for fast analysis of refined, structured data. In other words, think marts and focused operational data warehouses.

The one RDBMS concept that goes away if the Enterprise Data Hub vision takes hold is the all-encompassing enterprise data warehouse (EDW). EDWs mostly fell short of that "enterprisewide" vision, despite costly and time-consuming effort. Keeping up with variable, ever-changing data is something that the RDBMS just doesn't do well. And trying to do it at high scale with an RDBMS is an expensive proposition.
David F. Carr,
User Rank: Author
10/31/2013 | 4:14:45 PM
re: Facebook Exec: Databases, Hadoop Belong Together
Love the summary of the trade offs. I don't know that I've seen that expressed this clearly anywhere else.