Facebook On Big Data Analytics: An Insider's View - InformationWeek

Facebook On Big Data Analytics: An Insider's View

Facebook's Jay Parikh talks about fixing Hive, real-time platforms and how traditional companies can 'thread the needle' of big data success.

Jay Parikh, Facebook
Few businesses are on the scale of Facebook, but the problems it's dealing with today might influence the best practices smaller companies will be putting in place tomorrow.

Just as Facebook is shaping big data hardware and data centers through its Open Compute Project initiative, it's also influencing the software tools and platforms for big data analysis, including Hadoop, Hive, graph analysis and more. Hive, Hadoop's data warehousing infrastructure, originated at Facebook, and according to Jay Parikh, VP of infrastructure engineering, the company is hard at work on ways to make Hive work faster and support more SQL query capabilities.

Parikh also tells InformationWeek that Facebook is working on new real-time and graph-analysis platforms, but the heart and soul of this interview is about big data analytics. There's plenty of detail on how Facebook answers operational and business questions, but read on to get Parikh's advice on how to avoid "wasting a lot of money" or "missing huge opportunities" in big data.

InformationWeek: The topic at hand is big data analytics, but let's start by exploring Facebook's infrastructure to get some context.

Jay Parikh: There are a few areas that we invest in to scale massive amounts of data. If you consider just the photos on Facebook, we have more than 250 billion photos on the site and we get 350 million new photos every day. It's a core, immersive experience for our users, so we've had to rethink and innovate at all levels of the stack, not just the software, to manage these files and to serve them, store them and make sure that they're available when users go back through their timeline to view them. That has meant changes at the hardware level, the network level and the data center level. It's a custom stack, and it doesn't involve Hadoop or Hive or any open source big data platforms.

Another area where we invest is in storing user actions. When you "like" something, post a status update or make a friend on Facebook, we use a very distributed, highly optimized, highly customized version of MySQL to store that data. We run the site, basically, storing all of our user action data in MySQL. That's the second pillar.

The third area is Hadoop infrastructure. We do a lot with Hadoop. It's used in every product and in many different ways. A few years ago we launched a new version of Facebook Messaging, for example, and it runs on top of HBase [the Hadoop NoSQL database framework]. All of the messages you send on mobile and desktop get persisted to HBase. We relied on our expertise in Hadoop and HDFS to scale HBase to store messages.

We also use a version of Hadoop and Hive to run the business, including a lot of our analytics around optimizing our products, generating reports for our third-party developers, who need to know how their applications are running on the site, and generating reports for advertisers, who need to know how their campaigns are doing. All of those analytics are driven off of Hadoop, HDFS, Hive and interfaces that we've developed for developers, internal data scientists, product managers and external advertisers.
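For readers who want a concrete picture of the advertiser reporting Parikh describes, here is a minimal sketch of a per-campaign roll-up, written in plain Python rather than Hive. The event rows, the campaign names and the `campaign_report` function are all hypothetical illustrations; nothing here reflects Facebook's actual schema or queries.

```python
from collections import defaultdict

# Hypothetical ad-event log rows: (campaign_id, event_type).
# These names and values are invented for illustration only.
EVENTS = [
    ("spring_sale", "impression"),
    ("spring_sale", "impression"),
    ("spring_sale", "click"),
    ("brand_video", "impression"),
]


def campaign_report(events):
    """Count impressions and clicks per campaign, similar in spirit to a
    GROUP BY campaign_id query in a data warehouse."""
    counts = defaultdict(lambda: defaultdict(int))
    for campaign_id, event_type in events:
        counts[campaign_id][event_type] += 1

    report = {}
    for campaign_id, by_type in counts.items():
        impressions = by_type["impression"]
        clicks = by_type["click"]
        # Click-through rate; guard against campaigns with no impressions.
        ctr = clicks / impressions if impressions else 0.0
        report[campaign_id] = {
            "impressions": impressions,
            "clicks": clicks,
            "ctr": ctr,
        }
    return report


if __name__ == "__main__":
    for campaign, stats in campaign_report(EVENTS).items():
        print(campaign, stats)
```

At warehouse scale the same grouping would run as a distributed job over files in HDFS instead of an in-memory loop, but the shape of the question an advertiser asks is the same.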

IW: Any big changes afoot, particularly where analytic capabilities are concerned?

Parikh: There's lots of hype in the [IT] industry today about everything needing to be real time. That has been true for us for a long time. We push the front-end website code twice a day. We have thousands of different versions of the site running at any given moment. We launched Light Stand, a new version of our newsfeed, last week, and we launched Facebook Graph Search in January. As people are adopting new products like this, we need to understand whether they're working or not. Are people engaged? Are they missing key features? Are they still liking things as much? If the warehouse or analytics platform can't keep up, then we can't come up with new iterations of our products very quickly. Real-time measurement has been a key element for us.
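The question "Are they still liking things as much?" amounts to comparing an engagement rate across the site versions running at once. A minimal sketch of that comparison follows, assuming a hypothetical log of `(version, likes, views)` tuples; the field names and the `like_rate_by_version` function are illustrative assumptions, not a description of Facebook's measurement tools.

```python
def like_rate_by_version(sessions):
    """Return likes per view for each site version.

    `sessions` is an iterable of hypothetical (version, likes, views)
    tuples; the layout is assumed for illustration only.
    """
    totals = {}
    for version, likes, views in sessions:
        running = totals.setdefault(version, [0, 0])
        running[0] += likes
        running[1] += views
    # Guard against versions that recorded no views at all.
    return {
        version: (likes / views if views else 0.0)
        for version, (likes, views) in totals.items()
    }


if __name__ == "__main__":
    sample = [("old_feed", 5, 100), ("new_feed", 3, 50), ("old_feed", 5, 100)]
    print(like_rate_by_version(sample))
```

The slower this kind of roll-up arrives, the longer a product team waits to learn whether a new version helped or hurt, which is the bottleneck Parikh is pointing at.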

User Rank: Apprentice
5/6/2013 | 12:55:42 PM
re: Facebook On Big Data Analytics: An Insider's View
I also think he has developed a huge team in order to maintain big data at such a large scale.
User Rank: Apprentice
3/26/2013 | 4:29:29 PM
re: Facebook On Big Data Analytics: An Insider's View
Great article, Doug. Glad you brought the viewpoints of the true pioneers, adopters and practitioners to the mainstream enterprise. It was very interesting to read how they push front-end code twice a day for analysis and Scuba. Any reason why they did not go with established in-memory databases, a technology that was already pretty mature when they adopted MySQL for other purposes?
D. Henschen,
User Rank: Author
3/18/2013 | 4:55:29 PM
re: Facebook On Big Data Analytics: An Insider's View
Parikh is pretty up front about the limitations of Hive that Facebook is trying to overcome, but he makes it clear it will take a yet-to-be-announced new platform -- expected this summer -- to address real-time analysis needs. Given the many real-time initiatives now underway in the Hadoop community, it will be interesting to see whether Facebook's new platform is embraced the way Hive was embraced way back when.
User Rank: Author
3/18/2013 | 2:25:03 PM
re: Facebook On Big Data Analytics: An Insider's View
Sounds like he has developed a team with a large amount of Hadoop expertise. I wonder if they are hiring up a storm from outside, or grooming people who were already there.

Laurianne McLaughlin