Home
BYTE Newsletter
Keep up with all the BYTE News and Reviews

Subscribe

Facebook On Big Data Analytics: An Insider's View

Comments | Doug Henschen, InformationWeek | March 18, 2013 09:15 AM


Jay Parikh, Facebook
Jay Parikh, Facebook
Few businesses are on the scale of Facebook, but the problems it's dealing with today might influence the best practices smaller companies will be putting in place tomorrow.

Just as Facebook is shaping big data hardware and data centers through its Open Compute Project initiative, it's also influencing the software tools and platforms for big data analysis, including Hadoop, Hive, graph analysis and more. Hive, Hadoop's data warehousing infrastructure, originated at Facebook, and according to Jay Parikh, VP of infrastructure engineering, the company is hard at work on ways to make Hive work faster and support more SQL query capabilities.

More Insights

Webcasts

More >>

White Papers

More >>

Reports

More >>

Parikh also tells InformationWeek that Facebook is working on new real-time and graph-analysis platforms, but the heart and soul of this interview is about big data analytics. There's plenty of detail on how Facebook answers operational and business questions, but read on to get Parikh's advice on how to avoid "wasting a lot of money" or "missing huge opportunities" in big data.

InformationWeek: The topic at hand is big data analytics, but let's start by exploring Facebook's infrastructure to get some context.

Jay Parikh: There are a few areas that we invest in to scale massive amounts of data. If you consider just the photos on Facebook, we have more than 250 billion photos on the site and we get 350 million new photos every day. It's a core, immersive experience for our users, so we've had to rethink and innovate at all levels of the stack, not just the software, to manage these files and to serve them, store them and make sure that they're available when users go back through their timeline to view them. That has meant changes at the hardware level, the network level and the data center level. It's a custom stack, and it doesn't involve Hadoop or Hive or any open source big data platforms.

Another area where we invest is in storing user actions. When you "like" something, post a status update or make a friend on Facebook, we use a very distributed, highly optimized, highly customized version of MySQL to store that data. We run the site, basically, storing all of our user action data in MySQL. That's the second pillar.

[ Want more insider info on Facebook? Read Facebook's Data Center: Where Likes Live. ]

The third area is Hadoop infrastructure. We do a lot with Hadoop. It's used in every product and in many different ways. A few years ago we launched a new version of Facebook Messaging, for example, and it runs on top of HBase [the Hadoop NoSQL database framework]. All of the messages you send on mobile and desktop get persisted to HBase. We relied on our expertise in Hadoop and HDFS to scale HBase to store messages.

We also use a version of Hadoop and Hive to run the business, including a lot of our analytics around optimizing our products, generating reports for our third-party developers, who need to know how their applications are running on the site, and generating reports for advertisers, who need to know how their campaigns are doing. All of those analytics are driven off of Hadoop, HDFS, Hive and interfaces that we've developed for developers, internal data scientists, product managers and external advertisers.

IW: Any big changes afoot, particularly where analytic capabilities are concerned?

Parikh: There's lots of hype in the [IT] industry today about everything needing to be real time. That has been true for us for a long time. We push the front-end website code twice a day. We have thousands of different versions of the site running at any given moment. We launched Light Stand, a new version of our newsfeed, last week, and we launched Facebook Graph Search in January. As people are adopting new products like this, we need to understand whether they're working or not. Are people engaged? Are they missing key features? Are they still liking things as much? If the warehouse or analytics platform can't keep up, then we can't come up with new iterations of our products very quickly. Real-time measurement has been a key element for us.



Related Reading




Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

BYTE encourages readers to engage in spirited, healthy debate, including taking us to task. However, BYTE moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing/SPAM. BYTE further reserves the right to disable the profile of any commenter participating in said activities.

COMMENTS

Tune In to BYTE
Facebook Twitter LinkedIn Newsletter RSS
Whitepapers
whitepaper
In this paper you will learn the five trends shaping the future of enterprise mobility. Learn how the rise of social media as a business application, the lurring between work and home, the emergence of new mobile devices, the demand for tech savvy employees and changing expectations of corporate IT will fundamentally change the workplace.
whitepaper
In a survey of more than 1,700 information workers (iWorkers) in North America, notebooks, desktops, and smartphones were found to be “must-have” devices, while tablets, slates, and netbooks were relegated to “nice-to-have” status, according to a commissioned study conducted by Forrester Consulting on behalf of Dell and Intel.
Sponsored by: Dell
Upcoming Events