When Facebook's Down, Thousands Slow Down

When Facebook went down this week, thousands of websites linked to the social media site also slowed down, according to Dynatrace.

Charles Babcock, Editor at Large, Cloud

January 28, 2015

An outage that took Facebook and Instagram off the air for an hour Monday affected 29 locations where Facebook operates servers. Curiously, its massive Prineville, Ore., data center complex appears to have remained in operation throughout the outage.  

That means a problem arose in the content-distribution sub-data centers that Facebook has scattered around the US and around the globe, in both its own and colocation data centers. A map produced by Dynatrace, an Internet metrics firm, indicates 29 such locations had their operations interrupted for an hour, starting about 9:10 p.m. Pacific time.

As a result, at least 7,500 websites that depend on a JavaScript response from a Facebook server had their operations slowed or stalled by a lack of response from Facebook. Facebook users, meanwhile, could reach the service but couldn't get it to respond or do anything for them during the hour.

That's just one of the conclusions an observer can make after examining data from Dynatrace, which tracks website performance for major retailers, financial services ecommerce systems, and online operations for hundreds of enterprises.

Dynatrace has 100 computers around the globe collecting data from "tens of thousands" of headless users: real-world end users who allow their computers to periodically fire off stored queries to Nike, Netflix, and thousands of other online destinations. The client machines capture the response time and report it to Dynatrace. That allows the company to report on application performance to its customers, which include Wells Fargo, LinkedIn, Cisco, Thomson Reuters, and Intuit.
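The kind of measurement described here, a client firing a stored query and recording how long the response takes, can be sketched in a few lines of Python. This is only an illustration, not Dynatrace's agent: the names `probe` and `default_fetch`, the 10-second timeout, and the shape of the report record are all assumptions made for the example.

```python
import time
import urllib.request
from typing import Callable


def default_fetch(url: str, timeout: float) -> None:
    """Fetch the URL and read the whole body; raises on error or timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def probe(url: str, timeout: float = 10.0,
          fetch: Callable[[str, float], None] = default_fetch) -> dict:
    """Run one stored query and return a hypothetical report record."""
    start = time.perf_counter()
    try:
        fetch(url, timeout)
        ok = True
    except Exception:
        # Timeouts, DNS failures, and HTTP errors all count as a failed probe.
        ok = False
    elapsed = time.perf_counter() - start
    return {"url": url, "ok": ok, "seconds": elapsed}
```

Passing the fetch function as a parameter keeps the timing logic separate from the network call, so the probe can be exercised without touching a real site.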

Another conclusion is that the outage was not caused by a cyberattack, even though one group began issuing tweets claiming responsibility. Instead, Facebook 'fessed up to a configuration change gone awry.

"This was not the result of a third-party attack but instead occurred after we introduced a change that affected our configuration systems," according to Facebook's statement.

From its position astride the Internet, Dynatrace said the slowdown of sites that use the familiar Facebook link "Like this page," or are otherwise dependent on Facebook interactions, illustrates the vulnerability of businesses that rely on third-party links to their websites.

Vincent Geffray, a senior product manager at Dynatrace, said its Outage Analyzer service is a big data application sitting on top of the data routinely captured by its application performance management monitoring. Outage Analyzer spotted a slowdown Monday that was simultaneously occurring at the websites of Dynatrace customers and traced it back to their ties to Facebook. In some cases, a site allows a visitor to log in using a Facebook identity. In others, it responds to a "like" recorded on the Dynatrace customer's site.
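One way to picture the tracing step is as a search for a third-party host that the slow sites have in common. The Python sketch below is a simplified illustration under stated assumptions, not a description of how Outage Analyzer works: the function name, the 80 percent threshold, and the example hostnames are all hypothetical.

```python
from collections import defaultdict


def suspect_third_parties(dependencies, slow_sites, min_sites=2, threshold=0.8):
    """Rank third-party hosts by the share of their dependent sites that are slow.

    dependencies: dict mapping a site to the set of third-party hosts it loads.
    slow_sites:   iterable of sites currently reporting a slowdown.
    """
    slow = set(slow_sites)

    # Invert the mapping: which sites depend on each third-party host?
    dependents = defaultdict(set)
    for site, hosts in dependencies.items():
        for host in hosts:
            dependents[host].add(site)

    suspects = []
    for host, sites in dependents.items():
        if len(sites) < min_sites:
            continue  # too few dependents to draw any conclusion
        share = len(sites & slow) / len(sites)
        if share >= threshold:
            suspects.append((host, share))

    # Highest share first; ties broken alphabetically for stable output.
    return sorted(suspects, key=lambda item: (-item[1], item[0]))


# Hypothetical data: three of four sites load a Facebook-style widget.
dependencies = {
    "shop-a.example": {"social-widget.example", "cdn.example"},
    "shop-b.example": {"social-widget.example"},
    "news-c.example": {"cdn.example"},
    "bank-d.example": {"social-widget.example", "cdn.example"},
}
slow_sites = {"shop-a.example", "shop-b.example", "bank-d.example"}
suspects = suspect_third_parties(dependencies, slow_sites)
```

With this made-up data, every site that loads `social-widget.example` is slow, while only two of the three sites that use `cdn.example` are, so only the widget host clears the threshold.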

Dynatrace has 5,800 customers around the globe. Geffray said the Facebook slowdown occurred simultaneously around the globe. That suggests that the Facebook configuration change, the cited cause, may have been rolled out rapidly at several sites and then spread to others, or even implemented globally at the same time. The Dynatrace monitoring shows a sharp spike.

"We're working to get things back to normal as quickly as possible," Facebook spokeswoman Charlene Chian told CNN. Facebook visitors were not totally cut off from their favorite social media. "Sorry, something went wrong," they were told as they tried to access the site.

For retail and enterprise sites that use Facebook as a third-party service, however, the incident took on serious consequences. According to Dynatrace, the short delays that started to show up around 9:10 p.m. PT grew into 39-second delays before a "server not available" or other message was returned to users. The retailers and other businesses were available, but their full pages couldn't move to the next user interaction until the Facebook link finished loading its JavaScript.

In some cases, the inability of the end user's computer to finish building a full page meant that his or her interaction with a target site would be very slow or stall completely.

"Let's say Nike is slow because of Facebook. The customer doesn't know that the degradation is due to Facebook. He just says, 'Nike is slow,'" Geffray said.

The problem exists with any social media service or other third party tied into a website's operation. If the full document object model called for by the download can't be built, due to absent JavaScript, the download may fail. Most websites are built with such interdependencies today. Their owners aren't always aware of the ways a third party might be slowing down the site.
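The underlying principle is that a call to a third party should never be allowed to hold up a site's own response indefinitely. In the browser that is a matter of how scripts are loaded; the same idea can be sketched in Python as a call with a deadline and a fallback. This is a minimal illustration, not a remedy proposed by Dynatrace or Facebook, and the names `call_with_deadline` and `fallback` are hypothetical.

```python
import concurrent.futures
import time


def call_with_deadline(third_party_call, deadline_seconds, fallback):
    """Run a third-party call, but give up and return fallback after a deadline."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(third_party_call)
    try:
        return future.result(timeout=deadline_seconds)
    except Exception:
        # Covers both a missed deadline and an error raised by the call itself.
        return fallback
    finally:
        # Do not wait for a stalled call; let the worker finish in the background.
        executor.shutdown(wait=False)


def stalled_widget():
    """Stand-in for a third-party service that is slow to answer."""
    time.sleep(1.0)
    return "like button"


# The page gets a placeholder after 0.05 seconds instead of waiting a full second.
widget = call_with_deadline(stalled_widget, 0.05, "placeholder")
```

The trade-off is that the visitor sees a page without the third-party feature rather than a page that stalls, which matches Geffray's point that customers blame the site they are visiting and not the service behind it.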

Whatever the cause, Facebook rectified the issue within the hour, and sites began to recover normal operations. Facebook has had a strong reliability record on the whole. Its last major outage was five years ago and lasted for 2.5 hours.

Other social media provided a springboard for commenting on the situation. Twitter quickly spawned the hashtag #facebookdown, under which tweeters mocked themselves for not knowing what to do when they couldn't post selfies to Instagram or personal news to Facebook.

"are you kidding me? east bay emergency dispatch says 5 people called 911 during #facebookdown today!" tweeted Kristen Sze (@abc7kritensze).

Reports that people were roaming the streets of Berkeley, shoving photos of themselves into strangers' faces, and asking if they "liked" them, were probably exaggerated.

About the Author(s)

Charles Babcock

Editor at Large, Cloud

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive Week. He is a graduate of Syracuse University where he obtained a bachelor's degree in journalism. He joined the publication in 2003.
