An outage that took Facebook and Instagram off the air for an hour Monday affected 29 locations where Facebook operates servers. Curiously, its massive Prineville, Ore., data center complex appears to have remained in operation throughout the outage.
That means a problem arose in the content-distribution sub-data centers that Facebook has scattered around the US and around the globe, in both its own and colocation data centers. A map produced by an Internet metrics-collecting firm, Dynatrace, indicates 29 such locations had their operations interrupted for an hour, starting about 9:10 p.m. Pacific time.
That's just one of the conclusions an observer can draw after examining data from Dynatrace, which tracks website performance for major retailers, financial services ecommerce systems, and online operations for hundreds of enterprises.
Dynatrace has 100 computers around the globe collecting data from "tens of thousands" of headless users, real-world end-users who allow their computers to periodically fire off stored queries to Nike, Netflix, and thousands of other online destinations. The client machines capture the response time and report it to Dynatrace. That allows Dynatrace to report on application performance to its customers, which include Wells Fargo, LinkedIn, Cisco, Thomson Reuters, and Intuit.
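The probe pattern described above, in which a client fires a stored query, times the response, and reports the result, can be sketched roughly as follows. This is a minimal illustration and not Dynatrace's actual agent: the names `http_fetch`, `time_request`, and `run_probe`, the report fields, and the injectable `fetch` callable are all assumptions made for the example.

```python
import time
import urllib.request
from typing import Callable, Dict, List


def http_fetch(url: str) -> None:
    """Hypothetical default fetcher: request the URL and read the full body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()


def time_request(url: str, fetch: Callable[[str], None] = http_fetch) -> Dict[str, object]:
    """Run one stored query and return a report record with its elapsed time."""
    start = time.perf_counter()
    try:
        fetch(url)
        ok = True
    except Exception:
        # Any failure (timeout, DNS error, HTTP error) is recorded, not raised.
        ok = False
    elapsed = time.perf_counter() - start
    return {"url": url, "ok": ok, "seconds": elapsed}


def run_probe(urls: List[str], fetch: Callable[[str], None] = http_fetch) -> List[Dict[str, object]]:
    """Time every stored query in turn; the caller decides where to send the reports."""
    return [time_request(url, fetch) for url in urls]
```

In a real deployment the list of reports would be sent to a central collector. Passing `fetch` as a parameter simply makes the sketch easy to exercise without touching the network.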
Another conclusion is that the outage was not caused by a cyberattack, even though a group that wanted to claim credit started issuing tweets claiming responsibility. Instead, Facebook 'fessed up to a configuration change gone awry.
"This was not the result of a third-party attack but instead occurred after we introduced a change that affected our configuration systems," according to Facebook's statement.
From its position astride the Internet, Dynatrace said the slowdown of sites that use the familiar Facebook link "Like this page," or are otherwise dependent on Facebook interactions, illustrates the vulnerability of businesses that rely on third-party links to their websites.
Vincent Geffray, a senior product manager at Dynatrace, said its Outage Analyzer service is a big data application sitting on top of the data routinely captured by its application performance management monitoring. Outage Analyzer spotted a slowdown Monday that was simultaneously occurring at the websites of Dynatrace customers and traced it back to their ties to Facebook. In some cases, a site allows a visitor to log in using his Facebook identity. In others it responds to a "like" recorded on the Dynatrace customer's site.
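One way to picture the kind of tracing Geffray describes is to rank third-party hosts by how many of the sites that depend on them are slow at the same moment. The sketch below is only an illustration of that idea under simple assumptions. It is not Outage Analyzer's algorithm, and the function name `suspect_dependencies`, the site names, and the input shapes are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def suspect_dependencies(
    deps_by_site: Dict[str, Set[str]],
    slow_sites: Set[str],
) -> List[Tuple[str, float, int]]:
    """Rank third-party hosts by the share of their dependent sites that are slow.

    Returns (host, slow_share, slow_site_count) tuples, most suspicious first.
    """
    total: Counter = Counter()
    slow: Counter = Counter()
    for site, deps in deps_by_site.items():
        for dep in deps:
            total[dep] += 1
            if site in slow_sites:
                slow[dep] += 1
    ranked = [(dep, slow[dep] / total[dep], slow[dep]) for dep in total]
    # Highest slow share first; ties broken by slow-site count, then host name.
    ranked.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return ranked


# Hypothetical input: which third-party hosts each monitored site pulls in.
example_deps = {
    "shop-a": {"facebook.com", "cdn.example"},
    "shop-b": {"facebook.com"},
    "shop-c": {"cdn.example"},
}
example_ranking = suspect_dependencies(example_deps, slow_sites={"shop-a", "shop-b"})
```

With this made-up input, every site that depends on `facebook.com` is slow, while only half of the sites that use `cdn.example` are, so `facebook.com` ranks first.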
Dynatrace has 5,800 customers around the globe. Geffray said the Facebook slowdown occurred simultaneously around the globe. That suggests that the Facebook configuration change, the cited cause, may have been rolled out rapidly at several sites and then spread to others, or even applied globally at the same time. The Dynatrace monitoring shows a sharp spike.
"We're working to get things back to normal as quickly as possible," Facebook spokeswoman Charlene Chian told CNN. Facebook visitors were not totally cut off from their favorite social media. "Sorry, something went wrong," they were told as they tried to access the site.
In some cases, the inability of the end user's computer to finish building a full page meant that his or her interaction with a target site would be very slow or stall completely.
"Let's say Nike is slow because of Facebook. The customer doesn't know that the degradation is due to Facebook. He just says, 'Nike is slow,'" Geffray said.
Whatever the cause, Facebook rectified the issue within the hour, and sites began to recover normal operations. Facebook has had a strong reliability record on the whole. Its last major outage was five years ago and lasted for 2.5 hours.
Other social media provided a springboard to commenting on the situation. Twitter quickly spawned the hashtag, #facebookdown, where tweeters mocked themselves for not knowing what to do without being able to post selfies to Instagram or personal news to Facebook.
"are you kidding me? east bay emergency dispatch says 5 people called 911 during #facebookdown today!" tweeted Kristen Sze (@abc7kritensze).Reports that people were roaming the streets of Berkeley, shoving photos of themselves into strangers' faces, and asking if they "liked" them, were probably exaggerated.
Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive ...