7 Data Center Disasters You'll Never See Coming
These are the kinds of random events that keep data center operators up at night. Is your disaster recovery plan prepared to handle these freak accidents?
Flood, fire, solar flare, and crash by four-wheel-drive motor vehicle: These are the potential disasters that strike fear in the imaginations of data center operators, as the following accounts will show.
Jonathan Bryce, now executive director of the OpenStack Foundation, was the twenty-something founder of the Mosso Cloud in Dallas-Fort Worth when he found himself on the receiving end of such an incident on Nov. 13, 2007.
A diabetic driver in the vicinity of the Rackspace data center where Mosso was hosted had passed out behind the wheel of his SUV, crashing into a building housing the data center's electrical transformer equipment. Mosso was still running after the crash, but that was the start of a sequence of unlikely events that led to the service's outage.
How do you prepare for such an event in your disaster plan? "It's just one of those things you have to cope with as best you can," said Bryce.
This was a sentiment echoed last year by Robert von Wolffradt, CIO for the State of Iowa, in a blog post after an unexpected fire in the state's primary data center. Survivors of the Lower Manhattan office buildings and hospitals flooded by Hurricane Sandy in 2012 would agree.
Even if you think you're prepared for earthquake, flood, and fire, when did you last worry adequately about the danger of solar flares? A powerful solar eruption that could have disrupted electrical transmission systems missed Earth by a narrow margin in 2012. If the eruption had occurred only one week earlier, Earth would have been in the line of fire, Daniel Baker of the University of Colorado told NASA Science News in 2014. The flare's effects would have struck Earth's atmosphere, producing heavy and unexpected voltage surges in electrical lines.
You might consider such a hazard extremely remote, but in 1859 a solar disturbance known as the Carrington Event hit Earth and produced voltages so large that the wiring of telegraph offices sparked out of control, setting some offices on fire.
CIOs and data center managers who've been through a disaster say that the best you can do is prepare. "Test complete loss of systems at least once a year. No simulation; take them offline," advised Wolffradt in a blog post following the State of Iowa's crisis.
Check out our list of data center disasters -- from the scary to the outright outlandish -- and tell us about your own data center dramas in the comments section below.
No, that's not the name of a new smartphone: Samsung SDS is the Samsung IT services subsidiary whose data center caught fire on April 20, 2014, in the middle of an office building in Gwacheon, South Korea. ZDNet Korea staff writer Jaehwan Cho posted images from the Yonhap News Agency on his Twitter feed, @hohocho, showing smoke and flames coming from the side of the building, with fierce heat causing debris to fall from the exterior.
The Samsung IT staff and occupants of the building were evacuated, with only one staffer suffering cuts, scratches and other minor injuries from falling debris, according to Data Center Knowledge.
The fire caused users of Samsung devices, including smartphones, tablets, and smart TVs, to lose access to data they may have been trying to retrieve. Device users were locked out of their content for several hours before recovery systems in a second Gwacheon data center could restore service, prompting Samsung officials to post a blog apology.
A fire in an electrical vault at Fisher Plaza in Seattle on July 3, 2009, knocked out the Authorize.net payment portal, Microsoft's Bing Travel service, the Geocaching.com service, the Dotster domain registrar service, and Web hosting provider AdHost, along with dozens of other sites. Power was restored the next morning.
The Puget Sound Business Journal reported that Geocaching and AdHost were back online by 10 a.m., while other services took longer to restore. The fire, which apparently started in a cabling duct, was estimated to cost Fisher Communications $10 million in repairs and equipment replacement.
Manhattan, like much of the East Coast, lost power as Hurricane Sandy came ashore over Virginia, Delaware, Maryland, and New Jersey in late October 2012. A storm surge of salt water followed, rushing up the streets and flooding Lower Manhattan and many other sites around the Tri-State area.
At 75 Broad Street in Lower Manhattan, home of Peer 1 Hosting, it was a disaster recovery planner's nightmare. Backup generators stood ready well above the water line on the building's 18th floor. But the same storm surge that poured into the building's lobby and filled its basement knocked out the emergency generator fuel pumping system located there. Once under water, its electrical circuits no longer worked. (Part of New York's response to 9/11 was to limit the amount of fuel oil stored in office buildings.) So, as the generators ran through their limited supply of fuel, the company had no way to get fresh oil to them. Peer 1 advised customers of a planned shutdown of their systems within hours, as several employees made their way to the facility to help prevent any data loss.
Instead of a shutdown, Peer 1 organized a bucket brigade to carry fuel oil to the generators. The fuel was lined up on the street and carried by hand up to the 17th floor, where the generators' day tank was located. That tank and its pumps could deliver fuel to the generators on the floor above. Peer 1 customers -- including SquareSpace, a Web site development firm, and Fog Creek Software, a supplier of online project management software -- provided manpower for the 25-member team carrying fuel oil up the stairs to the generators the night of Oct. 30 and into Oct. 31.
By lunchtime on Oct. 31, they'd filled the day tank and could take a break, eating a lunch that had to be delivered on foot over the Brooklyn Bridge because of Manhattan's clogged streets. Neither the bucket brigade nor the on-foot lunch delivery appeared anywhere in Peer 1's disaster recovery plan. But the storm never forced a shutdown.
Rackspace's managed hosting business and the embryonic Mosso Cloud running in the same Dallas data center were taken offline for several hours by an errant SUV on Nov. 13, 2007.
The driver of a large four-wheel-drive vehicle, a diabetes sufferer, passed out behind the wheel. With no one in control, the vehicle accelerated straight ahead, failed to turn at a T-intersection, and jumped the curb onto a grass berm on the far side. The berm served as a ramp, launching the SUV into the air over a row of parked cars. As it came down, it slammed into a building housing a power transformer for the Rackspace facility, knocking out that source of power.
The building's cooling system came to a halt while a switching process brought a secondary utility power feed online. There was no interruption of processing, since the compute equipment continued running on the batteries kept in place for just such an emergency. The facility's staff had begun the restart procedure for the building's chillers when the utility, getting word that emergency crews were trying to extract the driver from a smashed vehicle embedded in live transformer equipment, shut off all power to the facility, cutting off Rackspace's secondary utility source as well.
Again battery power kicked in, and emergency generators started on cue, as called for by the disaster recovery plan. Data center processing had still not been interrupted, despite the accident and two losses of grid power. The multi-step startup process for the cooling system's large chillers, however, had been cut short midway, and some of them could not be restarted without further troubleshooting.
Rackspace president Lew Moorman told customers in a blog post soon after the incident that "two chillers did not restart, prompting the data center to overheat." The heat generated by the compute equipment was enough to send temperatures soaring, and Rackspace managers implemented "a phased equipment shut down lest equipment be damaged" and customer data lost.
The outage lasted until 10:50 p.m., five hours after the accident. Software-as-a-service provider 37signals, a company hosted by Rackspace, posted its own comment to its customers: "This 'perfect storm' chain of events beat both our set up and our data center's sophisticated back-up systems. We will work hard to further diversify our systems in order to make any future downtime event like this even more rare." Beyond increasing the risk of losing customers, the event was reported to have cost Rackspace $3.5 million in refunds.
On Jan. 9, 2015, a large building under construction as a future Amazon.com data center caught fire when a welder's torch ignited nearby building materials. The blaze turned into a three-alarm fire at the site in Ashburn, Va., and its thick plume of black smoke could be seen for several miles around. Amazon spokesmen told a local ABC News affiliate that the fire caused about $100,000 in damage, but added there was "no risk of impact on Amazon's operations" since the data center wasn't yet in service.
While all of these scenarios will make even the most battle-scarred data center operator sweat, the good news is that the organizations highlighted here all managed to recover fairly quickly from a confluence of events that could never be anticipated in any disaster recovery plan.
Have you lived through any close calls or freaky takedowns? What advice do you have for recovering from a disaster? And, what's your worst data center disaster nightmare? Tell us all about it in the comments section below.