7 Data Center Disasters You'll Never See Coming
Charles Babcock  |  Slideshows  |  6/7/2015 12:06 PM

These are the kinds of random events that keep data center operators up at night. Is your disaster recovery plan prepared to handle these freak accidents?
6 of 9

Outage By SUV

Rackspace's managed hosting business and the embryonic Mosso Cloud running in the same Dallas data center were taken offline for several hours by an errant SUV on Nov. 13, 2007.

The driver of a large four-wheel drive vehicle, a diabetes sufferer, passed out behind the wheel. Instead of swerving to the edge of the street, the vehicle accelerated straight ahead, failed to turn at a T-intersection and jumped the curb to climb a grass berm on the far side. The berm served as a ramp that allowed the SUV to launch itself into the air over a row of parked cars. As it came down, it slammed into a building housing a power transformer for the Rackspace facility, knocking it out as a power source.

The building's cooling system came to a halt while the switchover to a secondary utility power source took place. There was no interruption of processing, since the compute equipment continued running on the emergency batteries in place for just such an event. The facility's staff had initiated a restart procedure for the building's chillers when the utility, getting word that emergency crews were trying to extract the driver from a smashed vehicle embedded in live-feed transformer equipment, shut off all power to the facility, cutting off Rackspace's secondary utility source as well.

Again, battery power kicked in and the emergency generators started on cue, as called for by the disaster recovery plan. Data center processing had still not been interrupted, despite the accident and two losses of power from the grid. The multi-step startup process for the cooling system's large chillers, however, had been disrupted midway through, and it proved impossible to get some of them restarted without further troubleshooting.

Rackspace president Lew Moorman told customers in a blog post soon after the incident that "two chillers did not restart, prompting the data center to overheat." The heat generated by the compute equipment was enough to send temperatures soaring, and Rackspace managers implemented "a phased equipment shut down lest equipment be damaged" and customer data lost.
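To get a rough sense of why temperatures climb so quickly once the chillers stop, consider a back-of-envelope sketch. The figures below are illustrative assumptions, not Rackspace's actual load or room size, and the calculation deliberately ignores the thermal mass of the racks and the building, which stretches the real timeline somewhat:

# Back-of-envelope: how fast does a data hall heat up once the chillers stop?
# All figures are illustrative assumptions, not Rackspace's actual numbers.

AIR_DENSITY = 1.2         # kg per cubic meter of room-temperature air
AIR_SPECIFIC_HEAT = 1005  # joules per kg per degree C

it_load_watts = 500_000   # assume 500 kW of compute load still running on batteries
hall_volume_m3 = 5_000    # assume a 5,000 cubic meter data hall

# Warming rate of the room air alone, ignoring the thermal mass of equipment,
# raised floor, and building structure (which slows the rise in practice).
air_mass_kg = AIR_DENSITY * hall_volume_m3
degrees_per_minute = it_load_watts * 60 / (air_mass_kg * AIR_SPECIFIC_HEAT)

print(f"~{degrees_per_minute:.1f} degrees C rise per minute in the air alone")

Under these assumptions the air warms by roughly 5 degrees C per minute, which squares with operators having only a handful of minutes of margin once cooling is lost.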

The outage lasted until 10:50 p.m., five hours after the accident. Software-as-a-service provider 37signals, a company hosted by Rackspace, posted its own comment to its customers: "This 'perfect storm' chain of events beat both our set up and our data center's sophisticated back-up systems. We will work hard to further diversify our systems in order to make any future downtime event like this even more rare." In addition to increasing the risk of losing customers, the event was reported to have cost Rackspace $3.5 million in refunds.

(Image: Mousepotato via iStockphoto)

Comments
batye, User Rank: Ninja
7/2/2015 | 12:33:42 AM
Re: Check your generators!
@kbartle803 interesting to know... thanks for sharing... in my books you could never be prepared 100%... 
Charlie Babcock, User Rank: Author
6/11/2015 | 3:15:00 PM
A narrow margin separates "chilled" from "too hot"
In no. 6, Outage by SUV, a commenter on Lew Moorman's blog post noted that a data center has about five minutes between the loss of its chillers and the start of equipment overheating. Does anyone know whether the margin is really that narrow? I understand that computer equipment can operate at up to 100 degrees without trouble, but beyond that, overheating starts to get dicey.
Charlie Babcock, User Rank: Author
6/11/2015 | 3:02:16 PM
Diesel fuel stored at NYC data centers reduced by 9/11
KBartle, what about this? One of the unreported aspects of the Hurricane Sandy disaster, when New York and many places along the East Coast went dark, was that every data center in the city had a limited supply of diesel fuel on premises. That was due to new regulations, I believe from a former mayor's office after 9/11, requiring that the amount of flammable liquid stored inside an office building be reduced. In some cases, that made the investment in generators irrelevant. Public transit was down, city streets were clogged, and fuel delivery trucks had great difficulty getting through. There goes the disaster recovery plan.
kbartle803, User Rank: Apprentice
6/10/2015 | 3:07:13 PM
Check your generators!
I was working at a data center in California that had power feeds from three different utilities, redundant battery backup, and a generator. All three utilities went down when the same upstream source all three were using failed. We went to battery backup until the generator took over; it ran for about an hour until it overheated, because its cooling system was rusted and clogged. The utilities were still down, so we ran on batteries for another hour until we finally went dark.
Dave Kramer, User Rank: Apprentice
6/9/2015 | 10:21:00 AM
Re: Move the backup site further away!
If I recall, the new data centers were New York (HQ), Houston, and Seattle - but now, realizing how hurricanes could still wipe out New York or Houston, at least Seattle might be safe from hurricanes, though not from earthquakes!?! Maybe somewhere central like Colorado or New Mexico, where environmental and natural disasters are less likely, might be a safer bet! I'm located in the midwest, in Saskatchewan, Canada, and we've been hit with flooding in the last few years, but in the lower-lying parts of the province.
Charlie Babcock, User Rank: Author
6/8/2015 | 9:07:39 PM
Move the backup site further away!
Dave Kramer, yes, it's a good idea to move the backup data center to a different site. But Hurricane Sandy told us just how far away that second site might have to be. Moving it across town or across the state might not have been enough in that case. With Sandy, disaster recovery specialist Sungard had flood waters lapping at the edges of its parking lots on high ground in N.J. The advent of disaster recovery based on virtual machines makes it more feasible to move recovery to a distant site (but still doesn't solve all problems).
Dave Kramer, User Rank: Apprentice
6/8/2015 | 4:58:07 PM
Re: Data Center Disasters
We were dealing with a large corporation that had its own data center backup in the second World Trade Center tower in New York. So when the 9/11 disaster struck, it wiped out both data centers.

Their new data centers six months later had their second and third backups in various other cities spread across far-flung states. Unfortunately, it took such a drastic tragedy to produce a new policy of not allowing a backup data center to even be within the same state - which is probably a wise move overall.
Charlie Babcock, User Rank: Author
6/8/2015 | 12:48:52 PM
When the fire fighting system gets triggered by accident....
DanaRothrock, yes, part of the problem of disaster preparedness is preventing the fire-fighting system, especially when it's triggered by accident, from destroying what it's supposed to save. There's been no easy answer for years. Halon was meant to prevent water damage to the equipment. Sprinklers, on the other hand, prevent Halon damage. It's a fool's bargain with fate.
Li Tan, User Rank: Ninja
6/8/2015 | 10:56:52 AM
Re: Data Center Disasters
This kind of accident is rare, but we need to be prepared for it all the same. At a minimum, the hosts in one cluster should not all sit in the same rack, and ideally they should not even be in the same building.
DanaRothrock, User Rank: Apprentice
6/8/2015 | 4:00:20 AM
Data Center Disasters
I know of a couple data center meltdowns.

One was a lightning bolt that burned a one-inch hole in the side of the mainframe.

Another was a Halon discharge in the computer room, set off by a cigarette fire in a trash can. The Halon destroyed all the disk drives for the mainframe systems. Halon was then replaced by water sprinklers for big savings.