Hurricane Sandy: Disaster Recovery Improv Tales
In lower Manhattan, Peer 1 Hosting used a bucket brigade to replenish fuel for diesel generators on the 18th floor after pumps and the elevator broke down. In New Jersey, SunGard rerouted fuel trucks to avoid flooded intersections.
November 1, 2012
Disaster preparedness is a well-known best practice in running a data center, but Hurricane Sandy is showing that in disasters, the unexpected happens. When it does, some disaster recovery plans turn out to have holes in them, while others may still require improvisation.
Even SunGard, a specialist in disaster recovery, had its own brush with disaster when a river levee broke in Carlstadt, N.J. It had three data centers in the vicinity. They had been built on raised floors on what little high ground was available in the region, and all three ended up avoiding the rising waters that crept up the margins of the site and into its parking lots. And then there was the issue with the fuel trucks. But more on that later.
The response that may win the Mayor's Office's resilience award, if not one for physical fitness as well, was the formation of a bucket brigade at the Peer 1 facility at 75 Broad in lower Manhattan to carry diesel fuel in five-gallon buckets and jerry cans from a tank at ground level to the 17th floor. From there it was pumped from a day tank into generators positioned atop the 18th floor.
Sabio Banducci, president and CEO of Peer 1 Hosting, said his firm had expected to have to shut down after "four feet of water filled the lobby" and infiltrated the basement Monday evening, where the building's diesel fuel delivery system was located. That fuel distribution system failed as it was submerged in salt water.
Building engineers at 75 Broad attempted to implement a workaround piping system, but the building pump available wasn't powerful enough to move fuel oil up 17 stories. Peer 1's disaster recovery team, which had survived the 2003 power outage in Manhattan for days without a shutdown, thought ahead and contacted a firm with truck-mounted pumps. But with public transit shut down, city streets were clogged, and the firm wasn't able to move a truck to the Peer 1 site quickly enough to prevent a shutdown once the short-term tanks at the generators ran dry.
Peer 1 customer SquareSpace, a website development firm, notified its customers of a potential shutdown, noting Peer 1's efforts: "Fuel and water-pumps are in short supply" in the city, it said in a posting on its website Tuesday.
Peer 1 posted a status update at 10:45 a.m. Tuesday that it was "going to implement a controlled shutdown." But the shutdown never occurred.
Peer 1's generators were on the roof and not subject to flooding, provided the company could find some way to get fuel to them. The Peer 1 data center manager and his team decided they could organize a squad to carry fuel by hand up the building's stairwell -- the elevators were out of service, of course. There the fuel could be poured into the day tank's distribution system. And with that, a latter-day bucket brigade, not conceived of in the disaster recovery plan, was born.
Data center staff and other Peer 1 employees, plus some contractors, formed a team of 25 that worked deep into the night of Oct. 30 and the morning of Oct. 31 to refill the 17th floor tank and keep the generators running. Customers who had seen the "controlled shutdown" notice arrived at the scene, thinking they might need to take extraordinary measures to conserve data. Instead they found a data center that was continuing to run, against the odds.
"Some of our customers came down believing they would have to power down, and instead they joined the bucket brigade," said Banducci.
A brigade of 25 people lifted heavy five-gallon containers up the building's flights of stairs. One customer lending a hand was SquareSpace, a Manhattan website development firm. SquareSpace employees posted pictures of the operation on the firm's website Wednesday. Another was Fog Creek, an online project management firm for collaborative software development, located next door at 55 Broad.
Fuel trucks arrived intermittently, usually with eight fifty-gallon drums of diesel that were unloaded and painstakingly poured into five-gallon containers at street level. The generators' appetite on the 18th floor proved relentless. Three different 25-man teams worked in shifts in the stairwell.
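For a sense of scale, the delivery figures above can be turned into a rough count of hand-carried loads. The sketch below uses only the numbers quoted in this article (eight fifty-gallon drums per delivery, five-gallon containers); the function name is hypothetical, and it assumes every container is filled to the brim with no spillage, which is an illustrative simplification rather than anything Peer 1 reported.

```python
import math

# Figures quoted in the article.
DRUMS_PER_DELIVERY = 8       # "eight fifty-gallon drums"
GALLONS_PER_DRUM = 50
GALLONS_PER_CONTAINER = 5    # five-gallon buckets and jerry cans


def container_loads(drums=DRUMS_PER_DELIVERY,
                    gallons_per_drum=GALLONS_PER_DRUM,
                    gallons_per_container=GALLONS_PER_CONTAINER):
    """Return how many container loads one delivery represents.

    Assumption (not from the article): each container is filled
    completely, so a partial last container still counts as one load.
    """
    total_gallons = drums * gallons_per_drum
    return math.ceil(total_gallons / gallons_per_container)


if __name__ == "__main__":
    loads = container_loads()
    total = DRUMS_PER_DELIVERY * GALLONS_PER_DRUM
    print(f"One delivery: {total} gallons, about {loads} container loads")
```

Under those assumptions, a single truck delivery works out to 400 gallons, or roughly 80 trips up the stairwell.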
Banducci said a black humor emerged about how the company had engineered a self-improvement fitness program -- except for the sleep deprivation. As he discussed the situation Wednesday evening, the generators were still running with a nearly full tank, and the data center had been up continuously. There was fuel to spare at ground level and no prospect of a shutoff. The workers had even been given 90 minutes off for lunch earlier in the day. Other staffers brought lunch to the crews on foot over the Brooklyn Bridge, avoiding the city's clogged streets. That contingency hadn't been in the disaster recovery plan either.
Not everybody was so fortunate. Co-location and managed service provider Internap at the same location reported a different story. It too was prepared for a power outage with on-site generators and a back-up fuel supply. "As a result of the flooding, both our redundant fuel pumps and our generator fuel tank were compromised and shut down. The system continued to run until all fuel within the secondary feeder tanks were exhausted and our facility lost power" at 11:45 a.m. Tuesday, the company said in a notice.
Senior VP for development and operations Steve Orchard reported in an update Wednesday morning that the 75 Broad Street facility had been brought back online through a restored fuel system and generator power. Customers' systems were up and running, with 40 hours of diesel fuel on hand and a resupply truck available, he wrote.
A second Internap facility at 111 Eighth Street in lower Manhattan also had rooftop generators. But the "building-fed fuel system malfunctioned" and "pumps could not provide diesel to the rooftop generators, causing them to stop supplying power to our un-interruptible power supply system. Once battery backup was exhausted, our infrastructure lost power," Orchard explained in an Internap status update. That meant Internap customers, using the Eighth Street site as a major connection point to the Internet, were off the Internet until power was restored. "We continue to work with vendors to bring the entire site back online," he wrote Wednesday.
No cause was listed, but a building's fuel lines and pumps, which might function fine during periodic short-term tests, could have debris build up in the line, a clogged filter or a failed mechanical component, causing a failure during sustained, 24-hour operations. Anticipating such a development is hard, although more recently constructed buildings are built with instruments to report on the functioning of their systems that can help spot developing weak points. Even then, experienced building engineers need to be on hand to analyze and respond when a malady is identified. For whatever reason, that response wasn't available for the building's fuel system at 111 Eighth Street.
But even SunGard acknowledged there could be unforeseen eventualities that threaten to disrupt the best-laid disaster recovery plan. Nick Magliato said he had recovery teams both on premises and on the ground with walkie-talkies along the likely path of flooding as the storm surge drove what would have been a seasonal high tide to an even higher than usual mark.
Monday night the surge raised a nearby waterway 11 feet above flood stage but still within its earthen containment levees. The SunGard ground team found water leaking out and creeping as a two-inch flood toward the industrial complex where SunGard's three data centers were housed. But it was rising slowly. Everyone hoped the tide would recede before causing any damage.
Then information from the ground teams, entered into a spreadsheet at the data center where Magliato was located, indicated that the depth of the creeping flood had risen one and a half feet in ten minutes. The earthen levee had broken and water was flooding out. As that assessment came in, Magliato said he realized "at that rate, it wasn't going to take very long before the whole place was going to be under water."
The observers were calling in reports from far enough away that the data center had advance warning. Still, it wasn't long before water began appearing at the edge of the industrial park's parking lots, and then began to intrude upon them. Magliato still had the basic defense of having put all data center equipment on a raised floor above the level of the building foundation. The water level would have to increase a total of 6 feet to 7 feet before any of his operations would be affected, and it never reached that height.
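To see why the reported rate alarmed Magliato, it can be extrapolated against the 6-to-7-foot margin of the raised floors. The sketch below is purely illustrative: it assumes the rise the ground teams measured (one and a half feet in ten minutes) would hold constant and would apply at the data centers themselves, neither of which the article states, and the water in fact never reached that height. The function name is hypothetical.

```python
# Figures quoted in the article.
RISE_FEET = 1.5              # reported rise in flood depth
RISE_MINUTES = 10            # over this interval
MARGIN_FEET = (6.0, 7.0)     # rise needed before operations are affected


def minutes_to_rise(margin_ft, rise_ft=RISE_FEET, interval_min=RISE_MINUTES):
    """Minutes for water to climb margin_ft at a constant observed rate.

    Assumption (not from the article): the rate measured by the ground
    teams stays constant and applies at the data center site.
    """
    return margin_ft * interval_min / rise_ft


if __name__ == "__main__":
    low, high = (minutes_to_rise(m) for m in MARGIN_FEET)
    print(f"Margin exhausted in roughly {low:.0f} to {high:.0f} minutes")
```

At that hypothetical constant rate, the margin would have been used up in roughly 40 to 47 minutes, which gives a sense of how little time the team believed it had.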
Nevertheless, the SunGard recovery teams found themselves working farther down their disaster recovery checklists than they ever had before at the Carlstadt location. The full executive team was at work and members of the board of directors were being alerted. Customers were alerted and placed on standing bridge conference calls. They would need to be kept abreast and might have to quickly approve transfer or shutdown procedures if the water reached a certain mark.
It was at that point, Magliato said in an interview, that he realized his own disaster recovery plan had a gap in it. Fuel was needed to replenish the backup generators. As he looked at his map, Magliato realized the delivery truck's route would take it close to where the levee had broken and the water was the deepest. With myriad things to do, he and staffers nevertheless looked up an alternative route to ensure the truck did not end up mired at some low-lying intersection. Soon two 7,500-gallon trucks were continuously delivering fuel in shifts to keep the complex's 40,000 gallons of reserves stocked up.
The plan assumed "any flood water would recede by the time the replacement fuel was needed," but that turned out not to be the case. "We had to think of a path from another direction for them to come on," Magliato recalled.
High tides along coastal areas occur twice a day and the water lingers for varying periods -- usually an hour or less at the high water mark before receding and becoming a slack tide. For Magliato and his teams, "it was long high tide. We didn't see any significant decrease in the water level until lunchtime" Tuesday, he said.
On the whole, the disaster plan worked as expected and the SunGard facilities continued operating without interruption. But afterward, no one claimed that the plan had foreseen every challenge or didn't require a little improvisation along the way.