5 Years Later - Lessons From The Blackout?
Yesterday was the 5th anniversary of the biggest electrical blackout in North American history. Some 50,000,000 people from Ohio to D.C. to Ontario (Canada, not California) were without power for up to four days. The mainstream media is covering the big picture and lessons the power industry can learn to make the grid more resistant to trees knocking down power lines. I wanted to take the opportunity to address the questions this event raises for IT.
While I'm not generally a fan of planning for specific disasters, because Murphy's Law says if you prepare for wildfires you'll get hit with mudslides or earthquakes instead, the example of the 2003 blackout does raise some questions your disaster recovery plan will have to address.
First is distance. Various factors, like the effect of distance on network latency (and, with synchronous replication, on application performance) and the convenience of having someone drive out to the DR site to install a new server or memory upgrade, lead most organizations to keep their DR site within a reasonable driving distance of their primary site. Here in New York that usually means across a river in New Jersey, upstate (yes, for a real New Yorker, White Plains is upstate), or Connecticut. All of which were blacked out.
So the question comes up, is greater distance -- to, say, Nevada -- worth the cost and trouble?
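Part of the answer is physics. As a rough, hypothetical sketch (the distances and fiber speed here are my assumptions, and real links add routing and protocol overhead on top), this is roughly what the round trip alone adds to every synchronous write:

```python
# Back-of-the-envelope: how distance to a DR site affects synchronous writes.
# Assumes light in fiber travels at roughly 200,000 km/s (about 200 km per
# millisecond) and ignores routing, switching, and protocol overhead, so
# real-world numbers will be higher. Distances are illustrative guesses.

FIBER_SPEED_KM_PER_MS = 200.0  # ~200 km per millisecond in fiber

def sync_write_penalty_ms(distance_km: float) -> float:
    """Minimum added latency per synchronous write: one round trip to the DR site."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_MS

for label, km in [("Across the Hudson (~30 km)", 30),
                  ("Upstate or Connecticut (~150 km)", 150),
                  ("Nevada (~4,000 km)", 4000)]:
    print(f"{label}: at least {sync_write_penalty_ms(km):.2f} ms per write")
```

A few tenths of a millisecond gets lost in the noise; 40-plus milliseconds on every committed write is something most transactional applications, and their users, will notice.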
The second is when to declare a disaster and activate the DR site. The blackout struck New York at 4:11 p.m. on a Thursday. Most New York City locations had power restored by morning. Organizations that didn't activate their DR sites, and whose servers sat in a New York office tower without a generator, had to restart those servers when the power came back. Either the IT guys had a really long night or the users had a slow morning.
Those that did activate not only had to limp through Friday running from the DR site, but probably had to spend the weekend failing back from the DR site to their primary systems. The fail-back process is rarely as well thought out or tested as the failover, so I'm sure it was a long weekend for some folks.
Would you have declared a disaster at 4:11? At midnight? Or would you just consider Friday to be like a snow day and postpone the decision till Sunday, knowing blackouts rarely last more than a day?
More important, do you have a process for making that call as part of the DR plan?
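To make "having a process" concrete, here's a hypothetical sketch of what an escalation rule might look like. None of the thresholds or criteria below come from any real plan; they're invented for illustration, and a real DR plan would set them from the business's tolerance for downtime.

```python
# A hypothetical escalation rule for deciding when to declare a disaster.
# The thresholds and criteria are invented for illustration only; a real DR
# plan would tie them to the business's recovery time objective.

from datetime import timedelta
from typing import Optional

def should_declare(outage_duration: timedelta,
                   generator_running: bool,
                   utility_eta: Optional[timedelta]) -> str:
    """Return a recommended action for the DR decision-maker."""
    if generator_running:
        return "Hold: running on generator, monitor fuel and utility status."
    if utility_eta is not None and utility_eta <= timedelta(hours=8):
        return "Hold: utility expects restoration within our tolerance."
    if outage_duration >= timedelta(hours=4):
        return "Declare: activate the DR site and schedule the failback."
    return "Hold: reassess at the next checkpoint."

print(should_declare(timedelta(hours=2), False, None))  # Hold: reassess...
print(should_declare(timedelta(hours=5), False, None))  # Declare...
```

The point isn't the specific numbers; it's that the decision criteria are written down before 4:11 p.m. on a Thursday, not argued about afterward.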