Data Center Blackout In San Francisco Caused By A Bug
Backup generators at 365 Main failed to complete their start sequence because of a memory problem in the engine monitoring and control component.
When the lights went out in San Francisco last week, data center operator 365 Main's backup power generators also failed. Now the company has identified the cause of the problem: an engine monitoring and control component known as a Detroit Diesel Electronic Controller, or DDEC.
In a statement released Wednesday, 365 Main said that following the power outage last week, three of its 10 Hitec backup generators failed to complete their start sequence because of a memory problem in their DDECs.
"The team discovered a setting in the DDEC that was not allowing the component to correctly reset its memory," the company said. "Erroneous data left in the DDEC's memory subsequently caused misfiring or engine start failures when the generators were called on to start during the power outage on July 24."
In other words, the generators failed to start because of a bug.
A Detroit Diesel MTU spokesperson was not immediately available for comment.
Detroit Diesel describes its DDEC as a tool to optimize engine performance and to simplify troubleshooting, electronic diagnostics, and data extraction.
Officials with 365 Main said the company has fixed the problem by "altering the timing of a command to the DDEC component, allowing more time between the engine shutdown command and the DDEC reset command."
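The kind of timing problem 365 Main describes can be illustrated with a toy model. The sketch below is not the DDEC's actual logic, which the company has not published. The ToyController class, its settle_seconds parameter, and the cycle helper are all hypothetical, and the model simply assumes that a reset issued too soon after shutdown has no effect, leaving stale data that blocks the next start.

```python
class ToyController:
    """Hypothetical stand-in for an engine controller; not the real DDEC."""

    def __init__(self, settle_seconds):
        # Assumed time the controller needs after shutdown before a reset works.
        self.settle_seconds = settle_seconds
        self.memory = []          # data left over from the last run
        self.shutdown_at = None   # simulated time of the last shutdown command

    def start(self, now):
        # In this model, leftover data from a previous run makes the start fail.
        if self.memory:
            return False
        self.memory.append("run data")
        self.shutdown_at = None
        return True

    def shutdown(self, now):
        self.shutdown_at = now

    def reset(self, now):
        # The reset only clears memory if enough time has passed since shutdown.
        if self.shutdown_at is None:
            return False
        if now - self.shutdown_at < self.settle_seconds:
            return False  # too early: stale data stays in memory
        self.memory.clear()
        return True


def cycle(controller, reset_delay):
    """Run once, shut down, reset after reset_delay, then try to start again."""
    controller.start(0.0)
    controller.shutdown(10.0)
    controller.reset(10.0 + reset_delay)
    return controller.start(100.0)


if __name__ == "__main__":
    print("short delay, next start ok:", cycle(ToyController(5.0), 1.0))
    print("longer delay, next start ok:", cycle(ToyController(5.0), 6.0))
```

In this model, lengthening the gap between the shutdown and reset commands is the only change needed for the second start to succeed, which mirrors the shape of the fix the company describes.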
Miles Kelly, VP of marketing for 365 Main, said that Hitec generators at his company's El Segundo, Calif., facility have the same DDEC. "Once we were able to diagnose the problem and test the fix, we deployed it here in San Francisco and Los Angeles," he said. "What Hitec is doing, having basically made this joint discovery with us, is they are going to be rolling out the fix to all other generators that are exposed to the bug."
Kelly said his company has been contacting other companies that use generators with this component.
Several prominent Web sites were knocked offline or experienced limited availability following the outage and the failure of 365 Main's backup power system, including AdBrite.com, CurrentTV.com, Craigslist.org, RedEnvelope.com, SecondLife.com, Six Apart's blog sites (LiveJournal.com, TypePad.com, Vox.com), Technorati.com, and Yelp.com.