Global CIO: IBM's Bank Outage: Anatomy Of A Disaster
IBM personnel inadvertently triggered a 7-hour outage at Singapore's largest banking network last month by using unapproved procedures. Here's a detailed look at what went wrong.
At 2:58 a.m. on July 5, disaster struck Singapore's largest banking network and a small IBM support team that was troubleshooting a communications problem between one of the bank's storage devices and its mainframe. Here's how the bank, DBS Group, and its IT services provider, IBM, described that harrowing moment, which occurred after 40 hours of on-and-off attempts to fix the problem:
"[The on-site IBM engineer] replaced the cable using the same procedures as before. This caused errors that threatened data integrity. As a result, the storage system automatically stopped communicating with the mainframe computer, to protect the data. At this point, DBS banking services were disrupted."
Disrupted, indeed: for the following seven hours—until 10:00 a.m.—DBS's customers were unable to access banking services via branches, ATMs, online, or mobile. One media report said DBS has about 1,000 ATMs and, early in 2008, had almost 1,000,000 online-banking customers.
And while bank CEO Piyush Gupta issued a long and deeply apologetic letter to DBS Group's customers eight days later, taking full responsibility for the outage and the resulting customer inconvenience and loss of trust, IBM has also been embroiled in the controversy over what happened, why it happened, and how it can be prevented in the future.
What we do know at this point is that the outage will cost DBS Group a great deal more than the negative impact on customers: the government agency that oversees the banking industry, the Monetary Authority of Singapore, has ordered DBS to set aside $230 million in regulatory capital as a result of the outage.
What is not known, at least publicly, is whether IBM will have to compensate DBS for the cost of the outage and/or related costs. Two separate media reports on a joint press conference held by DBS and IBM both said the companies would not answer questions about that possibility.
According to ChannelNewsAsia.com's coverage of the press conference, IBM regional general manager Cordelia Chung said that "the personnel directly involved with this incident have been removed from direct customer support activity and disciplined" and that "IBM has taken steps to enhance the training of all related personnel on the most current procedures."
And the BusinessTimes.com.sg article quoted Chung as saying, "We have also taken steps to review installations of the same storage system at other financial institutions in Singapore for whom we provide maintenance services."
(If I may inject an opinion here: if I were DBS CEO Gupta, I'd have very mixed feelings about that preventive-maintenance approach taken by IBM not only on behalf of DBS's competitors but also on the back of a very troublesome incident for DBS. That's why I would guess that while neither company would comment on whether IBM would be compensating DBS for its role in the crash, IBM's going to be paying DBS very generously in either cash or extended and comprehensive additional services. On top of that, the removal of the involved employees from future dealings with customers, plus disciplinary action against them, plus Chung's very public apologies to both DBS and its customers all seem to add up to a very uncomplicated admission by IBM of some level of culpability in the outage.)
The article goes on to quote Chung as describing IBM's top priority once they realized a crash had occurred: