Google is promising to recover lost messages and to restore service for affected users soon.
Google on Monday evening reported that it had identified why a small percentage of Gmail users have been unable to access their messages since Sunday: buggy software.
"We released a storage software update that introduced the unexpected bug, which caused 0.02% of Gmail users to temporarily lose access to their e-mail," said Google engineering VP Ben Treynor in a blog post. "When we discovered the problem, we immediately stopped the deployment of the new software and reverted to the old version."
Unstable software in one of Google's European data centers caused a shorter service disruption in 2009. In the wake of that outage, Google introduced its App Status Dashboard to give customers more visibility into its operations.
Google has promised to remedy the situation as soon as possible, but the fix is taking longer than usual because the flaw wiped out affected customers' e-mail in multiple data centers, forcing Google's engineers to restore accounts from backup tapes. Restoring data from tape rather than from a nearby data center takes hours instead of milliseconds, Treynor explained.
The App Status Dashboard on Tuesday indicates that the recovery operation is ongoing. The message posted at 8:10 am says, "Google Mail service has already been restored for some users, and we expect a resolution for all users in the near future. Please note this time frame is an estimate and may change. At the moment, we are working on restoring the affected accounts. Once the restore is complete, we will start reinstating accounts and delivering messages."
Google has declined to provide an absolute number of users affected. Based on recent estimates that the Gmail user base has reached 170 million worldwide, 0.02% works out to about 34,000 users.
Google did not immediately respond to a request for information about whether any users covered by the Google Apps for Business service level agreement (SLA) will receive credit for the downtime.
Update: At 12:20 pm PST, Google amended its blog post about the status of its recovery efforts.
"Data for the remaining 0.012% of affected users has been successfully restored from tapes and is now being processed," the update says. "We plan to begin moving data into mailboxes in 2 hours, and in the hours that follow users will regain access to their data. Accounts with more mail will take more time. Thanks for bearing with us."
Google also confirmed that it will issue SLA credit to the very small number of affected business customers. The company plans to publish an incident report with additional details in the next day or two.