Google Cloud Fail Points To 2 Software Bugs - InformationWeek


Google Cloud Fail Points To 2 Software Bugs

The Google Compute Engine cloud service was hit with an 18-minute outage worldwide earlier in the week -- at the same time the supply chain is finally getting comfortable with continuity in the cloud.

Google issued an apology Wednesday for its cloud outage that swept from its Asia region through its entire global network over the course of an hour earlier in the week. The entire network was down for 18 minutes.

The company's Google Compute Engine (GCE), which allows users to create and run virtual machines on the Google cloud platform, started to route inbound traffic incorrectly in its "asia-east1" region on Monday at 6:25 pm PST. That resulted in dropped connections and left users unable to reconnect.

The problem stemmed from Google engineers' efforts to remove an unused GCE IP block from its network configuration and propagate the new configuration throughout its global network. Although that task had been performed many times before without incident, a snag occurred when the configuration management software found an inconsistency in the new configuration, according to Google.

(Image: 4X-image/iStockphoto)

Instead of the system making its usual fail-safe move of returning to the last known good configuration, an unforeseen bug caused the management software to remove all of the IP blocks from the new configuration. It then began to push the incomplete configuration throughout the entire global system.
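For readers who want to picture the fail-safe described here, the following is a minimal sketch of a "fall back to the last known good configuration" check. It is not Google's code: the function names, the dictionary shape, and the IP ranges are all hypothetical, and the consistency check is reduced to "the configuration still lists at least one IP block."

```python
from typing import Callable

# Hypothetical configuration shape: {"ip_blocks": ["10.0.0.0/24", ...]}
Config = dict[str, list[str]]


def choose_config(
    candidate: Config,
    last_known_good: Config,
    is_consistent: Callable[[Config], bool],
) -> Config:
    """Return the candidate only if it passes validation.

    If the check fails, keep the last known good configuration rather
    than pushing a partial or empty one.
    """
    if is_consistent(candidate):
        return candidate
    return last_known_good


def has_ip_blocks(config: Config) -> bool:
    """Illustrative check: a usable config must list at least one IP block."""
    return len(config.get("ip_blocks", [])) > 0


# Made-up example values for illustration only.
last_good: Config = {"ip_blocks": ["10.0.0.0/24", "10.0.1.0/24"]}
broken: Config = {"ip_blocks": []}

active = choose_config(broken, last_good, has_ip_blocks)
```

In this sketch the empty candidate is rejected and `active` stays on `last_good`. According to Google's account, the bug meant the real system did not behave this way.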

Normally, a second safeguard ensures the system is running fine at a single site before a new configuration is pushed out to the next site. In this case, however, a second software bug defeated the safeguard that should have kept the problem contained at one site, allowing the bad configuration to roll out progressively through Google's entire cloud system worldwide.
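The site-by-site safeguard can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of Google's tooling: the `staged_rollout` function, its callbacks, and the site names are hypothetical. The sketch stops at the first unhealthy site and reverts only that site.

```python
from typing import Callable, Sequence


def staged_rollout(
    sites: Sequence[str],
    apply: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> list[str]:
    """Push a configuration to one site at a time.

    After each push the site is health-checked. On the first failure
    the site is rolled back and the rollout stops, so later sites never
    receive the configuration. Returns the sites that were updated
    successfully.
    """
    updated: list[str] = []
    for site in sites:
        apply(site)
        if not is_healthy(site):
            rollback(site)
            break
        updated.append(site)
    return updated


# Hypothetical stand-ins for real deployment hooks.
state: dict[str, str] = {}


def apply_new(site: str) -> None:
    state[site] = "new"


def revert(site: str) -> None:
    state[site] = "old"


def healthy(site: str) -> bool:
    # Pretend the second site fails its health check.
    return site != "site-b"


updated_sites = staged_rollout(
    ["site-a", "site-b", "site-c"], apply_new, healthy, revert
)
```

Here the rollout halts at the failing site and the third site is never touched. That is the containment the second safeguard was meant to provide.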

By the time Google was able to resolve the rolling outage, an hour later, 95% of its cloud system was down.

The outage affected only the Google Compute Engine service. Google Cloud Storage, Google App Engine, and other Google Cloud Platform products were not affected.

"We take all outages seriously, but we are particularly concerned with outages which affect multiple zones simultaneously because it is difficult for our customers to mitigate the effect of such outages," Google said in a statement. "This incident report is both longer and more detailed than usual precisely because we consider the April 11 event so important, and we want you to understand why it happened and what we are doing about it."

Over the next several weeks, Google's engineers will be working on prevention, detection and mitigation systems to develop additional safeguards, the company said.

Nonetheless, high-profile cloud outages like this come at an unfortunate time. Supply chain vendors, which have been the slowest adopters of the cloud, have finally started coming aboard over the last several years.

One of the two main concerns in making that decision was the ability of the cloud to provide continuity in its service, according to an Oracle survey cited in SupplyChainBrain.

As supply chain vendors adopted the cloud, they would typically begin with less business-critical operations, like human resources or enterprise resource planning (ERP). Eventually, more business-critical services would be added, according to the report.

In 2015, Oracle found that 80% of survey participants were running applications in the cloud, or were planning to make the move within the next 12 months, whereas the reverse was the case just three years earlier, SupplyChainBrain noted, citing the Oracle survey.

Past high-profile outages, such as the one at Amazon Web Services in September, apparently did not dissuade companies from turning to the cloud. Even after Google's latest snafu, the same may still hold true.

Dawn Kawamoto is an Associate Editor for Dark Reading, where she covers cybersecurity news and trends. She is an award-winning journalist who has written and edited technology, management, leadership, career, finance, and innovation stories for such publications as CNET's ...
