Amazon Cloud Outage Didn't Stop Recovery.gov - InformationWeek

Amazon Cloud Outage Didn't Stop Recovery.gov

A forward-thinking continuing operations strategy allowed for seamless failover that kept the site running, despite problems with EC2.

Careful planning kept the Recovery.gov site online despite an outage at its cloud provider, Amazon Web Services (AWS), that began last Thursday, according to a federal official.

Recovery.gov--which the Recovery Accountability and Transparency Board (RATB) moved to the AWS cloud a year ago--was unaffected by the outage and remained online without incident, said Mike Wood, the executive director of RATB.

The board, created in February 2009 with the passage of the American Recovery and Reinvestment Act, is responsible for overseeing the spending of the $787 billion stimulus package. Recovery.gov is a transparency site that allows the public to see where the money is being spent. It was the first government-wide system to move to a cloud computing infrastructure, as well as the first to run on Amazon Elastic Compute Cloud (EC2).

Availability was a key consideration when moving Recovery.gov to the cloud, Wood said. The site was designed so that if the AWS zone of cloud infrastructure that powers it fails, its resources are automatically shifted to another zone.
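
The article does not describe how the Recovery Board's failover is implemented, so the following is only a minimal sketch of the general idea Wood describes: check the zone that normally serves the site and, if it is unhealthy, route to the next one. The zone names, the `Zone` type, and the `pick_active_zone` function are all hypothetical, and the health check is simulated rather than calling any real AWS API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Zone:
    """A hypothetical zone of cloud infrastructure hosting one copy of the site."""
    name: str


def pick_active_zone(
    zones: Sequence[Zone],
    is_healthy: Callable[[Zone], bool],
) -> Optional[Zone]:
    """Return the first healthy zone in priority order, or None if all fail."""
    for zone in zones:
        if is_healthy(zone):
            return zone
    return None


# Hypothetical zone names; the article does not say which zones the site uses.
ZONES = [Zone("primary-zone"), Zone("standby-zone")]

# Simulate an outage in the primary zone (an assumption for illustration).
down = {"primary-zone"}
active = pick_active_zone(ZONES, lambda z: z.name not in down)
print(active.name if active else "no healthy zone")
```

In a real deployment the health check would probe the running site and the selection would drive traffic routing; here both are reduced to a function call so the decision logic is easy to see.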

"It's a good news story for us," he said. "It all happened seamlessly and it all worked."

Wood declined to say which AWS data center powers Recovery.gov, citing security reasons, but confirmed it is one of the U.S. locations. AWS has data centers across the United States and in Europe; the one that experienced the outage was its Northern Virginia data center.

Wood added that the Recovery Board used a combination of third-party and custom software as part of its continuing operations strategy to facilitate the resource shift in case of an outage.

According to the AWS Service Health Dashboard, the outage, which began April 21, had largely been cleared up by Monday. However, some sites using that center could still be affected by lingering performance issues, according to the dashboard.

"Public-facing" Department of the Treasury websites that use AWS, including Treasury.gov, MyMoney.gov, FinancialStability.gov, and MakingHomeAffordable.gov, also remained online during the outage, according to a spokesperson for Smartronix, the contractor that worked on both the Recovery.gov move and the Treasury cloud project.

At least one other federal website was not so lucky. The Department of Energy's OpenEI.org site was unavailable for nearly two days, according to a report. The site allows the public to participate in clean energy research.

The EC2 outage has cloud-computing critics and skeptics raising red flags about relying on the public cloud for mission-critical websites and other IT resources.

RATB's Wood said that while he continues to trust in cloud computing, the outage provides a good lesson for future cloud implementations.

"In theory you think of the cloud as having very good availability," he said. "But like any technology, it's not perfect and it's never going to be. The failsafe would be to put some software in place that would allow you to roll over seamlessly."
