In the past, Amazon recommended customers keep their backup copies within an Amazon data center or regional complex of centers. It architected its regions into separate availability zones, each with its own power and communications. Even if a customer's primary zone failed, it would be able to recover in its backup zone, previous Amazon guidance said.
It didn't work out that way for some customers in April 2011, when Amazon's Northern Virginia data center froze up after an overloaded network prompted a "remirroring storm." The effort by many systems to create new copies of data when they could no longer see the original tied up Amazon's Elastic Block Store and Relational Database Service, preventing some customers from accessing their data or updating their websites.
That may be why Amazon is making the SunGard backup service available first to customers in its Northern Virginia center, also known as U.S. East-1, Amazon's highest traffic site. SunGard operates six high availability data centers, with one near U.S. East-1, in Philadelphia, and another in Mississauga, Ontario.
The new service is being made available as painlessly as possible, in part through Amazon's recently announced AWS Storage Gateway, a virtual appliance that sits on a customer's premises and directs data flows to their destination in the Amazon S3 storage service.
"A lot of organizations are not comfortable operating on only Amazon infrastructure, especially production apps," said Indu Kodukula, CTO of SunGard, at the SunGard Availability Services booth at Cloud Connect, a UBM TechWeb event this week in Santa Clara, Calif.
SunGard will leverage Amazon's Direct Connect service, which gives Amazon infrastructure increased privacy by offering private line, LAN-like connections between U.S. East-1 and the unnamed SunGard data center. The link gives a SunGard or Amazon customer "a bi-directional disaster recovery service, without transporting data over the public Internet," Kodukula said.
SunGard is looking to expand its high availability service offerings in 2012 and will give software development teams in its cloud access to Amazon's EC2 plain-vanilla infrastructure. Software developed and tested there would then be brought back to SunGard, where it would be launched in a staging environment that duplicates its production environment. By staging new software with all its dependent systems before phasing it into production, development teams and operations teams can frequently find glitches and prevent a breakdown once the software is launched in a production environment, Kodukula said.
"This is a complementary relationship that will have wide-reaching benefits for both organizations," said Kodukula of SunGard's closer working relationship with Amazon. That may be true, but Amazon had a pressing need to give its customers a recovery option outside its own data centers, and the link with SunGard allows it do so quickly.
Amazon customers who had accounts in both U.S. East-1 and U.S. West invented a way of recovering from Amazon's April 2011 slowdown. They re-engineered URLs on their domain name servers so that what had been a U.S. East workload was directed to U.S. West. But not everyone was in a position to improvise on the fly. Many thought of it only after it was too late and the public face of their businesses had suffered a slowdown or a freeze.
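The article does not say how those customers made the switch, so the following is only a minimal sketch of the general idea: keep a priority-ordered list of regional endpoints and point the DNS record at the first one that passes a health check. The names `RegionEndpoint`, `choose_dns_target`, the region labels, and the example.com hostnames are all hypothetical, and the health check is passed in as a function rather than tied to any real DNS or monitoring API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class RegionEndpoint:
    region: str    # hypothetical region label
    hostname: str  # hypothetical hostname a DNS record could point to


def choose_dns_target(
    endpoints: Sequence[RegionEndpoint],
    is_healthy: Callable[[RegionEndpoint], bool],
) -> Optional[RegionEndpoint]:
    """Return the first endpoint, in priority order, that passes the
    supplied health check, or None if every endpoint fails it."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


# Assumed priority order: primary region first, backup region second.
ENDPOINTS = [
    RegionEndpoint("us-east-1", "east.example.com"),
    RegionEndpoint("us-west-1", "west.example.com"),
]

if __name__ == "__main__":
    # Simulate the primary region being unreachable.
    target = choose_dns_target(ENDPOINTS, lambda ep: ep.region != "us-east-1")
    print(target.hostname if target else "no healthy region")
```

In practice the record's time-to-live limits how quickly such a change reaches visitors, which is one reason a switch improvised during an outage can come too late.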
Running highly available systems in the cloud has proven more complicated than anyone thought. Terremark and Savvis, cloud suppliers owned by telecom companies, have started to advertise the fact that their data centers are linked together with direct lines and that one can be used to back up workloads in another.
Amazon's pact with SunGard is an admission that highly available cloud operation is still a work in progress. Rather than delay any longer, it's embraced using a second cloud supplier for disaster recovery--and some customers will be relieved.