Cloud Suppliers Quickly Patched Xen Bug - InformationWeek


10/3/2014
11:26 AM

Cloud Suppliers Quickly Patched Xen Bug

Amazon was fastest off the starting blocks to patch the Xen hypervisor bug; Rackspace and IBM SoftLayer soon followed.


Amazon Web Services, Rackspace, and IBM's SoftLayer have all rebooted a significant number of their cloud servers to patch a security flaw in the open source Xen hypervisor.

The bug was discovered by Jan Beulich, a software engineering consultant at Unix supplier SUSE, now part of Attachmate. He is a graduate of Moscow State University and works in Attachmate's Cologne, Germany, unit of Novell/SUSE.

IBM SoftLayer began notifying affected customers of potential downtime on Sept. 28, and started patching the bug at 3 p.m. UTC (Coordinated Universal Time) on Wednesday, Oct. 1. Xen open source project leaders at SUSE followed through on their plans to make the exploit public at noon UTC Oct. 1. That meant there were three hours of public exposure of the bug before SoftLayer began patching.
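The three-hour figure follows directly from the two timestamps above. As a minimal sketch (the year 2014 is taken from the article's dateline, and the variable names are illustrative only), the arithmetic can be checked like this:

```python
from datetime import datetime, timezone

# Timestamps as reported in the article; the year comes from the dateline.
disclosure = datetime(2014, 10, 1, 12, 0, tzinfo=timezone.utc)   # noon UTC, Oct. 1
patch_start = datetime(2014, 10, 1, 15, 0, tzinfo=timezone.utc)  # 3 p.m. UTC, Oct. 1

# Hours the bug was public before SoftLayer began patching.
exposure_hours = (patch_start - disclosure).total_seconds() / 3600
print(exposure_hours)  # 3.0
```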

SoftLayer also couldn't patch all of its exposed systems at the same time. "Eliminating the vulnerability requires updating software on host nodes, and that requires downtime for the virtual servers running on those nodes. Yeah, that's not something anyone likes to hear. But customer security is of the utmost importance to us, so not doing it was not an option," SoftLayer said in a blog post.

"We are updating host nodes data-center-by-data-center to complete the emergency maintenance as quickly as possible. This approach will minimize disruption for customers with failover infrastructure in multiple data centers," wrote SoftLayer.


To avoid the possibility of taking down customers' systems running in different data centers at the same time, SoftLayer completed patching in one data center before moving on to the next. According to posts in SoftLayer's customer forums, the patching process was complete by Thursday at 10:19 AM UTC.
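The one-data-center-at-a-time approach can be sketched in a few lines. This is an illustration of the idea only, not SoftLayer's tooling, which the article does not describe; the inventory, the host names, and the `patch_host` callback are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

def rolling_patch(hosts_by_dc: Dict[str, List[str]],
                  patch_host: Callable[[str], bool]) -> List[Tuple[str, str]]:
    """Patch every host in one data center before moving on to the next.

    Returns the (data center, host) pairs in the order they were patched.
    Halts at the first failure so later data centers are left untouched.
    """
    patched = []
    for dc, hosts in hosts_by_dc.items():
        for host in hosts:
            if not patch_host(host):
                raise RuntimeError(f"patch failed on {host} in {dc}; halting rollout")
            patched.append((dc, host))
    return patched

# Hypothetical inventory: two data centers, three host nodes.
inventory = {"dc-a": ["a1", "a2"], "dc-b": ["b1"]}
order = rolling_patch(inventory, lambda host: True)  # stand-in patch step that always succeeds
print(order)
```

Because the loop never starts a second data center until the first is finished, a customer with failover capacity in both locations always has one of them outside the maintenance window.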

Amazon announced its patching plans Sept. 24 and executed them Sept. 26-30, completing the job on schedule before the exposure became public Oct. 1. It also updated one availability zone within a region before proceeding to the next. When the task was done, AWS evangelist Jeff Barr explained, "We couldn't be as expansive as we'd have liked on why we had to take such fast action. The zone-by-zone reboots were completed as planned and we worked very closely with our customers to ensure that the reboots went smoothly."

Rackspace determined that a quarter of its 200,000 customers were affected and took running servers down over the weekend of Sept. 27 and 28 to patch them. It afterward apologized for the lack of advance notice in an email from president and CEO Taylor Rhodes, sent to customers on Sept. 30 and posted on the company blog Oct. 1.

Rackspace OpenStack private cloud users were unaffected because their virtual machines run under the KVM hypervisor. But the Rackspace public cloud includes many Xen users.

The bug, officially designated Xen Security Advisory 108, or XSA-108, was obscure, and it had not appeared in the wild prior to disclosure by SUSE's Beulich. At the same time, it would have been easy to exploit with limited coding skills and carried the potential for severe intrusion.

Advanced virtualization, unlike the earliest versions of the VMware ESX Server hypervisor, makes use of assists built into AMD and Intel x86 chips. The virtualization-aware processors supply a shortcut to the hypervisor when it needs to access a specific device, such as a network interface card or other peripheral. In software-only virtualization, each instruction to the hardware in service to an application passes through the hypervisor. With virtualization-aware hardware, that step is sometimes bypassed to allow the instruction to go straight to a hardware component.

The bug in Xen was hidden in the code meant to work with those hardware assists for the Intel interrupt controller, a component on a Xeon or other x86 chip that can grant access to other parts of a server. Beulich discovered that a malicious coder, aware of the bug, could take advantage of it to


Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive ...

Comments
Charlie Babcock,
User Rank: Author
10/6/2014 | 2:44:51 PM
A caution on live migrating your way to rebooting...
I've heard the question before: why don't they just use live migration? I don't think that's a layup. First of all, there are different hypervisors in clouds, and the ones that can perform live migration don't all do it the same way. Second, you need a single storage file system underneath live migration, and I don't know all the ins and outs of that. I know it's possible to stretch a file system across two data centers and live migrate out of one into another, but it gets complicated and risky to do so. Start doing that between data centers or just between availability zones with multiple hypervisors and maybe it will work. Maybe it won't.
anon0464180519,
User Rank: Apprentice
10/6/2014 | 9:47:53 AM
The bug could have been mitigated
Techniques like live migration or dynamic code patching could have mitigated the cloud-reboot issue.

Bugs always pop up, but it is high time that vendors used modern ways to get around them.

Check the osv.io/blog (this site forbids URLs) to read about it further

Dor
Li Tan,
User Rank: Ninja
10/4/2014 | 11:27:03 AM
Re: Bad actors could read data in memory or crash the host
I am content with how Amazon reacted to this critical bug. As the world's leading cloud provider, that's what it should do: patch critical bugs on time to avoid causing losses to customers.
Charlie Babcock,
User Rank: Author
10/3/2014 | 5:05:33 PM
There's a way to check Xen on-premises
Tenable Network Security CEO Ron Gula says his firm's Nessus product can check Xen hypervisors in enterprise data centers (but not in the cloud) to see if they've been patched to thwart the XSA-108 bug. Most enterprises are less compelled to do this than the cloud suppliers because they can easily control the applications running under Xen, while the cloud providers take workloads as they come through the door. The bug would be activated by an exploit buried in an application running under the Xen hypervisor. In most cases, Citrix, Red Hat or SUSE will provide the patch needed to update Xen in their products through the vendor's automated update service.
Charlie Babcock,
User Rank: Author
10/3/2014 | 1:08:56 PM
Bad actors could read data in memory or crash the host
Taylor Rhodes' email to customers explained the nature of the Xen bug this way: "This particular vulnerability could have allowed bad actors who followed a certain series of memory commands to read snippets of data belonging to other customers, or to crash the host server."