12/30/2010
10:15 AM

Skype Pegs Outage On Buggy Windows App

About 40% of the computers on the peer-to-peer VoIP network went offline during the daylong service disruption, which was triggered by a flaw in an older version of Skype's software.

Skype says last week's daylong service outage that left millions of users unable to place voice or video calls was caused by a string of events that snowballed into the company's worst service disruption since 2007.

Skype's Wednesday "post-mortem" of the embarrassing snafu, which started around 8 a.m. Pacific on Dec. 22, exposed an inherent weakness in the company's peer-to-peer communications network: it depends on a large number of subscribers' computers staying up and running. In the latest outage, overloaded support servers were slow to respond, and those delayed responses caused some computers running a version of Skype's proprietary Windows software to crash, setting off a chain reaction.

The software, version 5.0.0.152, contained a flaw that prevented the application from processing the delayed responses. Roughly half of all Skype users worldwide run that older version of the Windows application, which led to approximately 40% of the computers on the network going offline.

The crashed machines included 25% to 30% of the computers Skype uses as "supernodes" on the network. These systems have the resources to act like phone directories that other computers use to make and receive calls. With so many supernodes out of commission, the load on the remaining supernodes spiked, and the problem worsened when millions of Skype subscribers tried to get back on the network.

"The initial crashes happened just before our usual daily peak-hour, and very shortly after the initial crash, which resulted in traffic to the supernodes that was about 100 times what would normally be expected at that time of day," Lars Rabbe, Skype's CIO, said in a blog post explaining the outage.

The overload caused more supernodes to shut down, which increased the load on the remaining systems until they shut down as well. The result was a massive outage that left more than half of the 20 million-plus users who make calls during peak hours each day without service.
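The feedback loop described above can be illustrated with a toy model. This sketch is hypothetical and greatly simplified: the function name `simulate_cascade`, the assumption that demand is spread evenly across surviving nodes, and the capacity numbers are all invented for illustration and do not describe how Skype's network actually balances load.

```python
def simulate_cascade(capacities, total_demand):
    """Toy model of cascading overload among surviving supernodes.

    Hypothetical assumptions (not Skype's actual design): total demand is
    spread evenly over every node still running, and any node whose share
    exceeds its capacity shuts down. Returns (nodes_still_running, rounds).
    """
    alive = list(capacities)
    rounds = 0
    while alive:
        per_node_load = total_demand / len(alive)
        survivors = [c for c in alive if c >= per_node_load]
        if len(survivors) == len(alive):
            break  # every remaining node can absorb its share; cascade stops
        alive = survivors  # overloaded nodes drop out, shifting load to the rest
        rounds += 1
    return len(alive), rounds


if __name__ == "__main__":
    capacities = list(range(1, 11))  # ten hypothetical nodes, capacities 1..10
    print(simulate_cascade(capacities, total_demand=30))  # (6, 3)
    print(simulate_cascade(capacities, total_demand=60))  # (0, 2)
```

In this model, doubling demand from 30 to 60 does not merely double the number of failures: it turns a partial loss of four nodes into a collapse of all ten, because each failure pushes more load onto fewer survivors.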

The outage lasted for about 24 hours. Skype brought the network back up gradually by deploying several thousand "mega-supernodes" to offload work from the supernodes in the peer-to-peer cloud. To get the system running, Skype had to siphon off resources normally used to support group video calling. As a result, that service was down for an additional day.

Skype is reviewing the way it provides automatic software updates to help ensure that more subscribers have the latest version. If more subscribers had been running the latest Windows application, version 5.0.0.156, then the outage might have been avoided. The company also will review its testing procedures to try to prevent flaws in future versions.

The end-of-year outage was the second service disruption for Skype this year and the worst since a 36-hour outage in August 2007. The latest snafu comes as Skype tries to boost capacity and network performance to impress Wall Street ahead of its planned initial public offering.

Skype announced its IPO plans over the summer but has yet to say when the stock launch will take place. In the meantime, the company has been working to beef up its paid services, particularly in the business market. The vast majority of Skype subscribers use their PCs to call each other free of charge. People who call landlines or mobile phones pay only pennies a minute.
