Meeting The Capacity Challenge
Companies scramble to keep up with demand as more viewers access their Web sites
It's pretty easy to create a Web site. But building and managing the infrastructure to ensure that there's enough capacity to handle the traffic can be difficult. For some companies, the answer is to outsource the entire Web operation to a hosting company that can provide additional servers and bandwidth to handle spikes in demand.
But many businesses are unwilling to let others run their Web sites, and they've adopted a variety of tactics to make sure their sites are available, regardless of the number of visitors.
"You have to do a lot of planning and be able to react really quickly because you're really not in control of the usage," says Ian Beer, director of IT at NetUpdate Inc., an application service provider for the financial-services industry.
The ASP started small. "We had two servers--one Web server and one database server--and at the time, even that was overkill," says chief technology officer Doug Clawson. Now NetUpdate has dozens of clients and more than 30,000 user accounts of individuals involved in all stages of mortgage processing.
Despite the rapid growth, the ASP has cut the time it takes a Web page to load from 16 seconds to 4 seconds by increasing the number of servers it uses and ensuring that none of them exceeds 60% sustained processor utilization. In addition to one live Web system co-located at a hosting facility, NetUpdate maintains an unused backup Web system and is preparing, in the next six months, to add a second live hosted system that will run in tandem with the original. The cost: about $1.2 million for networking hardware, front- and back-end servers, and software.
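A utilization ceiling like NetUpdate's 60% rule translates into a simple sizing calculation: each server may only be loaded to a fraction of its raw capacity, so peak demand determines the server count. The sketch below uses hypothetical throughput numbers (the article gives no per-server figures), not NetUpdate's actual data:

```python
import math

def servers_needed(peak_rps: float, per_server_rps: float,
                   max_utilization: float = 0.60) -> int:
    """Smallest server count that keeps sustained processor
    utilization at or below the ceiling at peak load."""
    usable_rps = per_server_rps * max_utilization
    return math.ceil(peak_rps / usable_rps)

# Hypothetical figures: 900 requests/sec at peak, and servers that
# handle 120 req/sec at full CPU -> each may carry only 72 req/sec.
print(servers_needed(900, 120))  # 13
```

The same rule, run in reverse, tells an operator when to worry: once measured load approaches 60% of the installed pool's capacity, it is time to add boxes.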
The IT and Web-site staff at another ASP, Ceridian Corp., which offers outsourced human-resources applications to its corporate clients, knew they had to reach certain performance goals in order for the Web-based applications to be profitable to the company.
"We had goals that were set for us," says David Smith, VP of research and development at Ceridian in Minneapolis. The company was aware that, to be profitable, "we had to meet certain performance goals with the servers," Smith says. That's because Ceridian charges its customers a set amount per month for each employee serviced, and it depends on being able to provide its services at a monthly cost slightly less than what it charges, Smith says.

Ceridian's goal was to be able to support 250 simultaneous users per server. But when the company instituted testing of its systems last year with E-Load software from Empirix, it found it could support only 10 simultaneous users per server before the system crashed, Smith says. By fixing some basic problems that Smith calls "low-hanging fruit," Ceridian was able to squeeze enough performance out of its servers to support 50 users at a time. Those fixes included plugging memory leaks, optimizing code, and removing some third-party software that was contributing to crashes. As testing and further fine-tuning of the system progressed, Ceridian advanced to 100 users per server, then to 200, and finally, after streamlining the site's self-publishing system and its customization features for individual customers, reached 300 simultaneous users per server.

Now Ceridian has about 10 corporate customers representing about 70,000 individual employees and supports them with a Web infrastructure that includes a Cisco load-balancing switch and three Compaq dual-processor servers, each with 1 Gbyte of RAM. The company calculates that about 1% of the 70,000 employees it serves will use the system at any given time. Still, Ceridian continually load tests the system, both before modifying it in any way and to make sure it can handle any spikes in usage, Smith says. For instance, Ceridian is rolling out Source Self Service as its latest and most advanced service.
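Ceridian's sizing arithmetic can be sketched directly from the figures in the article; the helper function is illustrative, not Ceridian's actual tooling:

```python
def concurrency_headroom(employees: int, active_fraction: float,
                         servers: int, users_per_server: int):
    """Compare the expected number of simultaneous users against
    the load-tested capacity of the server pool."""
    expected = employees * active_fraction
    capacity = servers * users_per_server
    return expected, capacity, capacity - expected

# Figures from the article: 70,000 employees, roughly 1% active at
# once, three servers each load-tested to 300 simultaneous users.
expected, capacity, headroom = concurrency_headroom(70_000, 0.01, 3, 300)
print(expected, capacity, headroom)  # 700.0 900 200.0
```

The pool's tested capacity of 900 simultaneous users comfortably covers the expected 700, which is why the spike-prone events described next, payday E-mails and open enrollment, are what Ceridian load tests against.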
Source Self Service lets individual employees go online to check payroll and other benefits information directly from the Ceridian site. With the service, employees get an E-mail on paydays saying their paycheck stubs are viewable online, and "when that E-mail goes out, everybody's going to be hitting that link at the same time," Smith says. Open enrollment periods for health-care coverage and other benefits also produce unusually high levels of traffic that Ceridian must be prepared to accommodate, he says.

Merging Problems
Scalability challenges can arise from a variety of factors, including mergers and acquisitions. For example, early this year, The Boston Herald bought the Community Newspaper Co., adding 1 million page hits a month to The Herald's Web site and doubling the newspaper's Web traffic. With the acquisition, the number of separate Boston Herald Web sites increased from a handful to a dozen, says Bill Gassney, executive technical producer at Herald Interactive Advertising Systems Inc., the online and interactive subsidiary of Herald Media Inc., the company formed by the merger.

"We moved very quickly to offset this bloat and get our sites running in an optimal fashion," Gassney says. The company divided the effort into three distinct phases, he says. The first phase involved removing graphical elements from the company's Web pages so pages would load more quickly. "We were getting complaints to customer service that the site was running too slow," Gassney says.

In the second phase, now partially complete, Herald Interactive Advertising Systems added bandwidth to the site by buying a 20-Mbps Internet circuit from nearby service provider Yipes Communications Inc. The company also caches its content using a caching service from Mirror Image Internet Inc., so content is served up to Web surfers from the closest cache, Gassney says. The third phase of the project calls for the company to consolidate all its Web servers centrally at its Needham, Mass., headquarters.

Seasonal Stress
Other challenges that can strain the capacity of a Web system are seasonal fluctuations in traffic and concentrated marketing campaigns that generate a flood of Web traffic. Combine the two, and traffic can skyrocket.

MuseumShop.com Inc. late last year saw its average traffic load increase from about 1 million unique Web visitors a month to more than 8 million a month during its peak season. The Arlington, Mass., company does 70% of its annual business in the last eight weeks of the year, and last year its peak season coincided with a big radio and print marketing campaign, CEO Mitchell Massey says. As the traffic started to roll in, "we found ourselves with more volume than we could easily handle," Massey says.

Before the flood, MuseumShop had two Compaq 550 servers running its site; the company quickly added two more. The new servers didn't help much, Massey says. Even with them, the company's infrastructure was barely able to keep up with all the new traffic, since each additional server added only about 25% more capacity to the site, he says. That's when MuseumShop began looking for alternatives. "The whole idea was to get out of owning servers and hosting content," so the company decided that caching most of its content in a caching provider's network would be more efficient and cost-effective, he says. Today, even as it gears up for another year-end traffic increase, the online retailer has gone back to its original two servers, which it uses for delivering dynamic information such as prices and for securely processing transactions.
Massey estimates that three-quarters of the company's content is hosted by its caching provider; the remainder is kept in-house.

North Carolina State University has its own scalability challenges, brought on by rising demand for Web-content applications and the changing habits of college students, says Harry Nicholos, director of IT at the university in Raleigh. It's especially challenging because the university's Web systems have to be available around the clock to meet the rise of less-traditional student schedules and other trends such as online coursework, Nicholos says. Every semester, "the first two weeks of class and the last two weeks of class are brutal," Nicholos says. During those weeks, the number of page hits on the university's 50 or so Web sites increases from an average of 1.2 million a day to as many as 3.5 million a day, he estimates.

Nicholos says the university relies on two tools in particular to keep its sites performing efficiently. One is Commander Solutions from Resonate Inc., a load-balancing and service-level-management product that lets the university wring the most performance out of its servers. The other, the Perspective performance-measurement service from Keynote Systems Inc., measures Web-page response times, Nicholos says. The load-balancing tool lets the university create farms of servers that can route each URL request to the server most capable of handling it at that moment, and it also lets servers perform other tasks when they're not busy serving Web content, Nicholos says. The tool also lets the university keep older servers in operation, saving money, according to Nicholos. About a third of the university's servers are older machines that are fully amortized. By keeping them in operation, "we've been able to really stretch out our dollars, really the taxpayers' dollars, in how we use our equipment," Nicholos says.
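Routing each request to "the server most capable of handling it at that moment" generally means the balancer tracks a live load figure for every machine in the farm and picks the least-loaded one. The article doesn't describe Resonate's actual algorithm; this is a minimal least-connections sketch with hypothetical server names:

```python
def pick_server(loads: dict) -> str:
    """Return the server with the fewest active connections --
    a simple stand-in for 'most capable right now'."""
    return min(loads, key=loads.get)

# Hypothetical farm: active-connection counts per server.
farm = {"web1": 42, "web2": 17, "web3": 88}
print(pick_server(farm))  # web2
```

A scheme like this is also what lets older, slower machines stay in the rotation: they simply accumulate connections faster and therefore receive proportionally fewer new requests.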
The university's Web infrastructure consists of about 40 servers, of which 13 are used for its 50 general Web sites, Nicholos says. The rest are dedicated to specific content or tasks. Currently, the servers operate at about 20% to 30% of their capacity. The university would start to worry about the servers' ability to meet peak loads if they started to approach 60% to 70% of capacity, Nicholos says. Another metric Nicholos watches closely is Web-page response time, as measured by the Keynote Perspective service. Nicholos would begin to be concerned if response times for the university's main page started to reach 5 to 7 seconds, he says. At 15 seconds, the Resonate service sends out alerts to site administrators, and Nicholos says he'd start to test the Web systems for problems.

Eagle Eye On Performance
Keeping track of how servers are performing is crucial to making sure that Web pages are served up in a timely fashion. Bear Creek Corp., a catalog and online retailer best known for its Harry and David fruit baskets, starts to get uncomfortable when the CPU utilization load gets to around 60% to 70%. "That's when we start worrying and want to get more boxes in place, because that [percentage] gives us a little bit of breathing room," says John Wade, VP of Internet systems at the Medford, Ore., company.

About 20% of Bear Creek's business comes from Internet sales, and its online business is doubling every year, Wade says. The company also does 70% to 80% of its overall business in the last two months of the year, which requires it to keep close tabs on Web performance and add capacity as needed. In addition to CPU utilization, other warning signs include response times in excess of 2 or 3 seconds for pages delivered to distant parts of the Internet, or in excess of 1 second for pages delivered within Bear Creek's own network, Wade says.

In the off-season, Bear Creek usually has two Web servers. But for two or three months a year, the company keeps five servers in reserve, bringing them online one by one as it enters its busy season. It uses the Big-IP Controller load-balancing software from F5 Networks Inc. to ensure that the processing load is distributed evenly across all the servers operating at a given time. The company tests its Web-site performance daily to determine when it's time to add capacity to the site.

To make sure it has extra capacity available at the lowest cost, the company rents the servers it keeps in reserve. For Bear Creek, renting the servers costs less than buying them, Wade says. While it would be ideal not to need so many servers in reserve, it's a cost the company is willing to bear. Renting the spare servers, he says, "is still a lot cheaper than turning customers away from our Web site."
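Bear Creek's warning signs amount to a three-part threshold check run against its daily performance tests. A minimal sketch of that decision, using the thresholds quoted in the article (the function itself is illustrative, not Bear Creek's tooling):

```python
def needs_more_capacity(cpu_util: float, external_latency_s: float,
                        internal_latency_s: float) -> bool:
    """Warning signs from the article: sustained CPU nearing the
    60%-70% band, external page loads over roughly 3 seconds, or
    internal page loads over 1 second."""
    return (cpu_util >= 0.60
            or external_latency_s > 3.0
            or internal_latency_s > 1.0)

# During the busy season, a reserve server would be brought online
# each time the daily test trips any threshold.
print(needs_more_capacity(0.45, 1.8, 0.4))  # False
print(needs_more_capacity(0.65, 1.8, 0.4))  # True
```

Keeping the rented reserve servers offline until a check like this fires is what lets the company pay for peak capacity only during the two or three months it actually needs it.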