InformationWeek: The Business Value of Technology


September 29, 1997

Inside Web App Functioning

The right hardware and best allocation help get the most out of your system

By Amjad Umar

Illustration by John Bleck

The importance of World Wide Web application performance is growing as companies move from Internet pilot projects to critical systems. By combining object-oriented technologies, client-server tools, and Internet communications, many organizations are creating distributed applications that offer new levels of business value.

These applications are a special class of multitier distributed programs that synthesize many technologies: firewalls, proxy servers, thin clients, fat servers, gateways, HTTP, IIOP, and applets developed in Sun Microsystems' Java programming language. Before deploying your Web applications, you must weigh many design considerations, such as performance and flexibility.

It's best to navigate these choices and trade-offs systematically. First look at some basic architecture and design issues affecting your Web applications. This article will provide some metrics for gauging the impact of various design choices on application performance.

Developers can control application architectures, including tiering and platform choices, Java applet performance, and resource-allocation issues. Other areas, such as existing infrastructure, will present performance challenges that the designer has little control over.

Web applications can be designed as two-, three-, or N-tiered. The main debate seems to be over two-tier vs. three-tier architectures, and the latter appear to be winning. Generally, more tiers lead to greater flexibility, user independence, and resource availability, but they can complicate manageability and security.

While three-tier models are a natural fit for corporate intranets, they lack the extensive online transaction processing (OLTP) capabilities of mainframe systems. Expect this limitation to be resolved soon with middleware that will integrate Web, client-server, and legacy systems.

Adding more tiers to an existing system can introduce several unnecessary points of failure and performance bottlenecks.

Design choices should hinge on the fundamental question: What are the key business drivers behind this development effort? If flexibility and user independence are the key requirements, then more tiers are better. Another important question is: What type of application is being developed? Data-centric applications for decision support are well suited to two tiers, while function-centric applications for operational support do well on three tiers.

You can optimize performance between application components through colocation. For example, tightly connected application components that frequently communicate with one another shouldn't be split between machines, because network operations are much slower than local operations. Cluster clients and servers in a fast LAN if they cannot be colocated, and reserve WAN interaction for applications in which performance isn't critical.

Finally, don't over-distribute application components: Sharing heavily distributed data among thousands of users becomes more difficult, and performance may suffer. This is especially true for WANs, unless fast packet-switching networks such as asynchronous transfer mode are used. ATM is preferable to frame relay because it has lower latency.

The client and server protocols you choose will affect overall system performance. For a quick rundown of the pros and cons of the major protocols in use, see the charts in this PDF file.

Many organizations are opting to develop Web applications in Java. Java applets and applications are not known for their high performance, mainly because they use a virtual machine that interprets Java bytecodes and runs them on the hardware. Java applets need special attention because their performance degrades with the size of the program. In addition, the VM goes into garbage-collection mode when it runs out of memory and suspends program execution.

Allocation Issues
Allocation, or placement of application components, profoundly affects the performance of distributed applications. A well-designed application can easily self-destruct if its components are assigned to slow or congested computers that are interconnected through slow communications links. Exact placement of data and application programs can be changed at run time, but it's best to minimize surprises. Most existing Corba object request brokers (ORBs) and distributed computing environment (DCE) programs hide the physical location of servers by providing directories that are accessed at run time to detect server location.

For a Web-based application, you must decide where to allocate the user interfaces, application programs, and data. User interfaces can be allocated to any computer that has the client-interface software to connect to the data servers. In Web environments, Web browsers available on almost all hardware handle interfaces.

The application programs can be allocated similarly. A special consideration for allocating a program to a particular computer is the availability of specialized software, such as programming tools and other utilities. In Web environments, the application can be allocated to the Web server, Web client site, or a back-end computer.

Data allocation is probably the most complex factor. The decision can be based on several factors, such as amount of storage, read communications, update communications, local input/output at each computer, and response time. Unique allocation, such as assigning one data object to one computer, yields a small amount of storage, high read communications traffic, small update communications, and small local I/O. Duplicate allocation to multiple machines yields a large amount of storage, small read communications traffic, high update communications, and high local I/O.
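
The trade-off between unique and duplicate allocation can be made concrete with a toy model. The sketch below is an illustration only, not a method from this article: it assumes `n_sites` computers that issue reads and updates evenly, `copies` replicas of one data object, and that every update must be sent to every copy. The function name and the sample figures are hypothetical.

```python
def allocation_costs(n_sites, copies, size_mb, reads, updates):
    """Toy cost model for placing `copies` replicas of one data object.

    Assumptions (illustrative, not from the article): requests originate
    evenly from all n_sites, a read is local when the issuing site holds
    a copy, and each update is propagated to every copy.
    """
    if not 1 <= copies <= n_sites:
        raise ValueError("copies must be between 1 and n_sites")
    storage_mb = copies * size_mb
    # A read crosses the network only when the issuing site has no copy.
    remote_reads = reads * (n_sites - copies) / n_sites
    # Each update writes to `copies` replicas; on average copies/n_sites
    # of those writes are local to the issuing site.
    remote_updates = updates * copies * (n_sites - 1) / n_sites
    return storage_mb, remote_reads, remote_updates


# Hypothetical workload: 10 sites, a 50 MB object, 1,000 reads and 100 updates.
for label, copies in (("unique", 1), ("duplicate", 10)):
    storage, r_reads, r_updates = allocation_costs(10, copies, 50, 1000, 100)
    print(f"{label:9s} storage={storage} MB  remote reads={r_reads:.0f}  "
          f"remote updates={r_updates:.0f}")
```

Under these assumptions the unique allocation keeps storage and update traffic low but sends most reads over the network, while full duplication does the reverse, which matches the qualitative pattern described above.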

Here are some guidelines for distributing data and program logic:

  • Flexibility and growth are better achieved by distributing data as well as programs.
  • Downsizing initiatives that require moving applications away from the mainframe are best served by distributing data and programs.
  • Availability needs are better satisfied by allowing data duplication.
  • Development costs generally decrease by distribution because end-user computing productivity rises.
  • Performance may or may not improve with distributed applications. These configurations reduce the central site congestion but increase network communications.
  • Data sharing among thousands of users is more difficult when data is decentralized.
  • Large databases are generally better handled at centralized mainframes.
  • Control and security needs are better satisfied by centralized systems because distributed applications introduce security risks.
  • More IS training is needed to operate and manage widely distributed applications.

  • Maintenance and operation costs may increase due to multiple licenses and the need for expensive middleware.
  • Backup and recovery are easier with centralized applications.
  • Management and support are more difficult in distributed applications due to the current lack of good tools.

Infrastructure Issues
In many cases, Web application designers have only partial control over infrastructure choices, since many other applications and workers share the same resources, such as networks, servers, databases, and middleware. Some key middleware considerations are scalability, number of layers, and translation between protocols. Each choice will affect overall application performance.

It's important to consider the options for performance scalability. Many vendors provide upward scalability by replicating identical services, allowing applications to share common services. You should ask your vendor how these shared services are implemented. Are automatic load balancing and server-workload monitoring standard? Are servers automatically restarted upon failure, and can additional servers be started to accommodate peak loads?

From a performance standpoint, it's wise to minimize the number of middleware layers in your Web application. It's best to use high-level middleware to reduce application-development costs. Generally, the higher the layer, the less code to write. For example, if your application must update data at multiple sites, it's best to use a distributed data manager or distributed transaction manager for that. But this middleware will reside on another layer of middleware. The Encina transaction monitor, for instance, is built on top of DCE.

Unnecessary translation of protocols, such as through gateways, can affect overall performance. Use as few protocol gateways as possible, because gateways typically become performance bottlenecks. For example, Web-to-legacy applications that need to use 3270 screen scraping are quite slow because the HTML and HTTP code has to be translated into 3270 data streams. You have very little choice in this situation, but you should know the performance penalty. Similarly, Open Database Connectivity does not perform as well as native database protocols from Oracle, Informix, Sybase, and others, since ODBC drivers add a layer of overhead.

It's also important to analyze the physical media, network configurations, and interconnectivity devices to understand network performance. Although some people think of discussion of network support issues as too low level for Web applications, it is important to follow these simple guidelines in your design:

  • Do your best to minimize network traffic by clustering closely interacting clients and servers within LANs.
  • Use networks with performance enhancements as much as possible. The latest network operating systems include performance enhancements such as caching to minimize network traffic.
  • Closely review, estimate, and monitor the network traffic due to ad hoc structured query language over the network. I have seen users join remotely located tables with 5 million rows in each. Due to this "SQL threat," some organizations have restricted the use of such queries to LANs only.
  • Overinvest in network bandwidth, because network congestion, especially due to ad hoc queries, can be unpredictable.
  • Invest in experience. After careful analysis, experimentation, and prototyping, run a pilot instead of infinite paper-and-pencil analysis. After implementation, perform measurements and tuning.
  • Monitor, monitor, monitor. It's important to keep monitoring the performance to understand response-time patterns and bottlenecks under varying conditions.

Many local platform issues are well-covered territory, so there is no need to belabor the obvious. Nevertheless, here is a quick checklist.

  • Web-server site: Check CPU, disk transfer rate, and database performance, if the database is used at the Web server site;
  • Web-browser site: Monitor CPU and disk transfer rate;
  • Back-end server where the data or application logic may reside: Look at CPU, disk transfer rate, and database performance.

Applying Theory
Theories have a bad reputation because they are too complicated for normal working folks. But in some cases, simple analysis based on sound theoretical foundations can get you further, and faster, than throwing hardware at the problem.

Let's walk through a very simple paper-and-pencil procedure for estimating the response time of a Web (or any client-server) application that will give you some insights and help you make design trade-offs before investing in hardware and software monitors.

In the simplest case, the total response time (RT) of a task is given by the sum of all processing and queuing delays. Without queuing, the response time is given this way: RT = S(1) + S(2) + ... + S(N). S(integer) is the time needed for completion of any particular service, and N is the total number of services needed.

Service here means any activity needed to complete the task (for example, transmission of the HTML pages and Java applets from Web servers to your browser across an Internet or intranet; processing time of the CGI script to produce the results; and processing of database queries at the back end). S(i) and N can be easily measured through prototyping. For instance, you can determine the average service time it takes a Web server, under normal load, to fetch an HTML document with a few simple experiments.
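
As a minimal sketch of the best-case formula, the snippet below adds up the service times for one task. The three timings are hypothetical placeholders for values you would measure in your own prototype, and the function name is chosen here for illustration.

```python
def best_case_response_time(service_times):
    """Return RT = S(1) + S(2) + ... + S(N), ignoring queuing."""
    return sum(service_times)


# Hypothetical measurements, in seconds, for one catalog lookup.
page_transfer = 0.40   # HTML page and applet transfer to the browser
cgi_script = 0.15      # CGI script processing on the Web server
db_query = 0.25        # database query at the back end

rt = best_case_response_time([page_transfer, cgi_script, db_query])
print(f"Best-case response time: {rt:.2f} s")
```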

Although the best-case estimates are a good starting point, they ignore queuing's impact on response-time calculations. Queues are formed when the device providing the service may be busy or locked by another activity. We need to introduce another parameter to handle queuing. The arrival rate of requests for any particular service is represented as A(i). For example, if 10 Web browsers send five queries per hour to a Web server, then A(i) equals 50 per hour for the Web server. In the following discussion, server indicates anything that provides a service. It may be a device such as a disk, a software module such as a Web server or SQL Server, or an application module such as a routine that calculates pricing. The following formula, usually called the utilization law and closely related to Little's Formula, shows utilization (U(i)) of a server, with A(i) and S(i) expressed in the same time unit:

U(i) = A(i) x S(i)

A rule of thumb used in queuing calculations is that utilization U(i) should be kept below 0.7 to avoid queuing. This makes intuitive sense, because overly busy servers do not have time to pay attention to you. The theoretical foundation for this rule of thumb is the following well-known formula:

Queue length at server i = Q(i) = U(i) / (1 - U(i))

in which Q(i) shows the number of customers in the system, including the one being served.

Thus Q(i) = 1 if U(i) = 0.5, Q(i) = 4 if U(i) = 0.8, Q(i) = 9 if U(i) = 0.9, and Q(i) reaches infinity if U(i) = 1.

The net effect of queuing is that the service time increases due to queuing. For example, if there are four people in the queue, it will take you roughly four times longer to get the service. In effect, the service time S(i) is replaced with S'(i):

Service time at server i after queuing = S'(i) = S(i) + S(i) x Q(i)
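
The three single-server formulas can be collected into a few small functions. This is a sketch under the article's simple model; the function names are chosen here for illustration, and a utilization of 1 or more is reported as an infinite queue.

```python
import math


def utilization(arrival_rate, service_time):
    """U = A x S, with A and S in the same time unit."""
    return arrival_rate * service_time


def queue_length(u):
    """Q = U / (1 - U); unbounded once the server is saturated."""
    if u >= 1.0:
        return math.inf
    return u / (1.0 - u)


def service_time_with_queuing(arrival_rate, service_time):
    """S' = S + S x Q, the service time as seen by an arriving request."""
    q = queue_length(utilization(arrival_rate, service_time))
    return service_time + service_time * q


for u in (0.5, 0.7, 0.8, 0.9):
    print(f"U = {u:.1f}  ->  Q = {queue_length(u):.2f}")
```

Because 1 + U / (1 - U) equals 1 / (1 - U), the last formula can also be written S' = S / (1 - U), which makes it easy to see how quickly delay grows as U approaches 1.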

So far we have focused on queuing for a single server. In most practical situations, a queuing network is formed where output of one server becomes an input to another server. Work in queuing theory shows the following very useful results in most real-life situations:

  • Each server can be treated independently.
  • Arrival rate at a server is the sum of all arrivals from all sources.

Thus, if 10 browsers are issuing one HTTP request per second, then the arrival rate (A) at the Web server is 10 per second. If the Web server takes about 0.05 seconds per service, then the U of the Web server is 0.5. Therefore, the Web server will not have serious performance problems. However, if 10 more browsers start using this Web server with roughly the same traffic pattern, then the utilization of the Web server will reach 1, leading to infinite queues, the situation in which you begin to experience system delays.
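
The browser example can be checked with a few lines of arithmetic. The sketch below sums arrivals from every source and applies U = A x S; the helper name and the `(clients, requests_per_second)` tuple layout are assumptions made for this illustration.

```python
def total_arrival_rate(sources):
    """Sum arrivals over (number_of_clients, requests_per_second_each) pairs."""
    return sum(clients * rate for clients, rate in sources)


WEB_SERVER_SERVICE_TIME = 0.05  # seconds per request, from the example above

# Ten browsers, each issuing one HTTP request per second.
a_now = total_arrival_rate([(10, 1.0)])
# Ten more browsers with the same traffic pattern join in.
a_later = total_arrival_rate([(10, 1.0), (10, 1.0)])

for label, a in (("10 browsers", a_now), ("20 browsers", a_later)):
    u = a * WEB_SERVER_SERVICE_TIME
    print(f"{label}: A = {a:.0f}/s, U = {u:.2f}")
```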

The following procedure may be utilized to design and improve the performance of Web applications.

  • Overall design: Include performance by making design decisions about the application architecture and infrastructure discussed earlier.
  • Best-case analysis: Perform best-case analysis by ignoring queuing for general understanding. In this case, only S (service time) and N (the total number of services needed) are necessary for computations. These calculations can be used as a starting point. Determine whether performance requirements are satisfied. If not, there is no need for queuing analysis. If a system does not satisfy the best-case calculations, it will not satisfy the requirements with queuing.
  • Queuing impact: Study the effect of queuing and workload by estimating arrival rates (A). The estimate may be made at peak time or average time. Estimate total time, including queuing, by using the equations given above.
  • Congestion analysis: Try to reduce U to less than 0.7 for the services you can control. In addition, determine the bottleneck device, the one with the largest U. Try to reduce U by decreasing A or S (or both). For example, if U of a database server is too high, then the following steps can be taken:
  • Reduce A by adding another server that may contain the entire database or most-queried data items.
  • Reduce S by getting a faster machine or by eliminating other work being done on the server.
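
The congestion-analysis step above can be sketched as a short script that computes U for each server, flags anything over the 0.7 rule of thumb, and names the bottleneck. The server names and the arrival-rate and service-time figures are hypothetical.

```python
def utilizations(servers):
    """Map each server name to U = A x S.

    `servers` maps a name to (arrival_rate_per_s, service_time_s).
    """
    return {name: a * s for name, (a, s) in servers.items()}


def find_bottleneck(servers):
    """Return (name, U) for the server with the largest utilization."""
    u = utilizations(servers)
    name = max(u, key=u.get)
    return name, u[name]


def over_threshold(servers, threshold=0.7):
    """Return the names of servers whose U exceeds the threshold, sorted."""
    return sorted(n for n, u in utilizations(servers).items() if u > threshold)


# Hypothetical figures for a three-stage request path.
servers = {
    "web_server": (10, 0.05),
    "database": (10, 0.09),
    "router": (10, 0.02),
}

name, u = find_bottleneck(servers)
print(f"Bottleneck: {name} (U = {u:.2f})")
print("Over 0.7:", over_threshold(servers))
```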

Work through various scenarios and contingencies. If detailed analysis of a configuration is needed, then you may need to use monitoring tools or even simulate.

Case In Point
Let's consider a Web-based catalog-retrieval system being designed for a small organization. This application will initially support retrieval and display of text information. In the future, the catalog system may support pictures, moving videos, and access to several "back-end" supplier sites for detailed product information.

Let's assume that we've gone through our checklist of performance considerations and decided this application will use a thin-client-fat-server model, HTTP for front-end processing, Java applets to support future graphics and moving videos, server-side scripts in C++, a relational database allocated to the Web server site, and back-end data access via ODBC.

Let's also assume that we have learned from the overall design that the client application typically sends a 20-byte query, such as an item name or number, to the Web server. The server invokes a script that searches the catalog for the search argument and then sends the catalog information, typically as two screens (about 4,000 bytes), to the Web client. In our example, the Web server can be allocated to one of two computers.

Computer 1, a fast midrange computer, is connected to users through 56-Kbps WAN lines (a public Internet site). It can complete about 20 catalog retrievals per second on average, as estimated by experimenting with a few catalog retrievals.

Computer 2, a slower desktop, is connected to users through a 10-Mbps intranet. It can complete 10 catalog retrievals per second on average.

Which computer should the Web server and the catalog be allocated to? We can assume each byte occupies 10 bits on the network: 8 data bits plus 2 start/stop bits.

To evaluate simple computer performance, we need to add the times needed for three elements: transmit time for transaction input (S1), transmit time for transaction output (S2), and time per service (S3). See the chart at right for an analysis of computer 1 and computer 2.
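
The chart is not reproduced here, but its best-case figures can be recomputed from the numbers already given. The sketch below assumes that 56 Kbps means 56,000 bits per second and 10 Mbps means 10,000,000 bits per second, and that each retrieval moves one 20-byte query and one 4,000-byte reply at 10 bits per byte. The function and constant names are illustrative.

```python
BITS_PER_BYTE_ON_WIRE = 10  # 8 data bits + 2 start/stop bits


def transmit_time(num_bytes, line_bps):
    """Seconds needed to send num_bytes over a line of line_bps bits/s."""
    return num_bytes * BITS_PER_BYTE_ON_WIRE / line_bps


def best_case(line_bps, retrievals_per_second,
              query_bytes=20, reply_bytes=4000):
    """Return (S1, S2, S3, total) in seconds, ignoring queuing."""
    s1 = transmit_time(query_bytes, line_bps)   # transaction input
    s2 = transmit_time(reply_bytes, line_bps)   # transaction output
    s3 = 1.0 / retrievals_per_second            # time per service
    return s1, s2, s3, s1 + s2 + s3


computers = {
    "Computer 1 (56-Kbps WAN)": (56_000, 20),
    "Computer 2 (10-Mbps intranet)": (10_000_000, 10),
}
for label, (bps, rate) in computers.items():
    s1, s2, s3, total = best_case(bps, rate)
    print(f"{label}: S1={s1:.5f}  S2={s2:.5f}  S3={s3:.5f}  total={total:.3f} s")
```

With these assumptions the reply transfer dominates at computer 1 (roughly 0.7 second of a total near 0.77 second), while computer 2 finishes in roughly 0.1 second.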

In this response-time analysis, computer 2 seems to be the more suitable for the task. We also notice that the transmission time of results appears to be the bottleneck at computer 1.

Before making any quick decisions, let's do some calculations of the impact of queuing.

To estimate this, let's introduce the workload information by assuming that at most 120 Web clients will use this application simultaneously and each client will generate the catalog retrieval message about six times per minute. This gives us an arrival rate (A) at the Web server machine of 120 x 6 / 60 = 12 per second.

To compute new service time at computer 1 and computer 2 without worrying about the network at present, you need to consider utilization, average queue length, average wait time, and total service time at both computers.
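
A minimal sketch of that calculation follows, using the arrival rate of 12 per second and the per-retrieval service times implied by 20 and 10 retrievals per second. The function name and the dictionary keys are chosen here for illustration; a utilization of 1 or more is reported as saturation.

```python
import math


def queuing_summary(arrival_rate, service_time):
    """Return utilization, queue length, wait time, and total service time."""
    u = arrival_rate * service_time
    if u >= 1.0:
        # Requests arrive faster than they can be served.
        return {"U": u, "Q": math.inf, "wait": math.inf, "total": math.inf}
    q = u / (1.0 - u)
    wait = service_time * q
    return {"U": u, "Q": q, "wait": wait, "total": service_time + wait}


ARRIVAL_RATE = 12  # requests per second at the Web server machine

for label, service_time in (("Computer 1", 1 / 20), ("Computer 2", 1 / 10)):
    r = queuing_summary(ARRIVAL_RATE, service_time)
    print(f"{label}: U={r['U']:.2f}  Q={r['Q']:.2f}  "
          f"wait={r['wait']:.3f} s  total={r['total']:.3f} s")
```

Computer 1 comes out at a utilization of 0.6 with a total service time of about 0.125 second, while computer 2 has a utilization of 1.2, so its queue grows without bound.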

Although computer 2 was favored in the best-case analysis, adding queuing causes serious congestion. Therefore, computer 2 should not be used for storing the customer database, even though it is connected via a fast network. In fact, connecting a fast network to a congested machine is a bad idea, since it increases the arrival rate, further increasing congestion.

To make computer 2 a candidate for data allocation, the U must be reduced to less than 0.5. We can do this by reducing A or S (or both). Assuming that the service time of computer 2 is fixed at 0.1 second (that is, 10 requests per second), we can compute the desirable value of A from the following inequality:

A x 0.1 < 0.5

We should keep the arrival rate at computer 2 at less than five requests per second. This could be achieved by reducing the number of clients that query the computer by creating a copy of the catalog on another computer. To reduce S, a faster machine must be substituted.
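
The same inequality can be turned into a small capacity calculation. The sketch below assumes that copies of the catalog run on identical machines and that the load splits evenly among them; both assumptions, and the function names, are introduced here for illustration.

```python
def max_arrival_rate(service_time, target_utilization):
    """Upper bound on A that satisfies A x S < target utilization."""
    if service_time <= 0 or target_utilization <= 0:
        raise ValueError("service_time and target_utilization must be positive")
    return target_utilization / service_time


def copies_needed(total_arrival_rate, service_time, target_utilization):
    """Smallest number of identical servers that keeps each U below target.

    Assumes the total load is split evenly across the copies.
    """
    if service_time <= 0 or target_utilization <= 0:
        raise ValueError("service_time and target_utilization must be positive")
    n = 1
    while (total_arrival_rate / n) * service_time >= target_utilization:
        n += 1
    return n


limit = max_arrival_rate(service_time=0.1, target_utilization=0.5)
copies = copies_needed(total_arrival_rate=12, service_time=0.1,
                       target_utilization=0.5)
print(f"Keep A below {limit:.0f} requests per second per machine")
print(f"Copies of the catalog needed for 12 requests per second: {copies}")
```

With 12 requests per second in total, three copies bring each machine to 4 requests per second, which is under the limit of 5.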

We plunge into the multimedia jungle by assuming that the catalog information displayed shows icons that you can click to download a picture or to play a video clip. If the user clicks on the picture icon, he or she downloads a picture. Based on experiments with typical .GIF files for pictures, let us assume that a good quality color picture takes up about 1 million bits. Transfer of this picture would require about 18 seconds on the 56-Kbps line and 0.1 second on the Ethernet corporate intranet. We need to pay more attention to the network and leave the Web server alone, because fetching a picture from the catalog does not increase Web server load.

Now consider the network data rates needed to support moving video. It's possible that a video clip is copyrighted and that the server will allow you to play it but not download it, so you must run the clip over the network. For a smooth-moving video, you need about 30 screens per second. The picture of 1 million bits would generate about 30 million bits per second of network traffic if the video is played at the server but displayed at the client across a network. Many distributed multimedia applications generate between 10 million and 15 million bits per second of traffic for a single user. Arrival rate on a network segment due to a multimedia application may be between 10 and 15 Mbps. With compression, this can be reduced to about 1 Mbps.
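
The picture and video figures are simple ratios, sketched below. The snippet assumes 56 Kbps means 56,000 bits per second and 10 Mbps means 10,000,000 bits per second, and it ignores protocol overhead; the function names are illustrative.

```python
def transfer_time(bits, line_bps):
    """Seconds to move `bits` over a line running at line_bps bits/s."""
    return bits / line_bps


def video_bit_rate(bits_per_frame, frames_per_second):
    """Network load in bits/s for uncompressed frame-by-frame video."""
    return bits_per_frame * frames_per_second


PICTURE_BITS = 1_000_000  # one good-quality color picture

print(f"56-Kbps line:   {transfer_time(PICTURE_BITS, 56_000):.1f} s per picture")
print(f"10-Mbps intranet: {transfer_time(PICTURE_BITS, 10_000_000):.1f} s per picture")
print(f"Uncompressed video: {video_bit_rate(PICTURE_BITS, 30) / 1e6:.0f} Mbps")
```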

What type of LANs (Ethernet, token ring, Fast Ethernet), MANs (FDDI, SMDS), and WANs (X.25, ISDN, ATM, frame relay) will be needed to support the multimedia applications and databases? For example, many older but widely deployed LANs, such as Ethernet and token ring, provide 10 Mbps to 16 Mbps data rates, and many existing WANs operate at 56 Kbps. Pursuing the multimedia example, a 10-Mbps Ethernet LAN would need 0.1 second to carry each second of moving video for a single user with a compressed stream of 1 million bits per second; that is, S = 1/10.

Using the formula U = A x S, we can tell that an Ethernet LAN could handle about five multimedia users with compression if we intend to keep U at about 0.5. This is the main reason people do not run video clips over the network. However, high-speed LANs such as Fast Ethernet and FDDI support 100 Mbps, making moving video applications feasible.
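
The user-count estimate follows from U = A x S, where S is the fraction of the link one stream consumes and A is the number of simultaneous users. The sketch below solves for the largest A that keeps U at or below a target; the function name is illustrative, and the 100-Mbps and 30-Mbps cases are extra scenarios built from figures quoted elsewhere in this article.

```python
import math


def max_streams(capacity_bps, stream_bps, target_utilization=0.5):
    """Largest number of streams that keeps link utilization at or below target."""
    if capacity_bps <= 0 or stream_bps <= 0:
        raise ValueError("capacity_bps and stream_bps must be positive")
    return math.floor(target_utilization * capacity_bps / stream_bps)


COMPRESSED = 1_000_000      # about 1 Mbps per user with compression
UNCOMPRESSED = 30_000_000   # about 30 Mbps per user without compression

print("10-Mbps Ethernet, compressed:   ", max_streams(10_000_000, COMPRESSED))
print("100-Mbps LAN, compressed:       ", max_streams(100_000_000, COMPRESSED))
print("100-Mbps LAN, uncompressed:     ", max_streams(100_000_000, UNCOMPRESSED))
```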

WANs are also beginning to support comparable data rates through ATM and frame relay networks that operate over T3 (45-Mbps) lines.

Finally, make sure that potential performance bottlenecks in the network due to traffic generated by the new application have been considered. The bottlenecks can occur due to congestion on interconnectivity devices (routers and gateways are well-known potential bottlenecks) or due to the overutilization of physical media. For example, Ethernet LANs start having problems with a utilization of 0.3, while token rings can operate quite well at higher utilization.

The simple rules of thumb about arrival rates, service times, and utilization are very useful in this analysis. For example, you can determine potential queuing in a router by estimating its utilization based on an arrival rate and service time of the router.

For accessing back-end catalogs, you need to consider the same type of networking issues as discussed above. Let us shift our attention to back-end protocols and middleware. If possible, protocol conversion should be avoided, because protocol-conversion gateways can easily become bottlenecks. For example, if a database gateway can handle 10 requests per second (that is, S = 0.1), then pushing more than seven requests per second through it risks queuing, because 0.1 x A = 0.7 at A = 7. Although we have chosen ODBC for this application because it lets us access data that may be in Oracle, Sybase, Informix, or other relational databases, it is better to avoid the ODBC-to-native-protocol conversion if possible. In addition, protocols can have an impact on arrival rates. For example, chatty protocols generate several acknowledgements and clog up the interconnectivity devices. In addition, large messages can affect the service times of devices. This is why remote SQL is dangerous to performance. RPCs are more efficient because they use fixed numbers of parameters that are typically smaller than SQL tables.

In addition to these simple calculations, you may want to run benchmarks for detailed and accurate analysis. The WebStone Web server benchmark from Silicon Graphics can be used to generate HTTP traffic. WebStone generates various types of workloads such as user and media workload. It also lets you measure performance parameters such as average and maximum connect time, response time, data throughput, and files retrieved. Benchmark testing is a good final stress test before moving your application into the production environment.

Design choices can be subdivided into two broad areas: application architecture issues that are generally within your control, and infrastructure issues that are not always within your control. Start with a checklist to make sure that performance issues are included early in your development project. Then use a simple formula based on service times and arrival rates to isolate potential bottlenecks. This approach can be used to design performance into an application and ensure business success with your new system.

Amjad Umar is a senior scientist at Bell Communications Research and an adjunct professor at Rutgers University. He can be reached at aumar@notes.cc.bellcore.com.

