September 29, 1997
Inside Web App Functioning
The right hardware and best allocation help get the most out of your system
By Amjad Umar
These applications are a special class of multitier distributed programs that synthesize many technologies: firewalls, proxy servers, thin clients, fat servers, gateways, HTTP, IIOP, and applets developed in Sun Microsystems' Java programming language. Before deploying your Web applications, you must weigh many design considerations, such as performance and flexibility.
It's best to navigate these choices and trade-offs systematically. First look at some basic architecture and design issues affecting your Web applications. This article will provide some metrics for gauging the impact of various design choices on application performance.
Developers can control application architectures, including tiering and platform choices, Java applet performance, and resource-allocation issues. Other areas, such as existing infrastructure, will present performance challenges that the designer has little control over.
Web applications can be designed as two-, three-, or N-tiered. The main debate seems to be over two-tier vs. three-tier architectures, and the latter appear to be winning. Generally, more tiers lead to greater flexibility, user independence, and resource availability, but they can complicate manageability and security.
While three-tier models are a natural fit for corporate intranets, they lack the extensive online transaction processing (OLTP) capabilities of mainframe systems. Expect this limitation to be resolved soon with middleware that will integrate Web, client-server, and legacy systems.
Adding more tiers to an existing system can introduce several unnecessary points of failure and performance bottlenecks.
Design choices should hinge on the fundamental question: What are the key business drivers behind this development effort? If flexibility and user independence are the key requirements, then more tiers are better. Another important question is: What type of application is being developed? Data-centric applications for decision support are well suited to two tiers, while function-centric applications for operational support do well on three tiers.
You can optimize performance between application components through colocation. For example, tightly connected application components that frequently communicate with one another shouldn't be split between machines, because network operations are much slower than local operations. Cluster clients and servers in a fast LAN if they cannot be colocated, and reserve WAN interaction for applications in which performance isn't critical.
Finally, don't over-distribute application components: Sharing heavily distributed data among thousands of users becomes more difficult, and performance may suffer. This is especially true for WANs, unless fast packet-switching networks such as asynchronous transfer mode (ATM) are used. ATM is preferable to frame relay because it has lower latency.
The client and server protocols you choose will affect overall system performance. For a quick rundown of the pros and cons of the major protocols in use, see the charts in this PDF file.
Many organizations are opting to develop Web applications in Java. Java applets and applications are not known for their high performance, mainly because they use a virtual machine that interprets Java bytecodes and runs them on the hardware. Java applets need special attention because their performance degrades with the size of the program. In addition, the VM goes into garbage-collection mode when it runs out of memory and suspends program execution.
Allocation Issues
For a Web-based application, you must decide where to allocate the user interfaces, application programs, and data. User interfaces can be allocated to any computer that has the client-interface software to connect to the data servers. In Web environments, Web browsers available on almost all hardware handle interfaces.
The application programs can be allocated similarly. A special consideration for allocating a program to a particular computer is the availability of specialized software, such as programming tools and other utilities. In Web environments, the application can be allocated to the Web server, Web client site, or a back-end computer.
Data allocation is probably the most complex factor. The decision can be based on several factors, such as amount of storage, read communications, update communications, local input/output at each computer, and response time. Unique allocation, such as assigning one data object to one computer, yields a small amount of storage, high read communications traffic, small update communications, and small local I/O. Duplicate allocation to multiple machines yields a large amount of storage, small read communications traffic, high update communications, and high local I/O.
Here are some guidelines for distributing data and program logic:
Backup and recovery are easier with centralized applications.
Management and support are more difficult in distributed applications due to the current lack of good tools.
Infrastructure Issues
It's important to consider the options for performance scalability. Many vendors provide upward scalability by replicating identical services, allowing applications to share common services. You should ask your vendor how these shared services are implemented. Are automatic load balancing and server-workload monitoring standard? Are servers automatically restarted upon failure, and can additional servers be started to accommodate peak loads?
From a performance standpoint, it's wise to minimize the number of middleware layers in your Web application. It's best to use high-level middleware to reduce application-development costs. Generally, the higher the layer, the less code to write. For example, if your application must update data at multiple sites, it's best to use a distributed data manager or distributed transaction manager for that. But this middleware will reside on another layer of middleware. The Encina transaction monitor, for instance, is built on top of DCE.
Unnecessary protocol translation, such as through gateways, can affect overall performance. Use as few protocol gateways as possible, because gateways typically become performance bottlenecks. For example, Web-to-legacy applications that need to use 3270 screen scraping are quite slow because the HTML and HTTP traffic has to be translated into 3270 data streams. You have very little choice in this situation, but you should know the performance penalty. Similarly, Open Database Connectivity (ODBC) does not perform as well as native database protocols from Oracle, Informix, Sybase, and others, since ODBC drivers add a layer of overhead.
It's also important to analyze the physical media, network configurations, and interconnectivity devices to understand network performance. Although some people think of discussion of network support issues as too low level for Web applications, it is important to follow these simple guidelines in your design:
Let's walk through a very simple paper-and-pencil procedure for estimating the response time of a Web (or any client-server) application that will give you some insights and help you make design trade-offs before investing in hardware and software monitors.
In the simplest case, the total response time (RT) of a task is given by the sum of all processing and queuing delays. Without queuing, the response time is given this way: RT = S(1) + S(2) + ... + S(N). S(i) is the time needed to complete the i-th service, and N is the total number of services needed.
Service here means any activity needed to complete the task (for example, transmission of the HTML pages and Java applets from Web servers to your browser across an Internet or intranet; processing time of the CGI script to produce the results; and processing of database queries at the back end). S(i) and N can be easily measured through prototyping. For instance, you can determine the average service time it takes a Web server, under normal load, to fetch an HTML document with a few simple experiments.
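As a rough illustration of the best-case formula, the short Python sketch below simply adds up measured service times. The three service names and their timings are hypothetical placeholders, not measurements from this article; substitute values from your own prototype.

```python
def best_case_response_time(service_times):
    """Return RT = S(1) + S(2) + ... + S(N), ignoring queuing."""
    return sum(service_times)


# Hypothetical service times in seconds, for illustration only.
services = {
    "transmit HTML page and applet": 0.40,
    "run CGI script": 0.15,
    "process database query": 0.25,
}

rt = best_case_response_time(services.values())
print(f"Best-case response time: {rt:.2f} s")
```

Keeping the services in a named table like this makes it easy to see which one dominates the total.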
Although the best-case estimates are a good starting point, they ignore queuing's impact on response-time calculations. Queues are formed when the device providing the service may be busy or locked by another activity. We need to introduce another parameter to handle queuing. The arrival rate of requests for any particular service is represented as A(i). For example, if 10 Web browsers send five queries per hour to a Web server, then A(i) equals 50 per hour for the Web server. In the following discussion, server indicates anything that provides a service. It may be a device such as a disk, a software module such as a Web server or SQL Server, or an application module such as a routine that calculates pricing. The following formula, a form of Little's Formula often called the utilization law, gives the utilization U(i) of a server, provided A(i) and S(i) are expressed in the same time unit:
U(i) = A(i) x S(i)
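The one trap in this formula is mixing time units. The sketch below works through the 10-browser example, converting the hourly arrival rate to a per-second rate before multiplying. The 2-second service time is a hypothetical value chosen only for illustration.

```python
def utilization(arrival_rate, service_time):
    """U = A x S. Both arguments must use the same time unit."""
    return arrival_rate * service_time


browsers = 10
queries_per_browser_per_hour = 5
arrivals_per_hour = browsers * queries_per_browser_per_hour  # 50 per hour
arrivals_per_second = arrivals_per_hour / 3600

service_time_s = 2.0  # hypothetical: seconds per query, not from the article

u = utilization(arrivals_per_second, service_time_s)
print(f"A = {arrivals_per_hour} per hour, S = {service_time_s} s, U = {u:.3f}")
```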
A rule of thumb used in queuing calculations is that utilization U(i) should be kept below 0.7 to avoid queuing. This makes intuitive sense, because overly busy servers do not have time to pay attention to you. The theoretical foundation for this rule of thumb is the following well-known formula:
Queue length at server i = Q(i) = U(i) / (1 - U(i))
in which Q(i) shows the number of customers in the system, including the one being served.
Thus Q(i) = 1 if U(i) = 0.5, Q(i) = 4 if U(i) = 0.8, Q(i) = 9 if U(i) = 0.9, and Q(i) grows without bound as U(i) approaches 1.
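These values are easy to reproduce. The following Python sketch evaluates the queue-length formula for several utilizations; treating any utilization of 1 or more as an unbounded queue is a convention adopted here for the sketch.

```python
import math


def queue_length(u):
    """Q = U / (1 - U); a server with U >= 1 is treated as saturated."""
    if u < 0:
        raise ValueError("utilization cannot be negative")
    if u >= 1:
        return math.inf
    return u / (1 - u)


for u in (0.5, 0.7, 0.8, 0.9, 1.0):
    print(f"U = {u:.1f}  ->  Q = {queue_length(u):.1f}")
```

Note how quickly the queue grows past the 0.7 rule of thumb: moving from 0.7 to 0.9 roughly quadruples the queue length.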
The net effect of queuing is that the effective service time increases. For example, if there are four people in the queue, it will take you roughly four times longer to get the service. In effect, the service time S(i) is replaced with S'(i):
Service time at server i after queuing = S'(i) = S(i) + S(i) x Q(i)
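Substituting Q = U / (1 - U) into this formula gives the equivalent form S' = S / (1 - U), which shows directly how utilization stretches the service time. The sketch below computes S' both ways as a check; the 0.05-second service time is a hypothetical example value.

```python
def effective_service_time(service_time, u):
    """S' = S + S x Q, with Q = U / (1 - U). Valid only for 0 <= U < 1."""
    if not 0 <= u < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    q = u / (1 - u)
    return service_time + service_time * q


s = 0.05  # hypothetical service time in seconds
for u in (0.5, 0.7, 0.9):
    s_prime = effective_service_time(s, u)
    shortcut = s / (1 - u)  # algebraically the same quantity
    print(f"U = {u:.1f}: S' = {s_prime:.3f} s (shortcut form: {shortcut:.3f} s)")
```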
So far we have focused on queuing for a single server. In most practical situations, a queuing network is formed where output of one server becomes an input to another server. Work in queuing theory shows the following very useful results in most real-life situations:
The following procedure may be utilized to design and improve the performance of Web applications.
Case In Point
Let's assume that we've gone through our checklist of performance considerations and decided this application will use a thin-client-fat-server model, HTTP for front-end processing, Java applets to support future graphics and moving videos, server-side scripts in C++, a relational database allocated to the Web server site, and back-end data access via ODBC.
Let's also assume that we have learned from the overall design that the client application typically sends a 20-byte query, such as an item name or number, to the Web server. The server invokes a script that searches the catalog for the search argument and then sends the catalog information, typically as two screens (about 4,000 bytes), to the Web client. In our example, the Web server can be allocated to one of two computers.
Computer 1, a fast midrange computer, is connected to users through 56-Kbps WAN lines (a public Internet site). It can complete about 20 catalog retrievals per second on average, as estimated by experimenting with a few catalog retrievals.
Computer 2, a slower desktop, is connected to users through a 10-Mbps intranet. It can complete 10 catalog retrievals per second on average.
Which computer should the Web server and the catalog be allocated to? We can assume each byte occupies 10 bits on the network (8 data bits plus 2 start/stop bits).
To evaluate simple computer performance, we need to add the times needed for three elements: transmit time for transaction input (S1), transmit time for transaction output (S2), and time per service (S3). See the chart at right for an analysis of computer 1 and computer 2.
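The three elements can be worked out from the figures already given. The Python sketch below is one way to do the arithmetic; it assumes that 56 Kbps means 56,000 bits per second and 10 Mbps means 10,000,000 bits per second, so its rounding may differ slightly from the published chart.

```python
BITS_PER_BYTE_ON_WIRE = 10   # 8 data bits plus 2 start/stop bits
QUERY_BYTES = 20             # transaction input
REPLY_BYTES = 4000           # transaction output (about two screens)


def best_case(line_bps, retrievals_per_second):
    """Return (S1, S2, S3, total) in seconds, ignoring queuing."""
    s1 = QUERY_BYTES * BITS_PER_BYTE_ON_WIRE / line_bps   # input transmit time
    s2 = REPLY_BYTES * BITS_PER_BYTE_ON_WIRE / line_bps   # output transmit time
    s3 = 1 / retrievals_per_second                        # time per service
    return s1, s2, s3, s1 + s2 + s3


computers = {
    "computer 1": (56_000, 20),       # 56-Kbps WAN, 20 retrievals per second
    "computer 2": (10_000_000, 10),   # 10-Mbps intranet, 10 retrievals per second
}

results = {name: best_case(*spec) for name, spec in computers.items()}
for name, (s1, s2, s3, total) in results.items():
    print(f"{name}: S1={s1:.5f}  S2={s2:.5f}  S3={s3:.5f}  total={total:.5f} s")
```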
In this response-time analysis, computer 2 seems to be the more suitable for the task. We also notice that the transmission time of results appears to be the bottleneck at computer 1.
Before making any quick decisions, let's do some calculations of the impact of queuing.
To estimate this, let's introduce the workload information by assuming that at most 120 Web clients will use this application simultaneously and each client will generate the catalog retrieval message about six times per minute. This gives us an arrival rate (A) at the Web server machine of 120 x 6 / 60 = 12 per second.
To compute new service time at computer 1 and computer 2 without worrying about the network at present, you need to consider utilization, average queue length, average wait time, and total service time at both computers.
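These four quantities follow from the formulas introduced earlier. The sketch below applies them to both machines using the arrival rate of 12 requests per second and the measured retrieval rates; reporting a utilization of 1 or more as an unbounded queue is a convention assumed here, since the queue-length formula is only meaningful below 1.

```python
import math

ARRIVAL_RATE = 12.0  # requests per second at the Web server machine


def queuing_estimate(arrival_rate, service_time):
    """Return (U, Q, wait, total service time) for one server."""
    u = arrival_rate * service_time
    if u >= 1:
        # The server cannot keep up; the queue grows without bound.
        return u, math.inf, math.inf, math.inf
    q = u / (1 - u)
    wait = service_time * q
    return u, q, wait, service_time + wait


estimates = {
    "computer 1": queuing_estimate(ARRIVAL_RATE, 1 / 20),  # 20 retrievals per second
    "computer 2": queuing_estimate(ARRIVAL_RATE, 1 / 10),  # 10 retrievals per second
}

for name, (u, q, wait, total) in estimates.items():
    print(f"{name}: U={u:.2f}  Q={q:.2f}  wait={wait:.3f} s  S'={total:.3f} s")
```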
Although computer 2 was favored in the best-case analysis, adding queuing causes serious congestion. Therefore, computer 2 should not be used for storing the customer database, even though it is connected via a fast network. In fact, connecting a fast network to a congested machine is a bad idea, since it increases the arrival rate, further increasing congestion.
To make computer 2 a candidate for data allocation, U must be reduced to less than 0.5. We can do this by reducing A or S (or both). Assuming that the service time of computer 2 is fixed (i.e., 0.1 second per request, or 10 requests per second), we can compute the desirable value of A from the following inequality:
A x 0.1 < 0.5
We should keep the arrival rate at computer 2 at less than five requests per second. This could be achieved by reducing the number of clients that query the computer by creating a copy of the catalog on another computer. To reduce S, a faster machine must be substituted.
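The same inequality can be turned around to size the number of catalog copies. The sketch below assumes every copy runs on a machine identical to computer 2 and that requests are split evenly among the copies; both are simplifying assumptions made for illustration.

```python
import math


def max_arrival_rate(target_utilization, service_time):
    """Largest arrival rate A for which U = A x S stays at the target."""
    return target_utilization / service_time


SERVICE_TIME = 0.1        # seconds per retrieval on computer 2
TARGET_UTILIZATION = 0.5
TOTAL_ARRIVALS = 12.0     # requests per second from the workload estimate

limit = max_arrival_rate(TARGET_UTILIZATION, SERVICE_TIME)

# Assumption: identical machines, load shared evenly among the copies.
copies = math.ceil(TOTAL_ARRIVALS / limit)
per_copy_utilization = (TOTAL_ARRIVALS / copies) * SERVICE_TIME

print(f"Arrival-rate limit per machine: {limit:.1f} requests per second")
print(f"Copies needed: {copies}, each at U = {per_copy_utilization:.2f}")
```

Under these assumptions a single extra copy would leave each machine at a utilization of 0.6, still above the target.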
We plunge into the multimedia jungle by assuming that the catalog information displayed shows icons that you can click to download a picture or to play a video clip. If the user clicks on the picture icon, he or she downloads a picture. Based on experiments with typical .GIF files for pictures, let us assume that a good quality color picture takes up about 1 million bits. Transfer of this picture would require about 18 seconds on the 56-Kbps line and 0.1 second on the Ethernet corporate intranet. We need to pay more attention to the network and leave the Web server alone, because fetching a picture from the catalog does not increase Web server load.
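The transfer times are a straightforward division of picture size by line speed. The sketch below reproduces them, ignoring protocol overhead and contention from other traffic, and assuming 56 Kbps means 56,000 bits per second and the Ethernet runs at 10,000,000 bits per second.

```python
PICTURE_BITS = 1_000_000  # good-quality color picture, per the estimate above


def transfer_time(bits, line_bps):
    """Seconds needed to move `bits` over a line of `line_bps` bits per second."""
    return bits / line_bps


wan_seconds = transfer_time(PICTURE_BITS, 56_000)       # 56-Kbps WAN line
lan_seconds = transfer_time(PICTURE_BITS, 10_000_000)   # 10-Mbps Ethernet

print(f"56-Kbps line: {wan_seconds:.1f} s")
print(f"10-Mbps Ethernet: {lan_seconds:.1f} s")
```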
Now consider the network data rates needed to support moving video. It's possible that a video clip is copyrighted and that the server will allow you to play it but not download it, so you must run the clip over the network. For a smooth-moving video, you need about 30 frames per second. At 1 million bits per frame, this would generate about 30 million bits per second of network traffic if the video is played at the server but displayed at the client across a network. Many distributed multimedia applications generate between 10 million and 15 million bits per second of traffic for a single user. Arrival rate on a network segment due to a multimedia application may be between 10 and 15 Mbps. With compression, this can be reduced to about 1 Mbps.
What type of LANs (Ethernet, token ring, Fast Ethernet), MANs (FDDI, SMDS), and WANs (X.25, ISDN, ATM, frame relay) will be needed to support the multimedia applications and databases? For example, many older but widely deployed LANs, such as Ethernet and token ring, provide 10 Mbps to 16 Mbps data rates, and many existing WANs operate at 56 Kbps. Pursuing the multimedia example, a 10-Mbps Ethernet LAN would need 0.1 second to carry each second of compressed moving video (1 million bits) for a single user; that is, S = 0.1.
Using the formula U = A x S, we can tell that an Ethernet LAN could handle about five multimedia users with compression if we intend to keep U at about 0.5. This is the main reason people do not run video clips over the network. However, high-speed LANs such as Fast Ethernet and FDDI support 100 Mbps, making moving video applications feasible.
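A small sketch makes it easy to repeat this estimate for other networks and stream rates. It treats each user as a constant-rate stream and the LAN as a single shared resource, which is a deliberate simplification; the 0.5 target utilization is the one used in the text.

```python
TARGET_UTILIZATION = 0.5


def max_streams(lan_bps, stream_bps, target_u=TARGET_UTILIZATION):
    """Number of constant-rate streams that keep LAN utilization at or below target_u."""
    return int(target_u * lan_bps // stream_bps)


ETHERNET = 10_000_000        # 10-Mbps Ethernet
FAST_ETHERNET = 100_000_000  # 100-Mbps Fast Ethernet or FDDI
COMPRESSED = 1_000_000       # about 1 Mbps per user with compression
UNCOMPRESSED = 30_000_000    # about 30 Mbps per user without compression

max_users = max_streams(ETHERNET, COMPRESSED)
print(f"10-Mbps LAN, compressed video: {max_users} users")
print(f"10-Mbps LAN, uncompressed video: {max_streams(ETHERNET, UNCOMPRESSED)} users")
print(f"100-Mbps LAN, compressed video: {max_streams(FAST_ETHERNET, COMPRESSED)} users")
```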
WANs are also beginning to support comparable data rates through ATM and frame relay networks that operate over T3 (45-Mbps) lines.
Finally, make sure that potential performance bottlenecks in the network due to traffic generated by the new application have been considered. The bottlenecks can occur due to congestion on interconnectivity devices (routers and gateways are well-known potential bottlenecks) or due to the overutilization of physical media. For example, Ethernet LANs start having problems with a utilization of 0.3, while token rings can operate quite well at higher utilization.
The simple rules of thumb about arrival rates, service times, and utilization are very useful in this analysis. For example, you can determine potential queuing in a router by estimating its utilization based on an arrival rate and service time of the router.
For accessing back-end catalogs, you need to consider the same type of networking issues as discussed above. Let us shift our attention to back-end protocols and middleware. If possible, protocol conversion should be avoided, because protocol-conversion gateways can easily become bottlenecks. For example, if a database gateway can handle 10 requests per second (that is, S = 0.1), then to avoid queuing it is dangerous to push more than seven requests per second through it (that is, 0.1 x A = 0.7). Although we have chosen ODBC for this application because it lets us access data that may be in Oracle, Sybase, Informix, or other relational databases, it is better to avoid the ODBC-to-native-protocol conversion if possible. In addition, protocols can have an impact on arrival rates. For example, chatty protocols generate several acknowledgements and clog up the interconnectivity devices. In addition, large messages can affect the service times of devices. This is why remote SQL is dangerous to performance. RPCs are more efficient because they use fixed numbers of parameters that are typically smaller than SQL tables.
In addition to these simple calculations, you may want to run benchmarks for detailed and accurate analysis. The WebStone Web server benchmark from Silicon Graphics can be used to generate HTTP traffic. WebStone generates various types of workloads such as user and media workload. It also lets you measure performance parameters such as average and maximum connect time, response time, data throughput, and files retrieved. Benchmark testing is a good final stress test before moving your application into the production environment.
Design choices can be subdivided into two broad areas: application architecture issues that are generally within your control, and infrastructure issues that are not always within your control. Start with a checklist to make sure that performance issues are included early in your development project. Then use a simple formula based on service times and arrival rates to isolate potential bottlenecks. This approach can be used to design performance into an application and ensure business success with your new system.
Amjad Umar is a senior scientist at Bell Communications Research and an adjunct professor at Rutgers University. He can be reached at aumar@notes.cc.bellcore.com.
The importance of World Wide Web application performance is growing as companies move from Internet pilot projects to critical systems. By combining object-oriented technologies, client-server tools, and Internet communications, many organizations are creating distributed applications that offer new levels of business value.











