January 29, 2001
http://www.informationweek.com//822/apps.htm
The Well-Mannered Application
The distributed nature of E-commerce applications requires better and more precise measurements for performance and reliability. The advent of ASP-provided applications adds yet another wrinkle to the application performance challenge.
By Steve Steinke, Network Magazine
It's hard enough to get networked applications deployed and running. But increasingly, the performance of crucial applications needs to be monitored and controlled.
Two trends are propelling and complicating the need for performance management. First, the near ubiquity of the Internet Protocol and the declining cost of network throughput make application outsourcing more feasible than ever. Second, many critical apps are distributed across multiple architectural tiers and independently controlled networks.
The implications of these trends warrant examination. Companies may choose to turn over components of their applications to application service providers, management service providers, or some other sort of service company.
However, service providers will rarely be able to justify taking over an application crucial to their enterprise customers without negotiating a service-level agreement. A traditional SLA, such as one used for frame relay or Internet connectivity that specifies aggregate measurements like uptime averaged over a month, won't do. The service-level agreement for a specific critical application must be detailed and granular.
Certainly, applications need to be available. But the sort of availability that, say, an Internet service provider might guarantee--for instance, 99.7% uptime for the month--tells you very little about the customer experience. Those 2.16 hours (0.3% of 30 days) of downtime may be at peak demand times, or they may coincide with users' most important deadlines. In many circumstances, average monthly uptime is inconsequential to users.
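The arithmetic behind that 2.16-hour figure, and the reason it says little about user experience, can be sketched in a few lines. This is a minimal illustration that assumes a 30-day month and a hypothetical 9:00 to 17:00 peak window. It shows that the same downtime budget can fall entirely inside or entirely outside peak hours.

```python
def allowed_downtime_hours(uptime_pct: float, days: int = 30) -> float:
    """Downtime permitted by an uptime guarantee over a billing period."""
    return days * 24 * (100.0 - uptime_pct) / 100.0


def peak_overlap_hours(start: float, duration: float,
                       peak_start: float = 9.0, peak_end: float = 17.0) -> float:
    """Hours of a same-day outage that fall inside a peak window (hour-of-day clock).

    The 9.0-17.0 peak window is an assumed example, not part of any real SLA.
    """
    end = start + duration
    return max(0.0, min(end, peak_end) - max(start, peak_start))


budget = allowed_downtime_hours(99.7)        # 720 h * 0.3%, roughly 2.16 h
print(round(budget, 2))
print(peak_overlap_hours(10.0, budget))      # mid-morning outage: all of it at peak
print(peak_overlap_hours(2.0, budget))       # same outage at 2 a.m.: none at peak
```

Both outages satisfy the 99.7% guarantee equally, which is exactly why a monthly average can be inconsequential to users.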
One of the most commonly desired requirements for applications is a short and predictable response time. A few ISPs are willing to include latency in their SLAs, though they generally restrict the circumstances to such an extent that the guarantee has little value. Most commonly, only traffic that remains on the ISP's network is guaranteed. In addition, it's a common practice to average latency measures over periods as long as a month, which, like the gross availability measure, may correlate imperfectly with real-time user experiences. Useful response-time measures should be taken as they occur and should be reported immediately, as well as averaged over longer periods.
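The difference between reporting response times as they occur and averaging them later can be shown with a small monitor. This is a minimal sketch; the class name, the 2-second threshold, and the window size are assumptions for illustration. Each slow transaction is flagged the moment it is recorded, while a rolling average is kept alongside for longer-term reporting.

```python
from collections import deque
from statistics import mean


class ResponseTimeMonitor:
    """Flag slow transactions immediately and keep a rolling average.

    threshold_s and window are illustrative parameters, not values from any SLA.
    """

    def __init__(self, threshold_s: float, window: int = 1000):
        self.threshold_s = threshold_s
        self.samples = deque(maxlen=window)   # only the most recent samples are kept
        self.alerts = []                       # (transaction id, seconds)

    def record(self, txn_id: str, seconds: float) -> None:
        self.samples.append(seconds)
        # Immediate report: the slow transaction is visible now, not at month end.
        if seconds > self.threshold_s:
            self.alerts.append((txn_id, seconds))

    def average(self) -> float:
        return mean(self.samples) if self.samples else 0.0


mon = ResponseTimeMonitor(threshold_s=2.0)
for i, t in enumerate([0.4, 0.5, 6.0, 0.3]):
    mon.record(f"txn-{i}", t)
```

Here the average works out to about 1.8 seconds, comfortably under the threshold, even though one user waited 6 seconds; only the per-transaction alert reveals it.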
Application users have a number of other service-level measures that provide a closer tie to their actual experiences than gross uptime or delay. For example, the percentage of transactions that run to completion may be more important than minor delays. It may be important to know the percentage of customers who fill out complete forms and then abandon the transaction. The measurements that matter to a user in a real-time videoconference won't be geared so much to discrete transactions as to the details of factors such as latency and jitter. In many cases, some of the most interesting performance metrics are specific to a particular application. Not only are commonly guaranteed network performance specifications irrelevant to users, but generic application-layer performance specs may also fail to indicate acceptable performance.
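Completion and abandonment rates like these are simple to derive once session events are logged. The sketch below assumes a hypothetical event log of (session, event) pairs with made-up event names "form_completed" and "committed"; real applications would have their own event vocabulary.

```python
def completion_metrics(events):
    """Return (completion rate, abandonment rate) from (session_id, event_name) pairs.

    Completion rate: sessions that committed / all sessions seen.
    Abandonment rate: sessions that completed the form but never committed
    / sessions that completed the form.
    The event names used here are hypothetical labels.
    """
    seen, filled, committed = set(), set(), set()
    for session, name in events:
        seen.add(session)
        if name == "form_completed":
            filled.add(session)
        elif name == "committed":
            committed.add(session)
    completion_rate = len(committed) / len(seen) if seen else 0.0
    abandoned = filled - committed
    abandonment_rate = len(abandoned) / len(filled) if filled else 0.0
    return completion_rate, abandonment_rate


log = [
    ("a", "page_view"), ("a", "form_completed"), ("a", "committed"),
    ("b", "form_completed"),          # filled out the form, then walked away
    ("c", "page_view"),
]
completion, abandonment = completion_metrics(log)
```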
While the increasing demand for detailed SLAs calls for client-centric management, the demand for end-to-end management requires insight into network elements and servers that may not be instrumented for performance measurement. New n-tier application architectures aren't necessarily rolled out with integrated management capabilities (see diagram, p. 69). Application servers that process the business logic of an application, database servers that feed data to the application server, load-balancing devices, firewalls, and virtual private network processors all may contribute to application degradation. Ideally, these devices should have granular performance-management capabilities.
The situation is further complicated for critical applications because many of these architectural components will be duplicated (or triplicated) for redundant operation.
Successful end-to-end performance management isn't the end of the struggle. Typically, transactions and other application events are only interesting as part of a business process. For example, taking an order from a customer is part of inventory management, shipping, and accounts receivable. The aim isn't to manage a technical operation with servers, disk arrays, routers, and so forth, but to make business processes efficient and pleasant for customers, suppliers, partners, and employees.
Application performance management vendors have traditionally addressed three management techniques. One, with roots in the world of mainframe applications, is for programmers to insert markers or break points within apps. The performance statistics are written to a file or communicated to another program at each crucial step to be monitored.
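In a modern language, a programmed marker of this kind might be a small wrapper placed around each crucial step. This is a minimal sketch, not any vendor's instrumentation library: the `checkpoint` name and the JSON record layout are assumptions. Each step writes one timing record, including whether it failed, to a log stream.

```python
import io
import json
import time
from contextlib import contextmanager


@contextmanager
def checkpoint(log, txn_id: str, step: str):
    """Write one timing record for a named step of a transaction.

    `log` is any writable text stream; the record layout is an assumption.
    """
    start = time.perf_counter()
    status = "ok"
    try:
        yield
    except Exception:
        status = "failed"
        raise                       # the marker records the failure but does not hide it
    finally:
        elapsed = time.perf_counter() - start
        record = {"txn": txn_id, "step": step, "status": status,
                  "seconds": round(elapsed, 6)}
        log.write(json.dumps(record) + "\n")


log = io.StringIO()
with checkpoint(log, "order-17", "validate"):
    pass
with checkpoint(log, "order-17", "reserve_stock"):
    time.sleep(0.01)                # stands in for real work
records = [json.loads(line) for line in log.getvalue().splitlines()]
```

Because the programmer chooses where each marker goes, the measured boundaries match the application's real steps, which is the precision advantage the article describes.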
The second technique involves sucking up all or selected parts of the traffic on the network, correlating requests with replies, and inferring application performance based on packet analysis.
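The core of request/reply correlation can be sketched as matching each reply to the oldest outstanding request on the same conversation. This is a simplified illustration, assuming a hypothetical `Packet` record whose request/reply flag has already been set by a protocol decoder. It ignores retransmissions, multi-packet replies, and pipelining subtleties that a real probe must handle.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class Packet(NamedTuple):
    ts: float          # capture timestamp in seconds
    src: str
    sport: int
    dst: str
    dport: int
    is_request: bool   # assumption: a protocol decoder already labeled the packet


def infer_response_times(packets):
    """Pair each request with the next reply on the same conversation (FIFO)."""
    pending = defaultdict(deque)   # (client, cport, server, sport) -> request times
    results = []
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        if p.is_request:
            pending[(p.src, p.sport, p.dst, p.dport)].append(p.ts)
        else:
            # Replies flow server -> client, so swap endpoints to find the request.
            key = (p.dst, p.dport, p.src, p.sport)
            queue = pending.get(key)
            if queue:
                results.append((key, p.ts - queue.popleft()))
    return results


trace = [
    Packet(0.00, "10.0.0.5", 3011, "10.0.0.9", 80, True),
    Packet(0.25, "10.0.0.9", 80, "10.0.0.5", 3011, False),
    Packet(0.40, "10.0.0.9", 80, "10.0.0.7", 3020, False),  # reply with no captured request
]
times = infer_response_times(trace)
```

The unmatched reply is simply dropped, a small example of the guesswork about transaction boundaries that limits probe-only accuracy.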
The third technique installs a sort of parasitic agent alongside each client or, perhaps, on the same LAN segment as a group of clients. The agent looks at operating system-level events, such as opening windows and on-screen form submissions, and is either told or can infer what to measure.
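One way a client agent could turn operating-system events into response times is to pair a configured start event with the next configured end event. This is a minimal sketch; the event names "form_submit" and "window_opened" are hypothetical stand-ins for whatever the agent is told, or infers, to measure.

```python
def measure_from_ui_events(events, start_event="form_submit", end_event="window_opened"):
    """Infer transaction durations from a client's OS-level event stream.

    events: list of (timestamp_seconds, event_name) pairs in time order.
    The default event names are hypothetical examples.
    """
    durations = []
    started_at = None
    for ts, name in events:
        if name == start_event:
            started_at = ts
        elif name == end_event and started_at is not None:
            durations.append(ts - started_at)
            started_at = None            # ignore further end events until a new start
    return durations


stream = [(1.0, "form_submit"), (1.2, "key_press"),
          (3.5, "window_opened"), (4.0, "window_opened")]
user_waits = measure_from_ui_events(stream)
```

Measuring at the desktop captures the time the user actually waited, including any delay inside the client itself.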
The advantage of the programmed break point is that the original programmers know precisely what the right events for accurate and comprehensive instrumentation are, and there should be no doubt that the measurement is correct. Furthermore, the measurement can be as granular as needed.
Applications whose performance can't be meaningfully captured by generic measurements can be measured and controlled only with the custom-programmed approach. The disadvantage is that few client-server or n-tier applications have been instrumented in this fashion. Even companies prepared to devote skilled programming resources to developing custom management solutions are out of luck if the application wasn't created with management hooks and APIs. The other primary disadvantage of a custom approach is that skilled programming resources are rare, costly, hard to control, and usually slower than expected.
The traffic inference approach has the big advantage of not requiring programmatic intervention in the target application. Probes or agents deployed on the local network close to the user may also be the simplest approach in terms of the number of computers and connections that must be modified.
The biggest disadvantage of traffic inference is that it's limited to the least common denominators of application performance. Specific application behaviors may not be detected. Furthermore, performance bottlenecks in the client may not be detected by these tools. The link between the probe and the user is something of a black hole for this method of data collection.
Finally, the accuracy of a probe-only system may be questionable because the best the performance-management software can do is make assumptions about the beginning and end of transactions and the makeup of packet flows.
Selected Application Performance-Testing Tools

| Vendor | Web site | Product |
|---|---|---|
| BMC Software | www.bmc.com/patrol | Patrol |
| Computer Associates | www.ca.com/products/tng_application_response.htm | Application Response Option |
| Compuware | www.compuware.com/products/ecosystems | EcoSystems |
| Concord Communications | www.concord.com | eHealth |
| Dirig Software | www.dirig.com | RelyENT, xSPress |
| Hewlett-Packard | www.managementsoftware.hp.com | OpenView VantagePoint |
| Lucent Technologies | www.lucent.com/networkcare | VitalSuite |
| Manage.com | www.manage.com | Frontline e.M |
| NetIQ | www.netiq.com/products/network_performance | Pegasus |
| NetScout Systems | www.netscout.com | nGenius |
| Tivoli | www.tivoli.com | Application Performance Management |

Data: Network Magazine
The most significant disadvantage of client agents is one they share with application-specific client software in general: it's hard to deploy software in mass quantities. Some combination of operating-system version, hardware, and application software will likely have installation or compatibility problems with the agent.
Agent software upgrades are painful and costly. Operating system-specific agents require deploying multiple versions, while Java-based agents may demand hardware upgrades to keep from degrading performance unacceptably. In addition, general-purpose agents may not be able to provide the level of detail that custom-programmed instrumentation can.
The vendor that has moved the closest to a management system focused on granularly measuring and controlling end-to-end business transactions and processes is Manage.Com Inc. The principal components of Manage.Com's performance-management platform, Frontline e.M, are the Frontline e.M Server, a dedicated Web server that serves as the central logic and data repository; e.Agents, Java-based applets that can be installed on clients and servers throughout the management domain; e.Registry, a Web site that contains updated service logic and agent configuration data; and manageXML, an XML dialect that supports the transport of update and configuration information among the components of the Frontline e.M system.
In December, Manage.Com released a set of development enhancements called Frontline Java Management Edition. These tools include a Java-management authoring kit and an extensible Java Application Adapter, which is designed to interact directly with pure Java applications, Enterprise JavaBeans, and applications developed in C and C++ (via the Java Native Interface).
The Application Adapter uses techniques that Manage.Com calls Java Direct Connect to control and monitor transactions and processes with a minimum of custom coding. In many cases, there's no need to rewrite source code--it's sufficient to install new object classes. Because many Web-centric middleware applications, such as application servers, transaction monitors, application integration products, and content-management systems, are built on a Java platform, managing these key process components with Frontline e.M is greatly simplified.
Companies and service providers whose core applications are Java-based will be the largest target for the Java Management Edition, but the regular Frontline e.M platform satisfies most of the key requirements for managing Web-based business processes. It can instrument clients and servers, and collect Simple Network Management Protocol data from devices and probes, so managers have the option of drilling down anywhere along a transaction's round trip. It can provide real-time performance reports and alerts. The e.Registry site, combined with the e.Connect secure connectivity service, is a good solution to the problem of distributing software updates. And in the right circumstances, Java Direct Connect can provide the flexibility of writing and deploying custom code without the constraints of modifying existing code.
The major management platform vendors have all taken steps to facilitate customization of applications so performance is meaningfully measurable. Hewlett-Packard and Tivoli Systems Inc. have both contributed to the development of the Open Group's standard Application Response Measurement API.
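The pattern ARM standardizes, where a transaction type is registered once and each instance is bracketed by programmed start and stop calls that report an outcome, can be illustrated with a stand-in. This is explicitly a hypothetical Python mock: the real ARM API is a C interface with its own calls (ARM 2.0 includes calls such as arm_start and arm_stop), and none of the names or status values below are taken from the specification.

```python
import time

# Outcome codes modeled loosely on ARM's good/abort/failed statuses (values illustrative).
GOOD, ABORT, FAILED = 0, 1, 2


class ArmLikeAgent:
    """Hypothetical stand-in for an ARM-style measurement agent; NOT the ARM API."""

    def __init__(self):
        self.records = []        # (application, transaction, status, seconds)
        self._types = {}         # transaction-type id -> (application, name)
        self._open = {}          # start handle -> (type id, start time)
        self._next_id = 1

    def _new_id(self) -> int:
        self._next_id += 1
        return self._next_id - 1

    def register(self, app: str, txn_name: str) -> int:
        """Declare a transaction type once, typically at application start-up."""
        type_id = self._new_id()
        self._types[type_id] = (app, txn_name)
        return type_id

    def start(self, type_id: int) -> int:
        """Programmed start point: returns a handle for this transaction instance."""
        handle = self._new_id()
        self._open[handle] = (type_id, time.perf_counter())
        return handle

    def stop(self, handle: int, status: int = GOOD) -> None:
        """Programmed end point: record elapsed time and outcome."""
        type_id, started = self._open.pop(handle)
        app, name = self._types[type_id]
        self.records.append((app, name, status, time.perf_counter() - started))


agent = ArmLikeAgent()
submit_order = agent.register("storefront", "submit_order")
handle = agent.start(submit_order)
# ... the application's own work runs between the start and stop calls ...
agent.stop(handle, GOOD)
```

The value of a shared standard is that the application makes these calls once, and any compliant management product can collect the results.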
HP's OpenView VantagePoint Performance Agent supports the ARM API, and the VantagePoint Performance Manager provides a graphical display of historical performance data, while the VantagePoint Performance Monitor provides real-time views of the performance of applications and their components. The HP OpenView Response Time Workbench is designed to customize application-specific client agents, while the MeasureWare Client Observer provides the basic monitoring framework for the agent.
Tivoli's Application Performance Management product offers three potential types of client instrumentation. Apps adapted for use with the ARM API can capture the time between programmed start points and end points. Client behavior such as new windows or changed URLs can also be related to transaction start points and end points.
Finally, a dummy client can be scripted to submit synthetic transactions that can be useful as yardsticks for traversing the network and complex application tiers. Information can be monitored and alerts can be processed by Tivoli management consoles.
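A synthetic-transaction yardstick can be sketched as a loop that runs a scripted transaction, times it, and summarizes failures and slow runs. This is a minimal illustration under stated assumptions: `probe` is a hypothetical callable standing in for a scripted client, and the 1-second objective is an arbitrary example.

```python
import time
from typing import Callable


def run_synthetic(probe: Callable[[], bool], runs: int, slo_seconds: float) -> dict:
    """Run a scripted transaction `runs` times and summarize it as a yardstick.

    `probe` is a hypothetical callable that performs one scripted transaction
    (for example log in, query, log out) and returns True on success.
    """
    timings, failures = [], 0
    for _ in range(runs):
        started = time.perf_counter()
        try:
            ok = probe()
        except Exception:
            ok = False               # an exception counts as a failed transaction
        elapsed = time.perf_counter() - started
        if ok:
            timings.append(elapsed)
        else:
            failures += 1
    return {
        "runs": runs,
        "failures": failures,
        "slow": sum(1 for t in timings if t > slo_seconds),
        "worst": max(timings, default=None),
    }


def fake_order_lookup() -> bool:
    time.sleep(0.01)                 # stands in for a round trip through all tiers
    return True


summary = run_synthetic(fake_order_lookup, runs=3, slo_seconds=1.0)
```

Because the scripted transaction is identical every time, changes in its timing point to the network or the application tiers rather than to differences in user behavior.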
Computer Associates provides an Application Response Option for both Unicenter TNG and NetworkIT. Support for commonly deployed software, such as Lotus Notes, SAP R/3, PeopleSoft, and Internet Explorer, is predefined. The system measures end-to-end response time, automatically creates baseline measurements, and can be extended to capture statistics for custom applications.
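Automatic baselining can take many forms. One common, simple choice is to treat the historical mean plus a few standard deviations as the normal ceiling. The sketch below uses that approach with k = 3 as an assumption; it is not CA's documented method.

```python
from statistics import mean, stdev


def build_baseline(history, k: float = 3.0):
    """Derive (mean, upper bound) from historical response times.

    Mean plus k standard deviations is one common choice; k=3 is an assumption.
    """
    mu = mean(history)
    sigma = stdev(history) if len(history) > 1 else 0.0
    return mu, mu + k * sigma


def deviates(sample: float, baseline) -> bool:
    """True if a new measurement exceeds the baseline's upper bound."""
    _, upper = baseline
    return sample > upper


baseline = build_baseline([1.0, 1.2, 0.8, 1.0, 1.0])   # seconds
```

With this history the upper bound is roughly 1.42 seconds, so a 2-second response would be flagged while a 1.1-second one would not.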
NetScout Systems Inc.'s roots are in the remote monitoring probe business. The company began moving down the performance-management path by measuring application flows with its line of RMON2 probes.
NetScout created and published the Application Response Time management information base for tracking application performance data via RMON2. The company unveiled its nGenius line of performance-management products in May, but after it acquired the service-level management company NextPoint Networks in July, it was able to offer a user-agent-based nGenius Application Service Level Manager.
Concord Communications Inc. began with a monitoring and reporting product that collected data from existing SNMP and RMON agents. Despite success in the network- and traffic-monitoring arena, Concord saw the shape of coming demand for end-to-end application management and made two key acquisitions. In October 1999, it merged with Empire Technologies and gained access to performance-management software geared to server-based applications. Last January, Concord merged with FirstSense Software and gained its line of highly developed client agent technology.
The Concord/FirstSense clients are lightweight. In fact, they're designed to shut themselves down if they begin to consume too much CPU capacity, typically more than 3% of the total. The agents' use of network capacity is also minimal, with only summary statistical information routinely passed upstream, commonly at 15-minute intervals. If communication between the agent and the controller is disrupted, the agent retains all data as long as it has sufficient storage capacity. FirstSense worked out the client agent distribution problem with mail, Web, or push-type distribution options and automatic installation and registration capabilities.
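The three agent behaviors described here (a CPU self-limit, periodic summaries instead of raw data, and local buffering while the upstream link is down) can be sketched together. Only the 3% limit and 15-minute interval come from the article. The class name, the `send` callable, and the bounded buffer that drops the oldest summaries when full are all hypothetical.

```python
from collections import deque
from statistics import mean


class LightweightAgent:
    """Hypothetical sketch of a self-limiting, summarizing, buffering client agent."""

    CPU_LIMIT = 0.03          # 3% of total CPU, per the article
    INTERVAL_S = 15 * 60      # summary interval; a scheduler would call flush() this often

    def __init__(self, send, max_buffered: int = 96):
        self.send = send                            # returns True if upstream accepted
        self.samples = []
        self.backlog = deque(maxlen=max_buffered)   # oldest summaries drop when full
        self.running = True

    def check_cpu(self, agent_cpu_s: float, wall_s: float) -> None:
        # Shut down rather than compete with the user's own applications.
        if wall_s > 0 and agent_cpu_s / wall_s > self.CPU_LIMIT:
            self.running = False

    def record(self, response_s: float) -> None:
        if self.running:
            self.samples.append(response_s)

    def flush(self) -> None:
        """Summarize the interval, then try to ship everything queued, oldest first."""
        if self.samples:
            self.backlog.append({"count": len(self.samples),
                                 "mean": mean(self.samples),
                                 "max": max(self.samples)})
            self.samples = []
        while self.backlog and self.send(self.backlog[0]):
            self.backlog.popleft()


link = {"up": False}
delivered = []


def send(summary) -> bool:
    if link["up"]:
        delivered.append(summary)
        return True
    return False


agent = LightweightAgent(send)
agent.record(0.5)
agent.record(1.5)
agent.flush()            # link down: the summary stays in the backlog
link["up"] = True
agent.record(1.0)
agent.flush()            # link restored: both summaries are delivered in order
```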
Another vendor whose first foray into application performance management was based on network-connected probes is Compuware Corp. The suite of Compuware application-management products is called EcoSystems. The probe-based performance-measurement system is EcoScope, while the application-specific server management system is EcoTools. EcoScope functions much as the NetScout Application Flow Management technique does, using a probe to examine each packet from a central location on the network and assigning each one to a specific application flow based on source and destination addresses, port assignments, URLs, and other kinds of "signatures" within the packet. Using the time stamps on the packets in each flow, the probe infers the users' response time without a client agent.
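Assigning packets to application flows by signature amounts to a rule lookup on fields such as destination port and URL. The sketch below uses a hypothetical signature table. Port 1521 is Oracle's conventional listener port, but the other entries and application names are made up for illustration and do not reflect EcoScope's actual rules.

```python
from typing import Optional

# Hypothetical signature table: (destination port, URL prefix or None, application).
# Rules are checked in order, so more specific signatures must come first.
SIGNATURES = [
    (80, "/orders", "order-entry"),
    (80, None, "web-other"),
    (1521, None, "oracle"),       # Oracle's conventional listener port
]


def classify(dst_port: int, url: Optional[str] = None) -> str:
    """Return the application name for a packet, or 'unclassified'."""
    for port, prefix, app in SIGNATURES:
        if port == dst_port and (prefix is None or (url or "").startswith(prefix)):
            return app
    return "unclassified"
```

Once packets are labeled this way, their time stamps can be grouped per application flow to estimate response times without any client agent.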
Compuware also offers a number of interesting tools for application performance design and testing. Two programs inherited from its July acquisition of Optimal Networks, Application Expert and Application Vantage, are capable of detecting and displaying network-based application bottlenecks. Compuware also acquired CACI Products in December 1999 and has incorporated that company's traffic-simulation and capacity-planning products into its own line, calling them EcoProfiler, EcoPredictor, and Comnet III.
BMC Software Inc. has a lengthy history of solving application performance-management problems. BMC's Patrol line offers support for many variants of Unix, Windows 2000 and NT, NetWare, Oracle, Ingres, Informix, Sybase, Lotus Domino, Microsoft Exchange, SAP, Baan, and such middleware offerings as Tuxedo and IBM's MQ Series. Individual Patrol modules not only monitor the individual characteristics of server processes, they can also generate synthetic transactions to assess end-to-end performance.
Patrol for Service Level Management works with individual application modules to support the creation, deployment, and operation of SLAs. Companies that use Patrol components or have substantial application-performance challenges involving legacy systems, large databases, and data warehouses will be comforted by BMC's comprehensive offerings, professional services, and training capabilities.
Another vendor whose application performance roots lie in server management is Dirig Software Inc. Dirig calls its agent technology Proctor and sells two versions of it: RelyENT for enterprise customers and xSPress for service providers. The primary difference is that xSPress supports multiple independent views as well as consolidated views, so service providers can show performance measures to their customers without showing them to everyone. The Proctor agent runs as a daemon on the common Unix variants and as a service on Windows NT.
It can communicate via SNMP, HTTP, or File Transfer Protocol. Subagents located across the network report on system resources, monitor processes and log files, and can take corrective action. Specific Application Managers are available for Apache and Microsoft IIS Web servers; Oracle, Progress, and SQL Server databases; Microsoft Exchange; and Citrix MetaFrame and Microsoft Terminal Server Edition. The products come with a software developers' kit to let network managers configure full application-performance management for custom apps. The Dirig products aren't designed to be end-to-end performance-management solutions--Dirig has collaborated with Aprisma Management Technologies and HP OpenView, both of which resell these products.
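The difference between xSPress's independent customer views and its consolidated provider view can be sketched as two queries over the same tagged measurements. The record layout and customer names below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def customer_view(measurements, customer: str):
    """Independent view: a customer sees only its own records."""
    return [m for m in measurements if m["customer"] == customer]


def consolidated_view(measurements):
    """Provider-wide view: mean response time per customer."""
    by_customer = defaultdict(list)
    for m in measurements:
        by_customer[m["customer"]].append(m["seconds"])
    return {c: mean(v) for c, v in by_customer.items()}


data = [
    {"customer": "acme", "seconds": 1.0},
    {"customer": "acme", "seconds": 3.0},
    {"customer": "globex", "seconds": 2.0},
]
```

Keeping one data store and filtering on the way out lets the provider report to each customer without exposing other customers' measurements.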
The foundation of Lucent Technologies Inc.'s NetworkCare division's application performance-management approach is a client agent. Through a chain of acquisitions, Lucent inherited the VitalAgent client, originally a standalone product that made an ambitious attempt to assess the performance of each element of a distributed application from the client side. Lucent has developed supplements to VitalAgent, including VitalNet, which provides network and server performance information; VitalAnalysis, the console application that aggregates performance data, processes it, creates reports, and tracks SLAs; BTMS, which monitors business transaction performance; Automon, a synthetic transaction creation and measurement tool; and VitalHelp, a fault-detection and troubleshooting tool.
The sum of these components is VitalSuite. In October, Lucent introduced VitalSuite SP, which is geared to service providers. Not only does it support independent customer views (and reports) as well as consolidated views for the service provider, it has been retooled to scale to large user populations. If VitalSuite SP fulfills Lucent's claims, it could be a real solution for service providers that need to provide managed end-to-end service levels.
Pegasus, from NetIQ Corp., is another client-centric end-to-end application-performance manager. It was developed by Ganymede Software, which was acquired by Mission Critical Software, which NetIQ in turn acquired in May. Pegasus end points can track actual transaction response times or generate scripted synthetic transactions. NetIQ also makes performance-measuring tools for developers and network designers.
Whether a company decides to run critical applications internally or farm them out to a service provider, the managed SLA is set to become the basis for making such decisions. Therefore, service providers and enterprise IT managers will soon be compelled to build the infrastructure that can indicate whether those agreements are being met. Some desirable provisions of application-based SLAs can't be readily measured with off-the-shelf products. In that case, the only possible path will be to deploy management tools that support custom development, including custom agent characteristics. It's also clear that there's no real substitute for instrumenting at least some clients directly, and both synthetic and actual transaction measures have important roles to play.
Furthermore, management tools that simply monitor business processes and promulgate alarms are less desirable than tools that can, for example, restart a hanging process or transfer the processing load to an alternate server. Most of the vendors discussed here have converged on these principles.
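The step from alarming to acting can be sketched as a watchdog that restarts servers whose heartbeat has gone stale and routes work to the first healthy one. This is a minimal illustration: the heartbeat map, the 30-second timeout, and the `restart` callable are hypothetical placeholders for whatever the management platform provides.

```python
import time


def is_hung(last_heartbeat: float, timeout_s: float, now=None) -> bool:
    """A server is considered hung if its heartbeat is older than timeout_s."""
    now = time.monotonic() if now is None else now
    return now - last_heartbeat > timeout_s


def remediate(servers, heartbeats, timeout_s, restart, now=None):
    """Restart hung servers and return one that can take the load.

    `heartbeats` maps server -> last heartbeat time on the time.monotonic() clock;
    `restart` is a hypothetical callable supplied by the operator.
    """
    healthy = []
    for s in servers:
        if is_hung(heartbeats[s], timeout_s, now):
            restart(s)          # corrective action instead of only raising an alarm
        else:
            healthy.append(s)
    if not healthy:
        raise RuntimeError("no healthy server available")
    return healthy[0]           # the primary if healthy, otherwise the first alternate


restarted = []
target = remediate(["app1", "app2"], {"app1": 0.0, "app2": 95.0}, 30.0,
                   restarted.append, now=100.0)
```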
The industry hasn't reached the point where end-to-end business processes can consistently be deployed with enforceable SLAs using plug-and-play components. Unfortunately, only companies or service providers with deep pockets can afford the possibility of failed high-profile programming projects. It's hard to imagine widespread ASP success without a comprehensive measurement infrastructure, though.
The growth of E-commerce depends on the successful evolution of application performance-management tools. IW