For IT teams that like control, the "black box" cloud model, where customers implicitly give up the right to direct how and when most tasks are performed, can be stressful. The remedy: robust service-level agreements tailored to the as-a-service paradigm. The problem is that not all service providers agree. To the extent that cloud computing providers offer SLAs at all, in our experience their agreements tend to be weak, laden with unfavorable credit terms, and overly standardized with scant room for negotiation.
The trick for enterprise IT teams is to get the protection afforded by a tailored SLA without negating the benefits--lower cost, increased scalability, and simpler management--that led them to the cloud model in the first place. In fact, an SLA itself isn't enough; just as important is a service specifications document, which spells out the respective responsibilities of the provider and customer; your ability to remove data from the cloud and mechanisms for doing so; and requirements relating to security, compliance, and data retention. (We discuss the service spec document in more detail in a free report.)
As for customized SLAs, we've heard plenty of providers argue that they can't deviate from a standard agreement because their multitenant offering requires a standard product description and contract. While this reasoning has some merit with respect to service capabilities, it's less persuasive when a customer merely requests an SLA that measures an existing aspect of the service. For example, while most software-as-a-service providers' standard SLAs don't guarantee that transactions will complete in a certain amount of time, transaction execution is an implicit element of the provider's service, so it should be possible to negotiate a service level around it.
Providers' inflexibility may be due simply to their not having been pressured to change--yet.
If you want a custom SLA, we recommend submitting a request for proposal with your requirements. Depending on your leverage, however, the provider may mark up the proposed SLA extensively or refuse it outright and require you to negotiate from its baseline. Worst case, it could simply insist you accept its canned set of terms.
What then? That depends on the uniqueness of the cloud offering and whether the business side has latched on to a desired provider. How you negotiate SLAs should be tailored to reflect the strength (or weakness) of your company's position, anticipated provider behavior, and the quality of the provider's standard SLA. If you must move ahead with a cloud service and you have little negotiating leverage, use your limited influence to address only the most egregious aspects of the provider's SLAs, rather than asking for something overly aggressive that will likely result in outright refusal.
Our four-step plan for getting the best cloud SLA will help.
Step 1: Build The Service-Level Portfolio
The first step in developing an SLA is identifying the portfolio of service levels that best measure and manage provider performance. In determining these metrics, a company should:
>> Make the metrics relevant to business performance, not technology. Service levels should focus on business outcomes, rather than provider compliance with technical parameters that don't relate directly to business value. For example, if an application is used predominantly to support monthly closing activities, measuring availability over the entire month doesn't reflect how critical the software is during the month-end period.
>> Develop a collectively exhaustive metric portfolio. Ensure that any failure to meet the business' needs will be reflected in a failure to meet one or more service levels. For example, if a substantial increase in screen update time would sink user productivity, include a related metric in the portfolio. A problem we frequently encounter in "distressed" service relationships is dissatisfaction with service quality even though SLAs are consistently being met.
>> Be sure service levels are mutually exclusive. Avoid overlapping service-level metrics that can dilute provider focus and result in misleading performance reports. For instance, an infrastructure-as-a-service customer might consolidate metrics for "average time to provision a server instance" and "provision success rate." By avoiding duplication and overlap, you also eliminate the need for the provider to set pricing to protect itself against "double jeopardy" credit situations.
>> Institute checks and balances between metrics. Your SLA portfolio should consider each service level in the context of the overall SLA framework and the outcome you want. Address any potential adverse incentives with a counterbalancing metric. A common example comes from outsourced service desks, where emphasis on "average handle time" can lead to lower-quality call outcomes unless a compensating metric, such as "first-call resolution," is included.
>> Limit the size of the portfolio for manageability. An SLA portfolio of more than eight to 10 service levels can become unwieldy. You want to keep the provider's attention on the metrics most critical to your business, which in most cases relate to service availability, service performance, and timeliness of fixing problems.
Step 2: Construct Individual Service-Level Metrics
This phase again emphasizes exhaustiveness, now applied to individual metrics: each service level should cover the full scope of the provider's responsibility. For example, because the provider's service includes its network connection and infrastructure, parameters such as service availability and response time should measure those components as well, rather than relying on measurements from a monitoring server on the same LAN segment. Another important objective when writing the SLA is to head off future disagreements by making the service levels as quantitative and unambiguous as possible.
At a minimum, every SLA should include the following:
>> Detailed description of the service level. This includes points of demarcation, triggers to initiate and terminate measurement, and criteria for success and failure. Define terms that otherwise might be open to interpretation: For example, is a degraded service still considered available? When is a problem classified as high priority?
>> Explanation of the data-collection process. Minimize ambiguity by describing data sources and data fields, collection times and frequency, and responsibility for data-collection activities. Decide if you'll use data collected by the provider, establish an internal monitoring capability, or use third-party cloud monitoring services, such as Cloudkick, Monitis, or Gomez.
>> Outline of the performance calculation. This can have a marked effect on service-level effectiveness. Consider how different calculation approaches will drive incentives and behavior. If resolution effectiveness is measured as mean time to repair, a provider with a large number of quick fixes can get away with having a single, extraordinarily long issue. Conversely, requiring that 95% of incidents be resolved within four hours provides an incentive for the provider to deprioritize resolution activity as soon as any incident enters the fifth hour. One answer is to include both "mean" and "maximum" resolution as distinct metrics within the SLA. Alternatively, you can develop compound service levels--for example, 95% of incidents resolved in four hours, 100% of incidents resolved in one business day.
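To see why the calculation method matters, consider this sketch with hypothetical incident data (the 36-hour incident, the 4-hour and one-business-day thresholds, and the treatment of a business day as 24 clock hours are all illustrative assumptions, not figures from any real agreement):

```python
# Hypothetical incident data: 19 quick fixes plus one runaway incident (hours).
resolutions = [1.0] * 19 + [36.0]

mean_ttr = sum(resolutions) / len(resolutions)            # mean time to repair
max_ttr = max(resolutions)                                # worst single incident
within_4h = sum(t <= 4.0 for t in resolutions) / len(resolutions)

print(f"Mean time to repair: {mean_ttr:.2f} h")           # 2.75 h -- looks healthy
print(f"Resolved within 4h:  {within_4h:.0%}")            # 95% -- percentile target met
print(f"Longest incident:    {max_ttr:.0f} h")            # 36 h -- hidden by both metrics above

# A compound service level catches the outlier: 95% within four hours
# AND every incident within one business day (assumed here as 24 clock hours).
meets_compound = within_4h >= 0.95 and max_ttr <= 24.0
print("Compound SLA met:", meets_compound)                # False
```

Both the mean and the 95% percentile look acceptable on this data; only the compound service level flags the single 36-hour incident.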
Step 3: Set Realistic Performance Targets
Establishing the required level of service performance is another challenge. Set the threshold too low, and service will not meet your expectations; set it too high (as most IT teams tend to do) and you'll likely incur additional costs or miss opportunities to obtain concessions, such as tighter SLA exclusions, reduced credit caps, and other contractual terms. For cloud services, performance negotiations are further complicated by the provider's limited ability to offer differentiated delivery and support models. This means customers generally "get what they get," and any incremental tightening of service levels is reflected in increased service costs to offset anticipated losses from occasional SLA failure.
You have two main options for determining the performance needs of the business. If your company has historical data on its own performance, use that as a baseline for requested performance, adjusted for the business's assessment of that performance and its current requirements. Alternatively, apply your stated performance measurement and calculation techniques to determine the point at which a performance drop-off starts hurting the business. If neither is possible, research the performance commitments available in the market for similar services through vendor information, account reps, and colleagues or user forums.
Be wary if providers request that performance exceeding one or more targets be used to offset shortfalls elsewhere. This might seem fair on first review, but it can distort the SLA model. If any service levels are easy to meet consistently (as is frequently the case), the provider effectively gains a free pass for service-level violations. What's more, there's often little benefit to the cloud customer for the provider to exceed stated performance targets. If the business doesn't need a service instance to be provisioned in less than one minute, why encourage the provider to speed up that service?
Step 4: Define Remedies For Failure
SLA credits are always one of the most heavily negotiated areas in an agreement. The credit structure is usually developed in two stages. First is settling on the total "fees at risk"--that is, the maximum compensation in service-level credits that the provider would be required to provide. Next is the less contentious allocation of those fees to individual service levels.
Conventional SLAs generally end up with around 10% to 15% of fees at risk, with a 200% to 250% multiplier. The effect of the multiplier is to increase the sensitivity of the agreement to individual performance failures, while capping the credit amounts payable for performance failure. A 15% cap with a 200% multiplier allows the customer to allocate the equivalent of 30% of the fees to the individual service levels. A variation of this approach is a system of points where (in this example) 200 points would be allocated across the individual service levels, with credits calibrated such that a total of 100 points would entitle the customer to a 15% credit payment.
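A back-of-the-envelope calculation shows how the cap and multiplier interact. The monthly fee, the four service levels, and their allocations below are hypothetical figures chosen for illustration:

```python
# Hypothetical contract figures for illustration only.
monthly_fee = 100_000.0   # dollars per month
fees_at_risk = 0.15       # cap: credits never exceed 15% of monthly fees
multiplier = 2.0          # 200% multiplier

# The multiplier enlarges the pool allocated across individual service levels,
# making each single failure more expensive to the provider.
allocation_pool = monthly_fee * fees_at_risk * multiplier   # $30,000 notional

# Suppose the pool is spread across four service levels and two fail this month:
allocations = {"availability": 12_000, "response_time": 8_000,
               "incident_resolution": 6_000, "provisioning_time": 4_000}
failed = ["availability", "incident_resolution"]

raw_credits = sum(allocations[s] for s in failed)              # $18,000 earned
capped_credits = min(raw_credits, monthly_fee * fees_at_risk)  # capped at $15,000

print(f"Allocation pool:  ${allocation_pool:,.0f}")
print(f"Credits earned:   ${raw_credits:,.0f}")
print(f"Credits payable:  ${capped_credits:,.0f}")
```

The two failures earn $18,000 in notional credits, but the 15% cap limits the actual payout to $15,000 -- the multiplier sharpens the penalty per failure while the cap bounds the provider's total exposure.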
In general, expect to settle for considerably less in fees at risk under a cloud agreement than under a conventional outsourcing SLA, given the volume-based delivery model and thin margins. Instead, focus on non-monetary remedies for SLA failures. Resist limitations on further remedies, such as waivers of consequential damages or of the option to terminate the agreement without liability. Retaining the right to pursue additional damages--in the case of gross negligence on the part of the provider, for example--is one reason you don't want to sign any service agreement that states, "Performance credits are the sole and exclusive remedy for performance failures."
Jonathan Shaw is a principal at Pace Harmon, an outsourcing advisory firm. Write to us at [email protected]