With the right tools, processes, and planning, SLAs can keep everyone -- even IT -- happy.
In the late 1990s, functional service-level agreements were the holy grail for telecommunication service providers -- highly sought-after, but always just out of reach, as customers demanded high levels of service from their circuits and application services. Although most providers offered SLAs, few were able to monitor them correctly or in real time.
Telecoms have strived to master SLAs, but that's only part of the story. An even more interesting and often unnoticed transformation is occurring as enterprise, government, and academic environments are also moving to SLA-driven services. Industries far removed from telecom now recognize that defining an end-to-end, customer-focused IT department is crucial to providing consistent, reliable, measurable, and usage-based service.
However, as was the case with telecom in the '90s, IT managers in these other sectors seldom have the right elements in place to generate and manage a successful SLA. The building blocks of an effective SLA -- service catalogs, defined processes, and holistic management and monitoring systems -- are still only a dream for many IT organizations.
Put another way, SLA-driven IT shops might be the way of the future, but many IT managers don't know where to start in the definition and management of SLAs. We will try to unravel some of that mystery here.
Getting Started With SLAs
When you think of SLAs, you might also think of penalties, contract termination, "free" services, and other woes that all too often were part of doing business with Internet service providers. It's little wonder that some organizations, attempting to avoid negative connotations, prefer to use the terms SLE (service-level expectation) or SLG (service-level goal).
But whatever they're called, SLAs do one thing: They define a specific level of service that is provided to a customer. These agreements can also define cost, usage levels, or other helpful data points that will allow both sides of the business (provider and user) to be on the same page regarding the level of overall service offered and received. Some examples of SLAs might be 99% availability, 48-hour server provisioning after a request is made, notification of an outage within five minutes of the occurrence, or security patch deployments within 24 hours of their release.
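To make a target such as 99% availability concrete, it can help to convert the percentage into a downtime allowance. The following is a minimal sketch, assuming a 30-day reporting period; the function name and the period length are illustrative choices, not part of any particular SLA product.

```python
def allowed_downtime_minutes(availability_pct, period_days=30):
    """Return the downtime allowance, in minutes, for an availability target.

    Assumption: the SLA is measured over a fixed period of `period_days`
    around-the-clock days (no maintenance windows are excluded).
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (100 - availability_pct) / 100


# Compare a few common targets over a 30-day period.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows {minutes:.1f} minutes of downtime")
```

Under these assumptions, 99% availability allows 432 minutes (7.2 hours) of downtime in a 30-day period, which is a useful figure to put in front of a customer before either side signs.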
The first step in getting started with SLAs is definition of the service. What exactly are you offering to your customers? Applications? Network capability? IT services? Whatever they are, you should store them in a service catalog that is accessible to your customers. This can be as simple as a Word document or HTML pages; however, software vendors like Digital Fuel, NewScale, and others now offer service catalogs that enable customers to order IT services like they'd order a book from Amazon.com.
After you define your services, you need to define the expectations of the user community in the form of service-level requirements. Without these, organizations can never measure and manage the user experience in a meaningful way.
Service-level requirements are a balance of customer desires and operational reality. Your customer may request 100% availability, but this might not be an operational reality. Before offering metrics, you need to take a hard look at several aspects of your organization. What tools do you have on hand to monitor the environment? I saw one organization that had committed to more than 300 SLAs, but only had tools to monitor about half of them. Reporting the rest of the SLAs required an army of people to collect and report on data every month.
Many SLAs will involve traditional fault and performance tools that need to provide end-to-end management capabilities of applications, or calculate availability of services. Software vendors such as BMC, CA, EMC, Hewlett-Packard, and IBM all have good monitoring and management solutions for larger organizations. Other vendors, including Ipswitch WhatsUp, Kace, Nimsoft, ScienceLogic, SL.com, and SolarWinds, also provide enterprise monitoring and SLA reporting for IT environments of all sizes.
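The availability calculation itself is simple once outage records exist. As a rough sketch, and not a description of how any of the products named above work, the following assumes outages are recorded as non-overlapping (start, end) pairs; the function name and sample data are hypothetical.

```python
from datetime import datetime


def measured_availability(period_start, period_end, outages):
    """Return percent availability for a period from (start, end) outages.

    Assumptions: outage intervals do not overlap one another, and any part
    of an outage outside the reporting period is ignored.
    """
    period_seconds = (period_end - period_start).total_seconds()
    if period_seconds <= 0:
        raise ValueError("period_end must be after period_start")
    down_seconds = 0.0
    for start, end in outages:
        # Clip each outage to the reporting period before counting it.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down_seconds += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (period_seconds - down_seconds) / period_seconds


# Hypothetical 30-day period with two outages (2 hours and 90 minutes).
outages = [
    (datetime(2024, 4, 3, 9, 0), datetime(2024, 4, 3, 11, 0)),
    (datetime(2024, 4, 18, 22, 0), datetime(2024, 4, 18, 23, 30)),
]
result = measured_availability(datetime(2024, 4, 1), datetime(2024, 5, 1), outages)
print(f"Measured availability: {result:.3f}%")
```

The hard part in practice is usually not this arithmetic but getting trustworthy outage records for every service in the catalog, which is why the choice of monitoring tools matters.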
Calculating SLAs can be resource-intensive. Using a best-practices framework like ITIL, Six Sigma, eTOM, or another IT-centric process methodology will help maximize efficiency. Be sure to calculate the time it takes people to execute components of the SLA and include that in your overall time calculation. Automate your processes as much as possible; this can save a great deal of time. Runbook automation products like BMC's RBA, HP's Opsware/PAS, NetIQ's Aegis, Opalis, and others can automate some components of the operations environment.
It's also critical to define, prioritize, and track the progress of each aspect of the SLA, and to monitor SLA operational level requirements (OLRs) for organizations such as suppliers (network, hardware, or application vendors). Different service providers may be involved in different parts of the agreement, so it's essential to ensure that they understand and are accountable for their impact on the end-user experience.
During this process, you should focus on prioritizing OLRs and limiting their scope to key success factors in service delivery. Defining too many OLRs makes management of the environment and SLA overwhelming and ultimately unproductive.
After you've defined expectations, you need to assess your ability to realistically meet those expectations. In many service provider environments -- to the dismay of IT -- sales or marketing will agree to an SLA to close a deal, then inform IT about the SLA after the fact. This situation rarely results in a happy customer. So, before committing to an SLA, assess your operational readiness and identify areas of improvement to ensure that the SLA strategy can be implemented and supported, both tactically and strategically.
Seven SLA Steps
Develop realistic agreements
Make sure your customers know what they're agreeing to
Map all of the data elements required to launch the service-level agreement
Deploy tools that can monitor SLA compliance
Put in thresholds to alert IT to issues before they impact an SLA
Develop automated SLA reports
Monitor the SLAs and seek ways to adjust and improve measurements
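The threshold step in the list above can be illustrated with a small check that warns IT before the downtime allowance is used up. This is only a sketch: the 75% warning level, the function name, and the status labels are assumptions chosen for illustration, and real monitoring tools will expose their own threshold settings.

```python
def sla_status(downtime_minutes, budget_minutes, warn_fraction=0.75):
    """Classify downtime against an SLA allowance.

    Returns "ok", "warning", or "breached". The warning level is an
    assumed early-alert threshold, expressed as a fraction of the budget.
    """
    if budget_minutes <= 0:
        raise ValueError("budget_minutes must be positive")
    used = downtime_minutes / budget_minutes
    if used >= 1.0:
        return "breached"
    if used >= warn_fraction:
        return "warning"
    return "ok"


# 432 minutes is the 30-day allowance for a 99% availability target.
for downtime in (100, 330, 432):
    print(f"{downtime} minutes of downtime: {sla_status(downtime, 432)}")
```

The point of the middle state is timing: an alert at 75% of the allowance gives operations room to act while the SLA is still intact.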
With realistic SLA services and expectations defined, the last step is reporting. Although many of the tools discussed here have integrated reporting engines, you may also look to products like Integrien, Managed Objects, Oblicore, or OpTier that focus on end-to-end reporting via integration of existing data sources. For complex environments, these tools focus on reporting, not on the underlying data collection. (Most vendors offer separate products for data collection.)
Regardless of the reporting tools used, customers should have full transparency into the metrics -- ideally, in real time. Organizations that manage and monitor IT operations may cringe at this idea, but customers want to see what is going on and might not want to wait for aggregated monthly reports to be e-mailed to them.
Online executive dashboards, when implemented in the context of SLAs, provide management with focused, actionable views of real-time service assurance information. Understanding how to interpret the SLAs will also drive effective system implementation by feeding the development and training activities related to technology, workflows, and CRM that are required for successful implementation.