Where Agile Development Fails: IT Operations
Agile developers want fast and frequent deployments. IT operations teams want stability. A growing movement is trying to bridge the gap.
The agile approach to software development has been winning enterprise converts since a group of 17 programmers laid out the core principles in the Agile Manifesto in 2001. The incremental sprints to deliver working software and frequent interactions with business stakeholders that agile requires are widely cheered for keeping software projects focused on practical goals. But there's a growing movement to improve agile by making developers accountable for how well their software actually runs in full-blown production mode.
This improvement effort goes by many names: continuous integration, continuous delivery, deployment pipelining, or just plain "dev ops." It seeks to add the IT operations team as a more important stakeholder in the agile process. Instead of developers being congratulated solely for finishing code that meets the business users' requirements on time and on budget, they would also be held more responsible for how easily it deploys, how few bugs turn up in production, and how well it runs.
It's undoubtedly a tall order.
Continuous integration has long been a goal of deep thinkers who contemplate the software development process--partly because it's so seldom achieved. In a world where developers typically produce code then toss it over the wall to the operations team, dev ops proposes to bring the two teams together. However, in some ways, agile's great success has become a barrier to bringing development and ops together; the agile approach's need for frequent releases is at odds with operations' desire for stable IT systems.
By "operations" we mean the systems and database administrators and other IT infrastructure specialists who keep systems online and serving the business. They often have expertise in how particular applications run and when they're likely to get overloaded. They know how to configure a new application for deployment, stage it in a pre-production environment, and then blend it in with all the other production systems without causing a hiccup in the data center. They are also the people who get called on the carpet when hiccups do occur.
Agile advocates consider it essential for agile teams to release code for business stakeholder review early and often. Agile teams also want to update production systems in short and frequent cycles. But operations teams instinctively oppose frequent cycles because that model threatens stability. They'd like update cycles once or twice a year, or at most once per quarter. They have a wealth of all-nighters in the data center and other bitter experience that tells them system failures often follow software updates.
"Developers have no experience in operations--it's a huge problem," says developer Jez Humble, co-author with David Farley of Continuous Delivery (Addison-Wesley, 2010). Humble is also a consultant with ThoughtWorks Studios, an agile development consulting firm.
There are other tensions between development and operations. Software often is developed in one environment, then run in another. Developers tend to work in the environment their tools are made for, which often means Windows. Testing done in the development environment often misses bugs that show up in production. Agile tends to focus on testing to confirm that the desired business functionality has been produced, but it can overlook difficult-to-test attributes, such as scalability, reliability, and the ability to sustain performance under peak load, all of which operations prizes.
Thus agile development, which has been so successful at connecting the once isolated developer to his business user, has tended to alienate the operations team. Dev ops is trying to extend the gains agile made in meeting business users' needs to meeting operations' needs.
Alternatives From The Web
Computer science courses seldom mention operations, and freshly minted developers naturally gravitate toward "the cool stuff they're going to build" as opposed to the more staid disciplines of keeping systems running, says Scott Ambler, chief methodologist for agile and lean development in IBM's Rational division and columnist for Dr. Dobb's (drdobbs.com). "It's really a blind spot," Ambler says. Even experienced enterprise IT developers may lack the breadth of experience needed to take on a share of the responsibility for IT operations.
Yet there might be hope sprouting in the very "cool" companies where young, ambitious programmers want to work: Web companies like Google, Facebook, and Amazon.com that don't tolerate gaps between development and operations. Amazon CTO Werner Vogels has famously said that developers should also be operators. Amazon's e-commerce operation is now organized around discrete application services that are called through an API. Developers of a service at Amazon bear the primary responsibility for its operation throughout its life cycle. "You build it, you own it," Vogels said in the May 2006 issue of the Association for Computing Machinery's Queue magazine.
Google and Facebook build new releases and put them in production on a weekly basis, says Ambler. These frequent releases increase risk of failure from an operations point of view, but they also reduce risk by keeping the number of changes included in an update small and manageable. That way, they know where to look if there's a problem.
Flickr goes a step further and issues multiple releases of production systems daily, says Humble. That led to Flickr experiencing four outages recently, but each lasted only about six minutes, he says--the amount of time that Flickr developers needed to isolate the most recent changes and identify what was wrong. Outages in a new release of a typical enterprise system are much harder to track down because of the volume of changes.
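The argument in the last two paragraphs, that small releases make failures easier to trace, can be made concrete with a minimal sketch. Neither Humble nor Ambler describes the tooling these companies use, so the example below swaps in a plain binary search over an ordered list of changes (the same idea behind `git bisect`). The `is_broken` check is a hypothetical stand-in for whatever test reproduces the outage; the sketch simply counts how many checks it takes to find the bad change.

```python
from typing import Callable, Sequence, Tuple


def find_first_bad(changes: Sequence[str],
                   is_broken: Callable[[int], bool]) -> Tuple[str, int]:
    """Binary-search an ordered, non-empty list of changes for the first bad one.

    Assumes (hypothetically) that is_broken(i) reports whether the system
    fails with changes[0..i] applied, that a failure persists once it has
    been introduced, and that the full release is known to be broken.
    Returns the culprit change and the number of checks performed.
    """
    lo, hi = 0, len(changes) - 1   # the culprit lies somewhere in [lo, hi]
    checks = 0
    while lo < hi:
        mid = (lo + hi) // 2
        checks += 1
        if is_broken(mid):
            hi = mid               # culprit is at mid or earlier
        else:
            lo = mid + 1           # culprit is after mid
    return changes[lo], checks


# Hypothetical releases: one small, one large.
small_release = [f"change-{i}" for i in range(4)]
big_release = [f"change-{i}" for i in range(400)]

print(find_first_bad(small_release, lambda i: i >= 2))    # ('change-2', 2)
print(find_first_bad(big_release, lambda i: i >= 250))    # ('change-250', 9)
```

With four changes, two checks pin down the culprit; with 400, it takes nine. In a real outage each check may mean a redeploy or a full test run, which is why the number of changes bundled into a release matters so much to the people doing the troubleshooting.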
But large, uniform Web applications such as Facebook or Google are very different from the enterprise data center, with its complex mix of heterogeneous applications. When an outage occurs in that environment, it usually results in the dreaded "bridge call," pulling every expert on the infrastructure together for a lengthy troubleshooting session.
Nationwide Embraces Agile
Tim Heller has lived through that more than once at Nationwide, in his former role as associate VP of IT for applications at the insurance and financial services company. "I remember one time I was having a cookout at my home for 40 people, and I was inside on a bridge call," he says. "I understand operational problems."
Heller is now associate VP of IT for applications in Nationwide's Development Center, where he leads 26 development teams that use agile methods to partner with 7,000 IT staffers distributed throughout Nationwide's 23 business units. Ideas for new services filter up from the business areas. When a project gets the OK, the Development Center provides a team of about a dozen people to work through it, bringing project management and development techniques, including a heavy emphasis on the agile tenets of frequent software builds and daily interaction with business sponsors. An operations staffer who understands the business value of the project is recruited to the team to provide documented input, such as whether the app will need to scale up for peak demand at the end of each quarter.
Heller calls the process they follow "acceptance test-driven development." The code that's turned over to production generally deploys without mishap because it has been written and tested with operations' concerns in mind. The test environment "mimics the production environment," says Heller (see story, p. 30). Automated testing kicks off each time developers complete a build, and it takes minutes, compared with the days previously spent in manual testing of a completed project. "We know almost immediately if it doesn't perform as expected," says Heller.
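Heller doesn't describe Nationwide's tooling, so the following is only a minimal sketch, assuming a pipeline that runs a list of automated acceptance checks after every build and blocks promotion if any of them fail. What it illustrates is that an operations requirement, such as the quarter-end peak demand an operations staffer might document, sits in the same gate as the business checks. Every name and figure here (`AcceptanceCheck`, `run_build_gate`, `quarterly_premium`, the 3x peak multiplier, the measured capacity) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AcceptanceCheck:
    name: str
    owner: str                  # hypothetical labels: "business" or "operations"
    run: Callable[[], bool]     # returns True if the check passes


def run_build_gate(checks: List[AcceptanceCheck]) -> List[str]:
    """Run every acceptance check for a build and return the names that failed.

    An empty list means the build may be promoted toward production.
    """
    return [c.name for c in checks if not c.run()]


def quarterly_premium(annual_premium: float) -> float:
    # Hypothetical business rule under test.
    return annual_premium / 4


# Hypothetical operations requirement: sustain 3x normal request volume
# at quarter end.
NORMAL_RPS = 200
PEAK_MULTIPLIER = 3


def measured_capacity_rps() -> int:
    # Stand-in for a load test run in a production-like environment.
    return 650


checks = [
    AcceptanceCheck("quarterly premium is a quarter of the annual premium",
                    "business", lambda: quarterly_premium(1200.0) == 300.0),
    AcceptanceCheck("sustains quarter-end peak load", "operations",
                    lambda: measured_capacity_rps() >= NORMAL_RPS * PEAK_MULTIPLIER),
]

failures = run_build_gate(checks)
print("promote" if not failures else f"blocked: {failures}")   # promote
```

Because the gate runs on every build, a capacity regression would surface as a blocked build within minutes rather than as an incident after deployment, which matches the "we know almost immediately" effect Heller describes.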
Development Center teams have produced 100 applications, and 70% have been defect-free, he says. Before using the short build-and-test cycles, deployment was "manually intensive--an up-all-night event. Now we run a script. … We deploy in a fraction of the time and know within an hour or a few hours if the deployment has succeeded," Heller says.
Nationwide doesn't exactly live the Amazon dictum--Nationwide's version is more like "you build it, you run it (for just a little while)."
"We try to keep pure project development and operations somewhat segregated," Heller explains. A development team is responsible for how the software runs in production for a short time after deployment. Then that development team pivots back to producing new features.
This approach to development has let Nationwide reduce the share of each team devoted to testing from 25% to 30% down to 10% to 15%. The process is working well enough that Nationwide plans to increase its agile Development Center to 60 teams, from 26, by 2014.
The complications of integrating agile development and IT operations will vary with each industry and company. For example, it can be very difficult when software is being created for deployment in a third-party enterprise customer's data center, says Todd Little, an agile leader in the Landmark Software and Services unit of Halliburton. Landmark creates software for oil and gas companies, and those customers prefer infrequent updates. "We've introduced barriers to slow down the ability to push out changes," Little says. One of those is to review the code for intellectual property issues often during development, to spot patent violations or to help determine whether Landmark should apply for a patent.
Little is torn about how much to use agile. As the former chairman of the Agile2011 conference for the Agile Alliance, he's a true believer in agile methods. But he admits that there are challenges on the enterprise level that haven't been sorted out, such as the rate of change that's best for operations and IP and compliance issues.
The big-picture goal now is to build more quality into the agile process from project start to finish. The auto industry went through this evolution, where designers had to learn to create cars while considering whether the models were practical to manufacture. There's no one right way to meet the goal of creating software that meets the needs of business units and IT operations. The dev ops goal isn't to build precise, highly defined contributions into every project. Rather, it's "for each side to help the other," says IBM's Ambler.
Go to the sidebar:
Why App Dev Needs A Better Deployment Pipeline
InformationWeek: Nov. 28, 2010 Issue