Commentary
3/13/2014
09:06 AM
Rajat Bhargava

DevOps Demands You Fix Twice

If you've taken time to get everything just perfect, you haven't truly adopted the rapid iteration, lean product development ethos.

My friend and colleague Richard Miller, CTO of genealogical search engine Mocavo, likes to say, "Fix everything twice." When I first heard him say that, I didn't quite understand what he was getting at. Why not just fix a problem properly the first time so it doesn't come up again? He replied that it's just like carpentry -- measure twice, cut once. Richard was starting to mix metaphors, and he lost me.

What Richard's talking about are DevOps and lean startup principles. Organizations today -- especially startups -- are moving too fast to get everything right all the time. Add to that how quickly technology is shifting, and Richard's "fix twice" adage makes perfect sense, even for enterprises.

DevOps is an extension of Agile that involves the rest of the organization in rapid iterative releases, focused on driving increased customer value. Quicker turns on the product mean bringing new capabilities -- presumably features customers are asking for -- to market faster, fostering a competitive business. DevOps uses a more integrated team environment and automation to deliver accelerated product releases.

One result of iterating often is that organizations don't feel the need to build far beyond their scale, because they know that they can come back in the next sprint and up the ante. Compare that to the old model, where you have only a couple of shots to get things right. You can't really go back and attack shortcomings after the fact. Shorter release cycles mean, in general, that you can adjust quickly.

All that said, when product releases really get rolling, some things are going to break. Fortunately, DevOps is about things breaking. If you've taken time to get everything just perfect, you haven't truly adopted the rapid iteration (or lean product development) ethos.

DevOps means speed, our 2014 survey says.

If your organization can build and release 12 times in a year, will you have a better, more useful product than if you had released only at the end of the year? In almost all cases, yes. The organization that can release more quickly will win. It will be able to adjust based on feedback so that it's building the product that most closely matches what customers are looking for. Additionally, it's going to hear about problems from customers continually versus waiting to find out how the product worked, or didn't.

This is where "fix twice" comes in.

In a DevOps culture, IT teams are moving fast. Solutions to problems need to happen just as quickly. An application or site goes down? Put a Band-Aid on the problem and stitch it up later. That's fix number one -- quick and dirty. The goal is to stabilize the issue with the least amount of effort. The team is generally working under pressure and duress. It's difficult for those involved to think about anything other than getting things operational again.

The real magic happens in fix number two. This is where the team goes back, studies the problem, and takes action to fix it permanently -- or as close to permanently as is practical at the time.

A DevOps team under pressure to rectify a major issue doesn't exist in an environment conducive to strategic evaluation and problem solving. By putting a patch on the initial problem, you buy breathing room. Fix two gives the team an opportunity to find the best long-term solution.

People can debate different approaches, spike on the problem, even try a few different tactics before settling on the right long-term strategy. Ironically, a team that has experienced failure can better evaluate the problem and focus on creating an ideal, longer-term solution than one that is trying to solve a problem in advance.

Perhaps the most significant challenge with this approach is that most organizations don't have the discipline to get to fix number two. Fix number one took the pressure off. As long as the Band-Aid stays stuck, there's little incentive to circle back. With DevOps and IT folks stretched so thin, it's often difficult to pick up our heads and think about longer-term problems.

Unfortunately, with technology, by the second or third time a problem rears its head, the situation is generally pretty dire. It will take a lot more than Band-Aids to keep things together.

The best DevOps organizations don't let the same problems recur. They schedule time during the sprint cycle to permanently address core issues, and as a result have longer-term fixes, which translate into greater stability.

So while Richard likes to quip "fix twice" often, the real message is not just about solving specific problems. It's about building a methodology and inculcating the discipline to execute on that method. DevOps is at the core of this approach, as is the ability to move fast. For IT organizations that want to be the best, fix failures twice.

Rajat Bhargava is co-founder and CEO of JumpCloud Inc., a provider of server management and security tools for DevOps and IT professionals. An MIT graduate with two decades of experience in industries including cloud, security, networking, and IT, Rajat is an eight-time ...

Comments
Lorna Garey,
User Rank: Author
3/18/2014 | 6:37:01 PM
Re: OHIO principle
Kevin, I guess I'll buy that. I do think it'll be a telling indicator of the maturity of an org's devops practice as to how religiously they do circle back. Once a fire is out, it's human nature to move on.
richardkmiller,
User Rank: Apprentice
3/18/2014 | 6:22:05 PM
Re: OHIO principle

Hi Lorna! I think "fix twice" actually goes well with "only handle it once" if we're thinking long-term. When a system fails, everyone makes it first priority to get back to the baseline. It's a fire. Drop everything. Stephen Covey would call this quadrant 1 -- important and urgent.

However, once we're back to baseline, there's usually little organizational pressure to do anything more. It takes a special mindset to take the fix further, preventing a future recurrence. This would be quadrant 2 -- important but not urgent.

It's the 2nd fix, preventing a future recurrence, that is most loyal to "only handle it once" because (theoretically) we should never have to return to the issue. That's how I think about it.
akeenan452,
User Rank: Apprentice
3/14/2014 | 11:33:10 PM
Re: OHIO principle
Having been in IT for decades, I remember this as finding a workaround. Some software has to run no matter what. So you identify the problem quickly and come up with a workaround that will allow it to run. Think of it as the ER stabilizing a patient. This gives you breathing room so you can identify a true solution. With the speed at which software is being produced, it is now common for software to be released that is not ideal but is useful. It is expected that refinements and enhancements will be made in the next release. It is also common to add a feature only to remove it later. People may believe they need something, only to find out later that it is a sort of nice-to-have. Now we can remove features as well as add them.
rbhargava,
User Rank: Apprentice
3/14/2014 | 9:04:50 PM
Re: OHIO principle
Lorna – thanks for the comment. Yes, absolutely, if a team has time to document the problem and understand it deeply while it is occurring, then that is positive. Saving artifacts and materials that would help you recreate the issue or remind you of the depth of the problem is important too, if you can do that. The challenge in all of this, of course, is that in many DevOps / modern environments the infrastructure is the lifeblood of the business, and when it is down the business is losing something significant – e.g. the ability to accept orders, logistics, support, etc. That means the first priority is just to get back to normal operations, which leaves little time during an outage to evaluate the best way to solve the issue permanently. This is a tricky issue, but I think with increased pressure on keeping systems operating, fixing a problem quickly ends up becoming the norm. The real challenge is in making sure that you really do go back and solve the problem more completely and sustainably afterward.

-Rajat
Lorna Garey,
User Rank: Author
3/13/2014 | 11:20:56 AM
OHIO principle
This flies in the face of the venerable "OHIO" (only handle it once) principle that says, "once you've taken time to understand a problem you should just fix it." Otherwise, you must circle back and redo that work of garnering insight. And of course, time is no friend of memory.

Would you recommend spending time during the first fix to thoroughly document the problem?