Few companies are as committed to DevOps practices as Capital One.
Over the past few years, the 20-year-old financial services giant has been transforming itself, tapping DevOps to establish a continuous software-delivery cycle. Topo Pal, a senior director and senior engineering fellow, told a packed room at the recent DevOps Enterprise Summit in San Francisco that it's a journey that suits the company's entrepreneurial spirit.
"We consider ourselves a startup, and like any startup, we have different DNA," said Pal. "For instance, we build our own software."
All the better for adopting DevOps. The company does just about everything (including back-end processing) in the public cloud now, with a focus on building microservices using open source technologies. Having spent the previous three years building out automation steps, scaling DevOps, adopting open source and cloud, and starting to measure its success, Capital One is now turning its attention to continuous delivery.
It's approaching this in two important ways, starting with removing fear from the equation. Pal said his team has been putting the mechanisms in place to enable what he dubbed the "no fear release." By reducing, or even eliminating, developers' fears that their code will break things elsewhere in the process, or that it will be non-compliant or out of control, Pal hopes to embolden them to unleash more innovation.
At the same time, Pal said Capital One doesn't want its developers to just build and then forget. Hence the adoption of a "you build it, you own it" approach to accountability. That includes every step, from coding and building to testing and deployment.
To support this combination of fearlessness and accountability, Pal and his team borrowed the tried-and-true "clean room" concept from technology manufacturing and adapted it for the software development lifecycle. Capital One's virtual development clean room is defined by a set of clearly spelled-out guidelines intended to effectively scrub code before it's released.
These guidelines include:
- Identifying and registering all product pipelines;
- Making sure everything is under source control, that every change is peer reviewed, and that production changes only occur via code changes;
- Controlling access to production servers;
- Testing and scanning every code change;
- Stopping the pipeline if something fails; and
- Capturing evidence in near-real time and analyzing for discrepancies.
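As a rough illustration of how guidelines like these might translate into an automated gate, here is a minimal Python sketch. It is not Capital One's implementation: the `Change`, `Gate`, and `run_pipeline` names, and the three example checks, are hypothetical and chosen only to show peer review, testing and scanning, stopping on failure, and evidence capture working together.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Change:
    """A hypothetical code change moving through a delivery pipeline."""
    author: str
    reviewers: List[str]
    tests_passed: bool
    scan_findings: int


@dataclass
class Gate:
    """A named check that a change must pass (illustrative only)."""
    name: str
    check: Callable[[Change], bool]


@dataclass
class PipelineResult:
    released: bool
    evidence: List[str] = field(default_factory=list)


def run_pipeline(change: Change, gates: List[Gate]) -> PipelineResult:
    """Run gates in order, record evidence, and stop at the first failure."""
    result = PipelineResult(released=False)
    for gate in gates:
        passed = gate.check(change)
        result.evidence.append(f"{gate.name}: {'pass' if passed else 'fail'}")
        if not passed:
            return result  # stop the pipeline; nothing reaches production
    result.released = True
    return result


# Assumed example gates: a reviewer other than the author, passing tests,
# and a scan with zero findings.
GATES = [
    Gate("peer_review", lambda c: any(r != c.author for r in c.reviewers)),
    Gate("tests", lambda c: c.tests_passed),
    Gate("security_scan", lambda c: c.scan_findings == 0),
]
```

In this sketch, a change reviewed only by its own author fails the first gate and never reaches the later ones, while the evidence list records what was checked and with what result.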
Such a long list of controls is necessary if a company is going to unleash its development teams without worrying about potential implications outside of the development environment.
"It is very hard to ensure that a single developer cannot actually make a change to code and send it to production," said Pal.
The impact of the clean room has been immediately apparent. Pal reported that the company's progress with its DevOps practice has led to the number of products deploying multiple times a day rising from 20 in 2016 to 300 this year, while the maximum number of deployments for a product in a single day has grown from 30 to 50.
But the unavoidable truth is that no matter how many controls and safeguards are present, stuff breaks anyway. In that sense, "continuous delivery" also introduces the possibility of "continuous chaos," and another team at Capital One has been working on solving that very problem.
"We need to embrace failure as part of our development," said Gnani Dathathreya, a director of enterprise architecture, during another well-attended presentation at the DevOps Enterprise Summit. "That is part of the culture change we are embracing at Capital One."
To that end, the company has embraced the concept of "chaos engineering," which, when applied, enables applications to remain available, but with limited functionality, in the face of outages. Dathathreya also refers to this approach as "anti-fragility."
To make this happen, however, it's helpful to know what failures might look like, and how applications will behave during those times. That's what led Capital One to a chaos engineering solution known as a "cloud detour," which essentially applies an assortment of disruption scenarios so that the company can test the ability of its apps to withstand a variety of failures and outages.
"Cloud detour addresses the need for a chaos engineer automation tool by providing failure-as-a-service for applications," Sathiya Shunmugasundaram, lead software engineer for Capital One technology operations, told the DevOps Enterprise Summit audience. "You discover so many things."
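The general idea of applying disruption scenarios and checking that an app stays available with limited functionality can be sketched in a few lines of Python. This is not the Cloud Detour tool or its API; the service names (`accounts`, `offers`) and the functions below are hypothetical, and the "outages" are simulated in-process rather than injected into real cloud infrastructure.

```python
from typing import Callable, Dict, List, Set


class DependencyDown(Exception):
    """Raised when a simulated dependency is unavailable."""


def make_service(name: str, outages: Set[str]) -> Callable[[], str]:
    """Build a fake dependency that fails if its name is in `outages`."""
    def call() -> str:
        if name in outages:
            raise DependencyDown(name)
        return f"{name}: live"
    return call


def handle_request(outages: Set[str]) -> Dict[str, str]:
    """Hypothetical app: 'accounts' is essential, 'offers' is optional."""
    accounts = make_service("accounts", outages)
    offers = make_service("offers", outages)
    response = {"accounts": accounts()}  # essential: a failure propagates
    try:
        response["offers"] = offers()
    except DependencyDown:
        # Optional dependency is down: stay available with less functionality.
        response["offers"] = "unavailable (degraded)"
    return response


def run_scenarios(scenarios: List[Set[str]]) -> List[str]:
    """Apply each simulated outage scenario and report how the app behaved."""
    report = []
    for outages in scenarios:
        label = ",".join(sorted(outages)) or "none"
        try:
            resp = handle_request(outages)
            status = "degraded" if "degraded" in resp["offers"] else "healthy"
        except DependencyDown:
            status = "outage"
        report.append(f"{label} -> {status}")
    return report
```

Running `run_scenarios([set(), {"offers"}, {"accounts"}])` in this sketch distinguishes a healthy response, a degraded but available one, and a full outage, which is the kind of behavior a team would want to learn about before a real failure occurs.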
While this chaos engineering approach is helping Capital One more effectively test its apps, Shunmugasundaram would like to see it running on an ongoing basis as a part of daily operations.
Either way, the combination of a software development clean room and the application of chaos engineering is bringing Capital One's vision of DevOps-infused continuous software delivery to life.