The way Ivan Pepelnjak sees it, every company could benefit from its own Chaos Monkey.
What's Chaos Monkey, you ask? It's a service Netflix built, and now relies upon, to ensure the resiliency of its applications running on Amazon Web Services. It does this by randomly shutting down servers to see how the failure affects applications and, in turn, the customer experience. Netflix kills thousands of virtual instances each year this way and solves any problems that arise, and each time its applications get a bit stronger and more reliable.
As a result, the company doesn't care about hardware availability. It has dozens of virtual servers supporting the same application, and it doesn't even worry about replacing the servers it kills, because it's constantly adding new ones to handle increased user loads.
"Their application architecture is resilient," Pepelnjak, chief technology adviser at consultancy NIL Data Communications, said in an interview. "If any component fails, the whole operational stack is still working."
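To make the idea concrete, here is a minimal toy sketch of the approach described above: a pool of interchangeable servers, a "monkey" that randomly terminates one of them, and a check that the application is still available afterward. It is a simulation only, not Netflix's implementation and not a call into any AWS API. The names `Instance`, `ServicePool`, `unleash_monkey`, the pool size of 12 and the `min_healthy` threshold are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    """A stand-in for one virtual server (hypothetical, not an AWS object)."""
    instance_id: str
    running: bool = True


@dataclass
class ServicePool:
    """A hypothetical pool of interchangeable servers behind one application."""
    instances: list
    min_healthy: int = 1  # assumed threshold below which the app counts as down

    def healthy(self):
        """Return the instances that are still running."""
        return [i for i in self.instances if i.running]

    def is_available(self):
        """The application is 'up' while enough instances remain."""
        return len(self.healthy()) >= self.min_healthy


def unleash_monkey(pool, rng):
    """Terminate one randomly chosen running instance and return it.

    Returns None if nothing is left to terminate.
    """
    candidates = pool.healthy()
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.running = False
    return victim


# A dozen servers for one application; the app needs at least three of them.
pool = ServicePool([Instance(f"i-{n:03d}") for n in range(12)], min_healthy=3)
rng = random.Random(42)  # seeded so the run is repeatable

killed = [unleash_monkey(pool, rng) for _ in range(5)]
print("terminated:", [v.instance_id for v in killed])
print("still healthy:", len(pool.healthy()))
print("application available:", pool.is_available())
```

In this sketch, losing five of twelve servers leaves the application available because the pool is sized well above `min_healthy`. A real setup would replace the in-memory flags with actual instance termination and a genuine health check against the running service.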
It's an approach that's long proved successful at Netflix and other big-name online services, such as Amazon and Google. And much to Pepelnjak's pleasure, it's starting to catch on elsewhere.
"That type of mentality is slowly moving into the enterprise side of things," Pepelnjak said. "It will make things simpler, cheaper and easier to manage. You stop worrying about all the possible failure scenarios. You only worry about how resilient is your application."
Read the full article at Network Computing.
Don't miss Ivan Pepelnjak's workshop, "Designing the Virtual Network for the Software-Defined Data Center," March 31 at Interop Las Vegas. Register today!