It pays to assess risk properly in making IT and other big decisions. Here's what not to do.
"The dangers of life are infinite, and among them is safety"-- Goethe
We all tend to misjudge risks.
Which animal is the most dangerous: the hippo, shark, bear, lion or tiger? Most people wouldn't guess that those friendly-looking hippos cause more human deaths than the other animals combined. Yet even hippos are safe compared with texting while driving, which claims some 6,000 casualties a year.
The subject of risk and risk perception is elegantly outlined in The Science of Fear by Daniel Gardner.
Unlikely events defy our intuition. A variety of cognitive biases lead people to make wrong gut decisions. Nobel Prize winner Daniel Kahneman gives a great introduction to these biases in Thinking, Fast and Slow. Here are some that play a role in risk evaluation:
-- Confirmation Bias: I see what I already believe.
-- Anchoring and Adjustment Heuristic: I am influenced by the first piece of information I receive.
-- Ambiguity Effect: I avoid options with unknown probabilities.
-- Bandwagon Effect: I tend to do what others do.
-- Availability Heuristic: The story I remember is more powerful than data.
As a society we pay little attention to truly large risks such as asteroid impacts and the overuse of antibiotics. At the same time, we are overly sensitive to comparatively small risks such as those of nuclear energy.
A light-hearted approach to personal risk is the micromort. A micromort is a micro-probability: a one-in-a-million chance of dying. A lifetime adds up to one mort (1,000,000 micromorts), so spread over a roughly 70-year lifespan, an ordinary day costs the average person about 39 micromorts. Smoking 1.4 cigarettes costs a micromort, as does living within 20 miles of a nuclear power plant for 15 years. Micromorts are a handy way to compare relative risks.
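For the arithmetically inclined, here is a minimal sketch of that conversion (the ~70-year lifespan is an assumption, roughly what the 39-micromorts-per-day figure implies, not a number from the article):

```python
# Back-of-the-envelope micromort arithmetic. The ~70-year lifespan is an
# assumption for illustration; it is not a figure from the article.

MICROMORTS_PER_MORT = 1_000_000   # one mort = the certainty of dying once
ASSUMED_LIFESPAN_YEARS = 70

lifespan_days = ASSUMED_LIFESPAN_YEARS * 365
micromorts_per_day = MICROMORTS_PER_MORT / lifespan_days

print(f"An average day costs about {micromorts_per_day:.0f} micromorts")
# -> An average day costs about 39 micromorts
```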
So how does this analysis apply to IT? Here's an example:
I was tasked with driving the technical infrastructure design of a complex manufacturing system. We had to make decisions about its failover capabilities, and we used some of the usual tools (component failure impact analysis, fault trees) to hit the sweet spot: acceptable availability at a reasonable price. The business owners didn't like the design; they wanted to spend more money and increase the availability. When we pointed out that, based on historical data, 90% of the outages were due to human error, the light bulbs came on: We could have spent an extra million dollars to move the technical environment to five 9s, but it would have affected only 10% of the unscheduled downtime. Exposing the relative risk helped us make the right investment decision.
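A minimal sketch of the arithmetic behind that decision: the 90% human-error share comes from the story above, while the baseline downtime figure is a hypothetical placeholder.

```python
# Sketch of the failover trade-off described above. The 90% human-error
# share comes from the story's historical data; the baseline downtime and
# the cost figure are hypothetical placeholders for illustration only.

annual_downtime_hours = 50.0   # assumed total unscheduled downtime per year
human_error_share = 0.90       # from the historical outage data
upgrade_cost = 1_000_000       # the proposed five-9s investment

technical_share = 1.0 - human_error_share
technical_downtime = annual_downtime_hours * technical_share

# Even a perfect technical environment leaves the human-error downtime intact.
print(f"${upgrade_cost:,} buys improvement on at most {technical_downtime:.0f} "
      f"of {annual_downtime_hours:.0f} hours ({technical_share:.0%} of the total)")
# -> $1,000,000 buys improvement on at most 5 of 50 hours (10% of the total)
```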
Following are a dozen risk-related IT anti-patterns and worst practices.
1. Whose signature is it anyway? Too often, the wrong person accepts a risk. IT is seldom the owner of the business process or the steward of the information. IT can help estimate the likelihood of an event, offer solutions and calculate costs, but signing off on a particular solution and its associated risks should be the responsibility of the business owner.
2. Complexity. Growing IT complexity increases the probability of losing data integrity, confidentiality or availability. The drive to rein in that complexity inspired the first enterprise architecture frameworks in the 1980s. Despite many efforts since, success stories in taming IT complexity remain rare.
3. Intangible risks. Some impacts are difficult to measure, so the calculation of such risks is left to subjective interpretation and politics (see the sketch after this list).
4. Human error. Studies agree that the most common cause of system downtime is human error. Focusing on the technical aspects alone won't address this problem; the best way to approach this risk is to look at the whole people/process/technology stack.
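A footnote to item 3: risk quantification typically multiplies impact by likelihood, as in the standard annualized-loss-expectancy formula (a textbook formula, not one named in the article). The formula itself is objective, but with intangible impacts the inputs rarely are. A sketch with made-up numbers:

```python
# Annualized Loss Expectancy (ALE), a standard risk-quantification formula:
# ALE = SLE * ARO. The formula is textbook; both inputs below are made-up
# guesses -- exactly the kind of soft numbers that invite politics.

def annualized_loss_expectancy(single_loss_expectancy: float,
                               annual_rate_of_occurrence: float) -> float:
    """Expected loss per year from one class of incident."""
    return single_loss_expectancy * annual_rate_of_occurrence

# "Reputation damage from an outage": who can defend either estimate?
sle = 250_000   # assumed dollar impact of a single incident
aro = 0.2       # assumed frequency: one incident every five years

print(f"ALE: ${annualized_loss_expectancy(sle, aro):,.0f} per year")
# -> ALE: $50,000 per year
```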