Resilience v Risks

A common fallacy is the assumption that effective risk management systems will remove or eliminate all risk from a project or a business. Organisations spend millions of dollars trying to avoid risk, and they fail. It is not possible to predict the future with certainty; if it were, casinos and bookmakers would be bankrupt.

What is possible is to proactively manage known risks in a way that maximises opportunity and minimises the damage caused by threats that eventuate. It is also possible to refine and improve systems to minimise variability and therefore increase predictability.

Sensible management balances the gains from improvements in the risk exposure against the costs and optimises the outcome. But this leaves three areas of risk unaccounted for. The first is the known risks that are simply too expensive or too difficult to reduce any further. The second is the known risks that simply cannot be managed or transferred. The third category is the unknown unknowns, the risks we simply don’t know we don’t know about.
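As a rough illustration of that balancing act, the sketch below compares three treatment options for a single known risk. Every figure in it is hypothetical, and the simple ‘probability times impact’ measure of exposure is an assumption made for the example, not a method prescribed here.

```python
def net_benefit(p_before, p_after, impact, treatment_cost):
    """Reduction in expected loss minus the cost of achieving it."""
    exposure_before = p_before * impact
    exposure_after = p_after * impact
    return (exposure_before - exposure_after) - treatment_cost


# Hypothetical risk: 20% chance of a $500,000 loss if left untreated.
IMPACT = 500_000
options = {
    "do nothing": net_benefit(0.20, 0.20, IMPACT, 0),
    "basic controls": net_benefit(0.20, 0.10, IMPACT, 20_000),
    "extensive controls": net_benefit(0.20, 0.05, IMPACT, 90_000),
}

# The option with the highest net benefit is the 'optimised' outcome.
best = max(options, key=options.get)
print(best)  # basic controls
```

With these made-up numbers the extensive controls cost more than the exposure they remove, so some residual risk is deliberately left in place. That residual is the first of the three unaccounted areas.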

All three categories can be offset by creating contingency allowances, but this is expensive and potentially wasteful. Because we cannot know what we don’t know, it is impossible to calculate appropriate contingencies. Another approach is needed.

Resilience is the ability to recover quickly from an unforeseen event or ‘the ability of a system to return to its original state after being disturbed’. Build resilience into your business unit or project team and you have the capacity to deal with the consequences of unforeseen risks.

I have been rather aware of the recent problems experienced by Qantas given I’m flying on one of their aircraft from Sydney to Argentina on Monday! The major issue was the uncontained explosion of a Rolls-Royce Trent 900 jet engine after take-off from Singapore on November 4th. Commercial airline engines are not supposed to explode! Whilst they break reasonably frequently, all of the debris is supposed to be contained within the engine. The Qantas A380 suffered shrapnel damage to wiring, its fire systems and wing fuel tanks (the plane was extremely lucky to avoid a catastrophic fire).

One of the factors that saved the situation was the flight deck crew. By chance there were four highly experienced pilots and a second officer on board. Qantas routinely uses two experienced pilots as Captain and First Officer; in addition, there was a senior ‘check captain’ undertaking a ‘route check’ on the pilots and another senior officer checking the checker. Between them the crew had over 60,000 hours of flying experience and they still had to work flat out for over an hour to understand and control the situation before making a safe landing.

The flight deck had an abundance of resilience, created by the combined experience of the crew. This was partly good luck and partly Qantas policy. There was enough brain power, experience and wisdom to stay on top of the situation. Whilst every airline trains its crews in flight simulators to deal with all sorts of emergencies, you can virtually guarantee Qantas had not trained its crews specifically for the circumstances that occurred on 4th November. They were never supposed to occur; the situation was probably closer to what happens when a large aircraft is hit by a missile than any normal emergency.

So how do you build resilience into a work team? There seem to be three elements.

  • The first is to have practised dealing with a range of emergency issues needing responses. This helps develop systems and procedures.
  • The second is to have spare capacity available to the team. It’s unlikely you can afford the Qantas solution of two qualified captains ‘on deck’ but it should be possible to develop flexibility within the larger organisation to make resources quickly available and to have those people familiar and friendly with the core team so they ‘fit in’ quickly.
  • The third element is trust. Everyone needs to be able to trust their team mates and understand their capabilities. The value of trust is discussed in our WP1030 – this is even more important when dealing with an unexpected risk event.

Resilience is not an accident; it is created by implementing strong processes, procedures and systems. These are far better value drivers than contingencies. Contingencies add no value; they are simply drawn down in emergencies. Resilient systems are also effective systems and therefore value creators.

However, there is a risk associated with resilience. A resilient system that can absorb issues and catch risks before they become significant appears to be a ‘comfortable system’. Unwise cost cutting can remove resilience and in the short term there appears to be no disadvantage. However, this is a very dangerous illusion. As the system is rendered less effective, issues get picked up later and require more resources to correct. This further destabilises the system, and the ‘tipping point’ into dangerous dysfunction can easily be passed without anyone realising there’s a problem until the next risk event occurs and the system fails.

Resilience is a valuable asset. Management need to make sure it’s cultivated and nurtured to support the other aspects of effective risk management.

3 responses to “Resilience v Risks”

  1. Resilience is a “mitigation” strategy. Fail safe is a mitigation strategy.

    Statements like “A common fallacy is the assumption that effective risk management systems will remove or eliminate all risk from a project or a business.”

    Could you suggest the domain where this is a “common fallacy?”

    The failure of the Trent 900 was a design flaw that was contained, but the engine failed safe, no one died, the aircraft did not crash.

    As one of the designers and developers of a Fault Tolerant process control system, certified by SINTEF (http://www.sintef.no/Home/) and TÜV (http://www.tuv.com/us/en/index.html) as “fail safe,” I might suggest looking at the Risk Management guidance found at NASA, US DoD, NAVAIR (Naval Aviation), NRC (Nuclear Regulatory Commission), and the US Department of Energy. I have hands-on experience with fail safe modes in the RR RB211 driving compressors. The Trent comes from the same shop in Derby, UK.

    There you will find guidance on designing, developing, and operating systems whose failure modes impact the lives of others. There you will NOT find “the assumption that effective risk management systems will remove or eliminate all risk from a project or a business.”

    You’ve created a false analogy and then argued against it. Your loyal readers (and I’m one) deserve better research on this complex topic of risk management in projects.

    • The maturity of risk management varies enormously.

      Aerospace, oil and gas are relatively mature. Areas that seek to ‘eliminate all risk’ include construction projects let on fixed price contracts to remove cost risks – it does not work, and has not worked for the last 150 years but everyone still tries…. see: The Meaning of Risk in an Uncertain World at:
      Most business projects are approved on fixed time and cost before scope or benefits are finalised.

      In Australia, the decision to buy the Joint Strike Fighter was set up with a fixed budget and ‘expected’ timeframe before the prototype flew……

      The risk paradox is that to get a project approved in most circumstances the proposers need to show the project is safe, defined and understood; to win a competitive bid the vendor needs to show they fully understand the work and their bid is safe and defined. But in reality all that is defined is the first level of the problem (often not even the root cause).

  2. Pat,
    Yes the maturity of risk management varies wildly, but the processes of risk management don’t.

    The statement “A common fallacy is the assumption that effective risk management systems will remove or eliminate all risk from a project or a business” may confuse the principles of risk management with the practice of risk management.

    But it is important to separate the poor practices from the principles – which are independent from the domain.

    I’m confused by the unqualified statements like the one above.

    Pat, great care is needed with JSF. http://www.gao.gov/new.items/d10520t.pdf and http://www.gao.gov/products/GAO-10-1020R are some starting points.

    Programs with heavy political content usually operate outside the domain of analytical assessments.

    I’d agree with your last paragraph IF you had added the variance boundaries to the statement “fully understand the work and their bid is safe and defined.”

    If there are projects with the “risk paradox,” the funders get what they deserve if they fail to understand the upper and lower limits of the variance and the shape of the PDF defining these boundaries.

    A larger question is: why do project managers and project funders do stupid things on purpose?
