The limitations of root cause analysis

Learning lessons from projects is not as simple as you may think! Projects are complex adaptive systems linking people, processes and technology – in this environment, useful answers are rarely simple.

Certainly, when things go wrong, stakeholders almost by default want a simple explanation of the problem, which tends to lead to a search for the ‘root cause’. There are numerous techniques to assist in the process, including Ishikawa (fishbone) diagrams, which look at cause and effect, and Toyota’s ‘Five Whys’ technique, which asserts that by asking ‘Why?’ five times in succession you can delve into a problem deeply enough to understand the ultimate root cause. The chart below outlines a ‘Five Whys’ analysis of the most common paint defect (‘orange peel’ is an uneven finish that looks like the surface of an orange):

These are valuable techniques for understanding the root cause of a problem in simple systems (for more on the processes see WP1085, Root Cause Analysis); however, in complex systems a different paradigm exists.

Failures in complex socio-technical systems such as project teams do not have a single root cause. The assumption that for each specific failure (or success) there is a single unifying event that triggers a chain of other events leading to the outcome is a myth that deserves to be busted! For more on complexity and complex systems see: A Simple View of ‘Complexity’ in Project Management.

Complex system failures typically emerge from a confluence of conditions and occurrences (elements) that are usually associated with the pursuit of success, but in a particular combination, are able to trigger failure instead. Each element is necessary but they are only jointly sufficient to cause the failure when combined in a specific sequence. Therefore in order to learn from the failure (or success), an approach is needed that considers that:

  • …complex systems involve not only technology but organisational (social, cultural) influences, and those deserve equal (if not more) attention in investigation.
  • …fundamentally surprising results come from behaviours that are emergent. This means they can and do come from components interacting in ways that cannot be predicted.
  • …nonlinear behaviours should be expected. A small change in starting conditions can result in catastrophically large and cascading failures.
  • …human performance and variability are not intrinsically coupled with causes. Terms like ‘situational awareness’ or ‘lack of training’ are blunt concepts that can mask the reasons why it made sense for someone to act in a way that they did with regards to a contributing cause of a failure.
  • …diversity of components and complexity in a system can augment the resilience of a system, not simply bring about vulnerabilities.

This is a far more difficult undertaking, one that recognises complex systems have emergent behaviours, not resultant ones. There are several systemic accident models available, including Hollnagel’s FRAM and Leveson’s STAMP, that can help build a practical approach for learning lessons effectively (you can Google these if you are interested).
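To make the idea of elements that are each necessary but only jointly sufficient a little more concrete, here is a minimal Python sketch. The element names are invented purely for illustration, and the sketch deliberately ignores the ordering of events; it only shows that removing any one element prevents the failure, while no smaller combination triggers it.

```python
from itertools import combinations

# Hypothetical contributing elements; the names are invented for illustration.
ELEMENTS = frozenset({"schedule pressure", "new supplier", "skipped review"})


def failure_occurs(present):
    """Assumed model: failure emerges only when every element is present."""
    return ELEMENTS <= set(present)


def is_necessary(element):
    """An element is necessary if removing it alone prevents the failure."""
    return not failure_occurs(ELEMENTS - {element})


def sufficient_subsets():
    """Return every subset of ELEMENTS that triggers the failure."""
    return [
        set(combo)
        for size in range(len(ELEMENTS) + 1)
        for combo in combinations(sorted(ELEMENTS), size)
        if failure_occurs(combo)
    ]


all_necessary = all(is_necessary(e) for e in ELEMENTS)
triggers = sufficient_subsets()

print("Every element necessary:", all_necessary)
print("Number of sufficient combinations:", len(triggers))
```

In this toy model, asking which single element 'caused' the failure has no meaningful answer: each one passes the necessity test, and none is sufficient alone.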

In the meantime, the next time you read or hear a report with a singular root cause, alarms should go off, particularly if the root cause is ‘human error’. If there is only a single root cause, someone has not dug deep enough! But beware: the desire for a simple wrong answer is deeply rooted. The tendency to look for singular root causes comes from the tenets of reductionism that are the basis of Newtonian physics, scientific management and project management (for more on this see: The Origins of Modern Project Management).

Certainly, starting with the outcome and working backwards along a linear chain towards an original triggering event feels intuitive, and the process derives a simple answer that validates our innate hindsight and outcome bias (see WP1069 – The innate effect of Bias). However, the requirement for a single answer tends to ignore surrounding circumstances in favour of a cherry-picked list of events, and it tends to focus too much on individual components and not enough on the interconnectedness of components. Emergent behaviours are driven by the interconnections, and most complex system failures are emergent.

This assumption that each presenting symptom has only one cause that can be defined as an answer to the ‘why?’ is the fundamental weakness of the reductionist approach used in the ‘Five Whys’ chart above. The simple answer to each ‘why’ question may not reveal the several jointly sufficient causes that in combination explain the symptom. More sophisticated approaches are needed, such as the example below dealing with a business problem:

The complexity of the fifth ‘why’ in the table above can be crafted into a lesson that can be learned and implemented to minimise problems in the future, but it is not simple!
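One possible way to record a ‘Five Whys’ analysis that allows several contributing causes per ‘why’ is to store it as a tree rather than a single chain. The Python sketch below is only an illustration: the `Cause` class, the `causal_paths` helper and the example causes are all hypothetical and are not taken from the charts in this post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cause:
    """One answer to a 'why?'. Its child causes are treated as jointly contributing."""
    description: str
    causes: List["Cause"] = field(default_factory=list)


def causal_paths(node, prefix=()):
    """Yield every path from the presenting symptom down to a leaf cause."""
    path = prefix + (node.description,)
    if not node.causes:
        yield path
        return
    for child in node.causes:
        yield from causal_paths(child, path)


# Invented example: each 'why' may have more than one answer.
symptom = Cause("Milestone missed", [
    Cause("Testing started late", [
        Cause("Test environment unavailable"),
        Cause("Requirements changed late"),
    ]),
    Cause("Defect rate higher than planned", [
        Cause("Reviews skipped under schedule pressure"),
    ]),
])

paths = list(causal_paths(symptom))
leaf_causes = [p[-1] for p in paths]

for p in paths:
    print(" <- ".join(p))
```

A linear ‘Five Whys’ would report only one of these paths; keeping the whole tree makes it harder to lose the other contributing causes and the connections between them.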

The process of gathering ‘lessons learned’ has just got a lot more complex.

6 responses to “The limitations of root cause analysis”

  1. In the Root Cause Analysis sessions I do with IT projects and Agile teams, we always find multiple causes. Often it is a combination of causes that resulted in significant problems.

    I’ve gathered several practical tools, like checklists and templates to do a Root Cause Analysis at http://www.benlinders.com/2011/root-cause-analysis-practical-tools/. They help people to do effective Root Cause Analysis, and to take actions to prevent similar problems from happening again.

  2. Pingback: Learning Complexity — Leadership Series – 1 « rgbwaves

  3. Pingback: Learning Complexity – Leadership Series .. 2 « rgbwaves

  4. This is a great summary of the issues. I like the fact that you don’t throw the baby (root cause) out with the bathwater (complexity) like many of the complexity researchers. For some reason, they can’t or won’t acknowledge that most of us moved on from the “single linear cause chain” model a long time ago; in fact, when I was first trained in root cause analysis about 20 years ago, we were told that we should expect to find 2 or 3 root causes along with a few contributing causes. Perhaps more people have hung on to that model than I realize, but not in the circles I run (high hazard industry folks).

    In addition, I think most of us have gone well past the idea that “human error” can ever be an acceptable root cause; it seems pretty well accepted that people involved in events had reasons for what they did that made sense to them at the time, given where they were and what they knew. When you get right down to it, you have to look at how and why the system responded “badly” to the initial disturbance (systems and complexity) and hid necessary information from the people involved (human factors).

    Finally, I like your 5 Whys table example, as it shows one good method to implement what I have called 5×5 Whys… the “x5” being a series of 5 questions designed to promote inquiry and validation of cause existence, necessity, and sufficiency, and to drive the consideration of multiple, perhaps interconnected, causal chains. http://www.bill-wilson.net/b73

    Regards,
    Bill

  5. Pingback: The limitations of root cause analysis | CAI's Accelerating IT Success

  6. Pingback: Introduce Change – “Agile” – "Let go until nothing's left"
