- When risk of failure is defined as the product of the probability of failure (Pf) and the consequences of failure (Cf), Pf and Cf are NOT independent of each other, contrary to what conventional risk analysis would have it.
a. Both are connected indirectly by the “intervening variable” of their shared failure scenario: it is Pf and Cf with respect to the same failure scenario.
b. Further, the more granular the failure scenario, the more likely it is that Pfs and Cfs are directly interconnected. In the case of interinfrastructural cascades, one consequence of infrastructure1 failing (Cf1) may be to increase infrastructure2’s probability of failure (Pf2).
- Less rather than more granularity in the failure scenario might account for fewer criteria with respect to what qualifies as “effectiveness” in normal operations under conditions of high turbulence.
a. We have argued, for instance, that an explosion at a gasline section in a utility’s natural gas transmission system has to be analyzed in terms of its consequences at the systemwide and intersystem levels as well. (The same could be said for fires induced by a utility’s electricity transmission system.)
i. It may be that the natural gas system operated reliably at the systemwide level, and that the infrastructures that depended on natural gas provision also operated reliably during the explosion/fire.
ii. The negative consequences of the explosion are, in other words, offset by the positive consequences of maintaining systemwide reliability and intersystem dependencies.
b. The point here is that a failure scenario focused exclusively on the site level within a system can miss scenarios (and related criteria) for maintaining normal operations at the systemwide and intersystem levels under disturbance conditions.
- Identifying risk(s) without first defining the operational system and the reliability standard(s) being managed to leaves no stopping rule for the possible failure scenarios and types of risks/uncertainties that matter.
a. Accordingly, all manner of things end up posing risks and uncertainties, e.g.
…different assets; multiple lines of business; with respect to system capacity, controls and marketing factors; in terms of the risks’ time-dependence versus independence; in terms of the risks associated with emergency work as distinct from planned work; investment risks versus operational ones; risks with respect not only to system safety and reliability, but also organizationally in terms of financial risk and regulatorily in terms of risks of non-compliance… ad infinitum.
b. This lack of a stopping rule for which failure scenarios to worry about is itself a hazard, or its own failure scenario, when it discourages (further) thinking through and acting on failure scenarios about which more is known and can be done.
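The first point above, that Pf and Cf are tied to one failure scenario and that one infrastructure's failure can raise another's Pf, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class `ScenarioRisk`, the function `cascaded_pf2`, the scenario names and all numbers are hypothetical, and the cascade is reduced to a simple two-case conditional probability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioRisk:
    """Pf and Cf pegged to one named failure scenario (hypothetical values)."""
    scenario: str
    pf: float  # probability of failure, between 0 and 1
    cf: float  # consequence of failure, in arbitrary cost units

    @property
    def risk(self) -> float:
        # Risk is only meaningful with respect to this scenario.
        return self.pf * self.cf


def cascaded_pf2(pf1: float, pf2_given_ok: float, pf2_given_fail1: float) -> float:
    """Overall Pf2 when infrastructure1's failure changes infrastructure2's odds.

    Law of total probability over the two cases: infrastructure1 fails or not.
    """
    return pf1 * pf2_given_fail1 + (1.0 - pf1) * pf2_given_ok


# Hypothetical numbers: a gas failure sharply raises the power system's Pf.
gas = ScenarioRisk("gasline section explosion", pf=0.02, cf=100.0)
pf2 = cascaded_pf2(gas.pf, pf2_given_ok=0.01, pf2_given_fail1=0.30)

power = ScenarioRisk("power loss, gas cascade included", pf=pf2, cf=500.0)
naive = ScenarioRisk("power loss, gas cascade ignored", pf=0.01, cf=500.0)

print(f"Pf2 with cascade: {pf2:.4f}")
print(f"Risk with cascade: {power.risk:.2f}; without: {naive.risk:.2f}")
```

In this sketch, treating Pf2 as independent of infrastructure1 understates infrastructure2's risk, which is the kind of direct interconnection described above.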
What does this all add up to?
Return to the central point that risk of failure is the probability of failure (Pf) times the consequence(s) of failure (Cf) with respect to a specified failure scenario, in this case, catastrophic failure scenarios for the real-time operating infrastructure as a whole.
Where and when that is the case, the infrastructure’s risk mitigation programs and controls become a priority source of indicators and metrics reflecting how seriously catastrophic failure scenarios are treated by infrastructure managers.
Indeed, the existing controls and mitigations may provide the only real evidence, outside the real-time management of the control room, of what currently works well (or not) with respect to improving not only system reliability but also system safety when pegged to catastrophic system failure. A clear priority for safety management would be to identify, assess and better validate already existing risk mitigation programs, controls and their metrics in terms of their own failure rates, given their associated catastrophic failure scenarios.
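One way to picture what assessing controls "in terms of their own failure rates" might involve is the rough sketch below. It is not a method proposed here: the functions `observed_failure_rate` and `residual_pf`, the control records and every number are hypothetical. It also assumes the controls fail independently of one another, an assumption that deserves the same scrutiny as the independence of Pf and Cf.

```python
from math import prod


def observed_failure_rate(failures: int, demands: int) -> float:
    """Share of demands on which a control did not work (hypothetical records)."""
    if demands <= 0:
        raise ValueError("a failure rate needs at least one recorded demand")
    return failures / demands


def residual_pf(pf_unmitigated: float, control_failure_rates: list[float]) -> float:
    """Pf of the scenario after controls, i.e. the scenario starts AND every
    control fails on demand. Assumes the controls fail independently."""
    return pf_unmitigated * prod(control_failure_rates)


# Hypothetical controls pegged to one catastrophic failure scenario:
# (failures, demands) taken from an imagined inspection and testing history.
records = {"automatic shutoff valve": (1, 50), "leak survey program": (3, 20)}
rates = [observed_failure_rate(f, d) for f, d in records.values()]

pf_after_controls = residual_pf(pf_unmitigated=0.05, control_failure_rates=rates)
cf = 1000.0  # consequence in arbitrary cost units, for the same scenario

for name, rate in zip(records, rates):
    print(f"{name}: failure rate {rate:.2f}")
print(f"Residual risk: {pf_after_controls * cf:.3f}")
```

The design choice is to treat each control's track record as a metric in its own right, so that a control with a poor record shows up directly in the residual risk for its associated scenario.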