Central role of the track record in risk analysis

–Assume you, a competent specialist, have been tasked with undertaking a risk analysis for an energy utility that provides electricity, natural gas, or both to a large region of major urban, agricultural and natural resource users.

The utility wants to identify risks and uncertainties that, if left (further) unaddressed, would have severe consequences for its operations (think: induced wildfires or pipeline explosions). Utility operations, including the technologies, are so complex that “accidents are waiting to happen” when risks go unidentified or unattended.

Your analysis begins conventionally by identifying and isolating weak or vulnerable elements in the utility operations, be they in physical structures or in specific tasks (think: corrosion in pipes or so-called operator error). No chain is stronger than its weakest link, so the wisdom goes, and if the utility doesn’t know its weakest link, then operations are in an important respect merely failing to fail. (One senior risk manager in a major utility went straight to the point: “Really, just because we haven’t had a meltdown doesn’t mean our practices were effective.”)

That conventional point of departure focusing on the weaker or weakest links no longer takes risk assessment and management far enough, and honestly it would be irresponsible to stop there.

In the first place, you are not dealing with chains of processes and technologies only. It is the system of operations that is the unit of your risk analysis, and indeed the utility operates its electricity or natural gas system as a system. This means that if it loses an element it frequently has another way or ways to maintain service across the system as a whole. This applies as well when the lost element is the weak link—not always, of course; but more frequently than you might suppose.

Which leads to our second problem with a weak-link focus on the part of the risk analyst. The utility’s system was originally designed to have back-ups for handling contingencies and failures, but the more important point is managerial: the strength of the utility’s operations derives in good part from its weak links, and not in spite of them. The gas operations of the utility are as reliable and safe as they are because it is known that those pipes corrode that way under these conditions. Not only is the weak link frequently known, it has many eyes focused on it, particularly if it is already recognized to be a chokepoint for real-time systemwide operations.

–In fact, when conventional analysis is pushed further, risk isn’t the “negative” that determines the standard or remedy to correct; the standard of safety and reliability adopted determines the risks requiring managing to meet that standard.

Standards, to be standards, need constant probing to ensure they are not overly simple or overly complex with respect to the safety and reliability mandated. That is, the standards of service reliability and safety—in order to function as standards—require managing the entailed risks and uncertainties as the chief way to probe, if not test, the efficacy of the standards. This makes risk a positive, not a negative, when it comes to management.

–When so, the logical and empirical prior question that the analyst in our thought experiment has to answer becomes this: Have the utility managers demonstrated a real-time track record by way of experience and training in addressing circumstances under which (1) they didn’t know what they initially thought they knew, (2) they in fact knew more than they initially thought, or (3) both?

If the track record is one of success (measured in terms of ensuring system reliability and safety), then the analyst can better trust the utility’s system operators and immediate support to know and appreciate how inexperienced they still are when it matters. But the existence of any such record must remain an open question for the analyst.

I cannot over-stress the importance of this track record, for the major risk factors may already be well known to real-time system operators. It is not that key risks are unknown or invisible to them. Which is to say that it’s the track record that is the proper unit of analysis for our inquiring risk analyst.

Is there a track record of learning-and-unlearning by utility operators and support staff around, say, meeting the challenges their equipment fires pose to system reliability and safety? Or are equipment fires something consultants and others worry more about than utility operators, who worry about different risks over which there is a track record of more real-time learning/unlearning? Or are some utilities under such pressure to change as systems that track records of any important sort become more and more difficult to establish?

One last point. When colleagues and I started to study the reliability of large critical infrastructures (that socio-technical anchorage of the modern), a paradox confronted us: How can they be so reliable with so much that could go wrong at any second, and horripilatingly so? Yes, of course, there were always very real problems, but how could these infrastructures be managed as reliably and safely as they were with so many moving parts and potential interactions beyond human comprehension?[1]

While my answer above starts by shifting the focus in risk and reliability management to the track records of experience and training, it also is necessary not to miss the performative, self-referential nature of the question: In rendering them as a “paradox,” we demonstrate our own cognitive limits of understanding the systems we study. We too are in the midst of unfolding and adjusting our inexperience with them, where risk assessment and management centers on that shuttle between experience and inexperience.

Related entry: “Seeing unknowns”

[1] This paradox has been effused about for some time. Here is Frederick S. Williams on the British railways from his Our Iron Roads, 1888:

This immense number of passengers and enormous bulk of goods are drawn by engines of the most complicated mechanism, held together with millions of rivets, each engine containing an intricate network of tubes, numerous cranks, and other delicate pieces of workmanship, and the engines and vehicles are connected by chains and couplings. In every separate item of all these innumerable parts lurk elements of danger, and the slightest fracture might produce disaster. All this is done, and with what result? That there is no safer place in the world, as Professor De Morgan said some years ago, and it is still true, than a railway train. (accessed online on March 24 2016 at https://archive.org/stream/ourironroadsthei00willrich/ourironroadsthei00willrich_djvu.txt)
