Error and Safety

–A key virtue of operating within the shared comfort zone of team situation awareness in the infrastructure control room is knowing when it is an error to comply with a regulated task or technical protocol that, in the case at hand, would work against system reliability and safety. Correcting for errors is a key function of high reliability management in real time.

When operators are pushed out of their comfort zone into unstudied conditions (say, by defective technology, policies or regulations), however, they find themselves unable to perform reliably there. Operators then work under conditions where the identification of what is or is not “error” defaults, ironically, to whether or not the compliance mandated by the regulator of record has taken place. “Sticking to procedure” ends where there is no procedure, and what follows is “operator error,” which sets a perverse cycle into play.

Ritualized calls then arise for foolproof technology, systemwide redesign, or new policies and regulations to correct for the mistakes. The effort becomes one of trying to macro-design micro-errors away, as if there were no middle domain of reliability professionals operating in real time. As we have repeatedly seen, such macro-micro leaps of faith are lethal to systemwide reliability; they are, however, a permanent feature of calls for more regulation and policy.

–One upshot of the perverse cycle is that it’s a mistake to think all errors are mistakes. What needs to be distinguished is whether the errors/mistakes occur within or outside the control operators’ comfort zone. Tracking and responding to the differences are invaluable.

Why? Because many of the complex infrastructures we study treat uncertainty with respect to different types of errors as useful information. As Paul Schulman puts it, uncertainty isn’t the lack of information; it is itself a kind of information about where the socio-technical system is in real time as a system:

In nuclear power plants, commercial aviation (including air traffic control systems), as well as other critical infrastructures, a distinctive form of error management has been a framework for high reliability. For these organizations the inverse of knowledge is not ignorance or uncertainty – it’s error. They identify and categorize uncertainty in relation to specific errors in decisions and actions they seek to avoid in order to preclude outcomes that are surrounded by not only organizational but societal dread.

–Yet for all these nuances, “error” continues to be treated as Bad in much of the literature on Safety Culture.

An analogy helps. The Roman Catholic Church had the early problem of how to treat Islam. Islam couldn’t be paganism, because it too held there to be One God and indeed shared notable figures, like Jesus and Noah. To make things fit, the Holy See declared Islam was not paganism but a Christian heresy, along the lines of Arianism or Socinianism, which questioned the Trinity or Jesus’ divinity.

So too today for that one great religion, Safety, with its one great heresy, “Operator Error.” Yea, though we all be fallible, operator error is bad, bad, bad. Even when operators don’t see it so; even when operators correct for forced errors all the time; even when they manage for error in their comfort zone. In other words, when really-existing error is not defined by dogma, matters become more usefully complex. People make mistakes and, yes, you can’t unring the bell once rung, but it’s always been more complex than that.

Principal source.

P.R. Schulman (undated). “Reliability, uncertainty and the management of error: New perspectives in the Covid-19 Era.” Unpublished manuscript.
