Six take-home messages from recent research on large socio-technical systems

1. The large socio-technical system in failure differs, oft-times radically, from that system in normal operations, where really-existing failure often ends up as a critique of earlier or prior definitions of “normal” and “failed.” Four corollaries are worth signaling:

First, the opposite of failure isn’t success; it’s achievement of reliable operations. (Reliable does not mean invariant. In fact, invariant operations are highly unreliable.)

Second, if you know how normal operations work but do not know what failed operations will look like once failure occurs, then learning from failed operations and learning from normal operations must be very different. (Indeed, if failure is much more common than success, then the central tendency will be to regress to the mean after a success, i.e., to go back to what produces failure or to its initial conditions. If so, then doesn’t it make more sense to find ways to learn from failure than to learn from success?)

Third, system failure is the place where everything is actually connected to everything else, since each thing ends up as a potential substitute there for about anything else. “Need unites everything,” as the Aristotelian notion has it, and need is greatest in collapse.

Fourth, one irony of the fact that normal operations tell us very little about failed operations is that failed operations often look over-determined in hindsight. After the fact, many factors can be found to independently contribute to system failure if you are at a loss to say how normal operations on their own transformed into system meltdown.

2. Consider all those graphics that show large socio-technical systems to be densely interconnected with other systems. Not all of these interconnections, however and importantly, are ones of tight coupling and complex interactivity primed to fail in no time flat when normal operations are breached:

First, control rooms in many critical infrastructures manage interconnections so as to render them more loosely coupled than tightly so, and more linearly than complexly interactive.

Second, this management requires very, very smart people, and ones who are decidedly not automatic ciphers that need only know the difference between two prices in order to act rationally. What is irrational are those leaps from macro-design to micro-operations or back that overlook, when not altogether dismissing, the knowledge bases and learning of the reliability professionals in between.

Third, in recognizing the limitations of macro-design and micro-operations, the operational knowledge in between basically redefines, retroactively, what those micro-operations and macro-designs were “really” about.

Note, I am not saying that experience trumps design. Rather, I’m insisting that experience-based micro-operations and design-based macro-policies and regulations are each insufficient as modes of reliability management. To adequately appraise risk, for example, experience must be allowed to critique macro-design and design must be allowed to identify the blind-spots of individual experience. The people with the skills, perspective and placement to do this effectively are reliability professionals. Their scenario-building skills add a cautionary note to experience, and their pattern-recognition skills help identify and fill in the gaps in design.

For example, the orthodoxy has been that to spread risk is a Good Thing. The problem arises when the risks end up highly correlated. That’s real pattern recognition for you. Novel financial instruments to spread risks ended up increasing their association with each other, as we saw in the run-up to 2008. That’s managing the mess from the middle for you.
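
The point about correlated risks can be made concrete with a little arithmetic. What follows is a minimal sketch, not a model of any actual financial instrument. It assumes n risks of equal size, each with the same standard deviation sigma and the same pairwise correlation rho; the function name and its parameters are illustrative only. Under those assumptions, the variance of the equally weighted average is sigma squared times (1 + (n - 1) * rho) / n, a standard result.

```python
def variance_of_spread_risk(n: int, sigma: float, rho: float) -> float:
    """Variance of the equally weighted average of n risks.

    Assumes, for illustration only, that every risk has standard deviation
    sigma and that every pair of risks shares the same correlation rho.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this sketch only covers 0 <= rho <= 1")
    # Var(mean) = sigma^2 * (1 + (n - 1) * rho) / n
    return sigma ** 2 * (1 + (n - 1) * rho) / n


if __name__ == "__main__":
    # Compare pooling 100 risks at three (hypothetical) correlation levels.
    for rho in (0.0, 0.5, 1.0):
        print(rho, variance_of_spread_risk(100, 1.0, rho))
```

With uncorrelated risks the variance shrinks toward zero as n grows. As rho approaches 1 it stays near sigma squared no matter how many risks are pooled, which is the sense in which spreading highly correlated risks spreads very little.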

3. The messier the large system is, the more noise; and the noisier it is, the easier it is to confuse said noise for “the intentions” of system actors. Other post-hoc rationalizations—bureaucrats were mindlessly following the rules—also turn out to be more complicated at the case level on further inspection:

First, it is at the level of the case and the event that you see power at work. (Another way of putting this: Do not commit the error of those who predict a future knowing full well they have no part in creating it. At least those utopian Saint-Simonians knew enough, it is said, to dress with buttons on the back of their clothes, so that others were required to help them dress, thereby fostering their communities.)

Second, at the case level you get to see things anew, if not for the first time, then as if so. Why? Because contingency and surprise are most visible case-by-case—which is to say the world in important senses is not predictably reducible to politics, dollars and jerks. (Each angel is its own species, argued Thomas Aquinas; Roland Barthes asks, “Why mightn’t there be, somehow, a new science for every object?”)

Third, at the case level you get to see why it comes as no surprise that behavior, practice, and implementation on the ground differ from the plan, design and law said to govern them. This finding is so unexceptional that when things do work as planned on the ground this must be a surprise worthy of study.

Fourth, it also should not be surprising that generalizations about power and such made from or in the absence of the case material are provisional and contingent—more so certainly than the generalizer commonly supposes. Such generalizations are better understood as only text on the surface of a palimpsest whose specifics have been overwritten and effaced below.

4. In reality, the chief challenge to governance isn’t so much the gap between the legitimacy and the capacity to govern as it is the societal complexity that undergirds widening or closing any such gap. This is to insist that not all of complexity’s surprise is negative; some surprises are good messes to be in.

5. In a complex world whose messiness lies in having many system components, each component serving different functions, and multiple interconnections among the many functions and components, a field’s blind-spots can often be strengths under different conditions. Three corollaries are to be noted:

First, science and technology are at their best when each admits to the blind-spots its very strengths demonstrate.

Second, bad is positive, at times. Complaints about bureaucratization are as merited as the recognition that bureaucratization is one way decisionmakers resist trivializing issues further. Even administration is a kind of fast thinking when compared to some alternatives.

Third, not only is the chief feature of this messiness surprise, the greatest surprise is how many ways the uncertainty, complexity, conflict and incompleteness afford for recasting (redescribing) the so-called intractable. Having many components, multiple differentiation and high interconnectivity has, again, its upsides, not just downsides. Complexity sands away any shield of photo-clarity and reveals the contingent possibilities that have been missed.

6. Complex messiness also implies that some kinds of accidents and errors—including sabotage—are going on that are not noted by anyone, including at times the perpetrators acting unintentionally.

We are already tolerating a level of “mistakes” for which we are not managing, which raises the issue of just what accident level we are already tolerating as part of our “adaptive capacity.” So there may be a subtle truth in what one expert told us, i.e., originally everyone learns high reliability by accident and from accidents. (Note, however, this is not the same as valorizing “trial and error learning.”)

Published by Emery Roe
