I
Proposition. The expression, M&R (maintenance and repair), signals an already-established state/stage of infrastructure operations (e.g., “operations and maintenance” or “maintenance and repair,” for which there are official and unofficial procedures, routines and protocols).
M&R is a widely accepted, indeed formal, stage of infrastructure operations, and as such deserves scholarly study with respect to enhancing the resilience of critical infrastructures. Indeed, M&R provides an officially recognized period for, and expectations about, identifying and updating the precursors to system disruption and failure, along with the strategies for preventing or avoiding them. Recurrent M&R is all about continuously building in precursor resilience (e.g., using M&R to identify obsolescent and now possibly hazardous software or other components). M&R moves closer to the center stage of infrastructure operations, if only because of the common perception about infrastructures, i.e., “they’re invisible until they break, right?”
II
Implications. Start at the macro-level but with more granularity than conventionally assumed. A form of societal regulation occurs when critical infrastructures, like energy and water, prioritize systemwide reliability and safety as social values in real time. For our purposes here, these values are further differentiated and uniquely so within infrastructures.
Consider the commonplace that regulatory compliance is “the baseline for risk mitigation in infrastructures.” There is no reason to assume that compliance is the same baseline for, inter alios, the infrastructure’s micro-operators on the ground, including the eyes-and-ears field staff; the infrastructure’s headquarters compliance staff responsible for monitoring industry practices for meeting government mandates; the senior officials in the infrastructure who see the need for more and better enterprise risk management; and, last but never least, the infrastructure’s reliability professionals (its real-time control room operators, should they exist, and their immediate support staff), who are in the middle of all this, especially in their role of surmounting any stickiness from official procedures and protocols that undermine real-time system reliability.
To put it another way, where reliable infrastructures matter to a society, it must be expected that the social values reflected through these infrastructures differ by staff and their duties/responsibilities (e.g., responsibilities of control room operators necessarily go beyond their official duties). This in turn also holds for the operational stage, “maintenance and repair.”
So what?
III
The above implies that M&R provides for increasing precursor resilience, which is best seen now as a differentiated process (resilience will look very different from the intra-infrastructural perspectives of enterprise risk management and real-time control room operations) and which takes place within a wider framework of social regulation not associated solely with the official regulator of record.
Note that infrastructures do convey and instantiate social values, but these values, particularly for systemwide reliability and safety, are not the command and control typically discussed under “infrastructure power.” In infrastructure power, formal design is the starting point for eventual operations; in the case of reliability and safety, actual operations are the informal starting point for real-time redesign. Not only do actual implementation and operations fall short of initial designs; one major function of operations is also to redesign in real time the inevitably incomplete or defective technologies of infrastructure designers and the defective regulations of the regulator of record.
In this way, it’s better to see “maintenance and repair” as part and parcel of normal operations that follow from and modify formal infrastructure design. M&R’s focus on improving precursor resilience becomes one way of maintaining the infrastructure’s process reliability when older forms of high reliability can no longer be achieved because of inter-infrastructural dependencies and vulnerabilities.
IV
These distinctions have major implications for reinterpreting “infrastructure resilience.” For example, noncompliance by an infrastructure’s control room may be a regulatory error for the regulator of record; the same noncompliance may reflect a (more) resilient infrastructure (or at least a more resilient control room, if there is one) able to ensure system reliability when the task environment indicates the said regulation to be defective.
In fact, for real-time operations, noncompliance is not an error if following that regulation jeopardizes infrastructure reliability and safety now or in planning the next steps ahead. So too in the case of defective technologies. To put it another way, the criticality of time from discovery to correction of error reinforces a process of dispersed regulatory functions, where one of the regulatory functions of the infrastructure’s real-time operations is to catch and correct for error by the regulator of record and/or design errors by engineers under conditions of mandated reliability. In fact, this catching and correcting of error is part and parcel of what we mean by a resilient infrastructure and its control room.
V
Finally, the M&R perspective presented here can help us rethink the formal design and planning processes for creating new infrastructures or substantially repurposing existing ones after a major emergency. As Paul Schulman argues, “adaptive capacity [for emergency management] can be facilitated in part by planning and design processes that themselves create prior conditions, such as contacts among diversely skilled people in other infrastructures, robust communications systems and contingent resources in different locations, for restoration actions.” I interpret the passage to mean that the mentioned design and planning interventions pass the “reliability matters” test.
That is, the aim of maintaining or enhancing contact lists, communication systems and distributed inventories is to reduce the task volatility that emergency managers and infrastructure operators face, increase their options to respond more effectively, and/or enhance their maneuverability in responding to different, often unpredictable or uncontrollable, performance conditions. That is what we mean by resilience in aid of system reliability. (In case it needs saying, not all design and planning pass the test!)
My thanks to Paul Schulman and Antti Silvast for thinking through some points. Any errors that remain are due to my stubbornness. Some material has appeared in earlier blog entries.