Under what conditions? With respect to what?

I

A young researcher had just written up a case study of traditional irrigation in one of the districts that fell under the Government of Kenya’s Arid and Semi-Arid Lands (ASAL) Programme. (We’re in the early 1980s.) I remember reading his report and getting excited. Here was detailed information about really-existing irrigation practices and constraints sufficient to pinpoint opportunities for improvement. That was, until I turned to the conclusions: What was really needed was a country-wide land reform.

Huh? Where did that come from? Not from the details and findings in the report!

This was my introduction to pre-existing solutions in search of new problems they should “solve.” Only later did I realize I should have asked him, “What kind of land reform with respect to what and under what conditions at your research site?”

II

Someone asserts that this policy or approach holds broadly, and that triggers your asking:

  • Under what conditions?
  • With respect to what?
  • As opposed to what?
  • What is this a case of?
  • What are you–and we–missing?

Under what conditions does what you’re saying actually hold? Risk or uncertainty with respect to what failure scenario? Settler colonialism as opposed to what? Just what is this you are talking about a case of? What are you and I missing that’s right in front of us?

Answers that many don’t talk about to the question, “What is infrastructural power?”

I

It shouldn’t be surprising that the more comprehensive a theory of materialist determinism, the easier it is to find the exceptions.

It is understandably more common, then, that the view of bureaucratic and authoritarian governments wielding infrastructural power for their own interests is contrasted with empirical cases indicating quite otherwise. In these comparisons, the infrastructures are demonstrably weak, in need of great repair and maintenance, perform far less effectively than designed, planned or promised, and require massive repurposing in light of the Energy Transition; their real-time operators are barely managing or coping precisely because they don’t have the kind of control that many discussions attribute to “infrastructural power.”

Yes, critical infrastructures—even in their variably existing, heterogeneously performing conditions—still have major bearing on all those material factors taken to be important, ranging from income and wealth inequalities and well-being to national and international versions of growth, prosperity and sustainability. But that impact is more differentiated and case-by-case than over-arching theories of materialist determinism allow. The keywords of this differentiated view include: unpredictable, inadvertent, unintended, contingent, with many intervening variables.

II

For example, it’s routine to say that governments have allocative, distributive, regulatory and stabilization functions. In actual fact, infrastructures exercise a different and more variegated form of societal regulation by prioritizing systemwide reliability and safety as social values in real time. This matters for any understanding of “infrastructural power.”

Consider the commonplace that regulatory compliance is “the baseline for risk mitigation in infrastructures.” There is no reason to assume that compliance is the same baseline for, inter alios, the infrastructure’s eyes-and-ears staff on the ground; the infrastructure’s headquarters’ compliance staff responsible for monitoring industry practices for meeting government mandates; the senior officials in the infrastructure who see the need for more enterprise risk management; and, last but never least, the infrastructure’s reliability professionals—its real-time operations personnel, should they exist, and immediate support staff— in the middle of all this, especially in their role of surmounting any stickiness by way of official procedures and protocols undermining real-time system reliability and safety.

Stickiness? Noncompliance may be a regulatory error for the regulator of record; the same noncompliance may be an important option for ensuring system reliability when the task environment indicates the said regulation to be defective. For real-time operations, noncompliance is not an error if following that regulation jeopardizes infrastructure reliability and safety now or in the next steps ahead. Indeed, the importance of time from discovery to correction of error reinforces a process of dispersed regulatory functions, where one of the regulatory functions of the infrastructure’s real-time operations is to catch and correct for error by the regulator of record under conditions of mandated reliability.

III

True, governments rely on infrastructures to meet their own functions and, yes, there is an overlap and dependency between both, as the case of compliance illustrates. Few, however, think to ask, let alone study, how critical infrastructures—many of which are privately owned or managed in the US—independently and differentially affect society-wide risks, social values and societal regulation. It’s all well and good to stress there are other social values than reliability and safety. But it is also useful to remind ourselves that much, if not most, of the world is characterized by unreliable and/or unsafe critical services—notably water and electricity—even where there are infrastructures of sorts providing the services.

So yes of course, operating these infrastructures, reliably or otherwise, creates inequalities and exclusions. But wouldn’t you want to know, before changing them, the likely effects of that change on systemwide reliability and safety, however well-intended? (The chief lesson of Policy Analysis 101 is: The opposite of good is good intentions.) Even low-cost, more sustainable socio-technical systems will be reliable only up to that unpredictable failure ahead they can’t or haven’t prevented. They too will have to manage or cope because they too can’t control the Anthropocene when it comes to an infrastructure’s inputs, processes and outputs.

More, you needn’t be clairvoyant to realize that the Energy Transition–whether in its reformist or radical versions–means a host of second chances for critical infrastructures and their mandated provision of reliable services.

With or without Stop-Oil, infrastructures will remain central to energy provision and interconnectivity; with or without Sustainability, reliability and safety will be demanded across that interconnected provision. Technologies and system configurations will change, but even the keywords of radical versions of the Energy Transition—transformative, emancipatory—are redolent with the promise of second chances along the way.

What makes the second chances so important? For one thing the Climate Emergency portends all manner of illiquidity, not least of which are today’s infrastructures being tomorrow’s stranded assets. But “stranded” underscores the place-based character of the infrastructure. Stranded also implies the possibility of its other use(s), second chances in other words. One has to wonder if current Energy Transition scenarios are granular enough to take them seriously.

IV

Why is increased granularity of scenarios important? Critical infrastructures are themselves importantly differentiated. Some have centralized operation rooms or floors; others do not. Even those with an operations center vary greatly with respect to the reliability of their critical services. In particular, they may well be operating to different standards of reliability, from which it follows that they are managing for different risks and uncertainties.

For instance, it is true that nuclear explosions occur, dams are overtopped, and grids do separate and island, but these events are rare–rare because of their real-time management beyond the defects of technology and design–and when these events do happen they serve to reinforce a societal dread that they indeed are must-never-happen events. Real-time system operators seek to preclude must-never-happen events like loss of nuclear containment, cryptosporidium contamination of urban water supplies, or jumbo jets dropping from the sky because of that widespread societal dread.[1] (Which of course can change, and not just because of the Anthropocene).

In contrast, financial services have “should-never-happen events”—bank runs should be avoided and financial crises shouldn’t happen. The standard of operating reliability is not one of precluding financial crises from ever happening, but rather of treating these crises (1) as avoidable though not always, or (2) as inevitable (“busts are part of market capitalism”) or at least (3) compensable after the fact (as in the pre-2008 assurance that it’s better to clean up after a financial bubble bursts than trying to manage it beforehand).

So what? Well for one thing, not having highly reliable financial services based on must-never-happen events has major consequences for standards of economic stability and growth (also variously defined). At the macro level, there are two different standards of economic reliability: The retrospective standard holds the economy is performing reliably when there have been no major shocks or disruptions from then to now. The prospective standard holds the economy is reliable only until the next major shock.

Why does the difference matter? In practical terms, the economy is prospectively only as reliable as its critical infrastructures are reliable, right now when it matters for economic productivity (again, broadly writ). Indeed, if economy and productivity were equated only with recognizing and capitalizing on retrospective patterns and trends, economic policymakers and managers could never be reliable prospectively.

By way of example, a retrospective orientation to where we are today is to examine economic and financial patterns and trends since, say, 2008; a prospective standard would be to ensure that–at a minimum–the 2008 financial recovery could be replicated, if not bettered, for the next global financial crisis. The problem with the latter–do no worse in the financial services sector than what happened in the last (2008) crisis–is that the benchmark would have to reflect a must-never-happen event going forward. What, though, are the chances it would be the first-ever must-never-happen event among all of that sector’s should-never-happen ones?

V

Not only do these reliability standards differ, so too do the risk and uncertainties that follow from managing to the respective standards. The classic case is the one emergency within and across infrastructures infrequently discussed: suicide for fear of death.

What else can we do, senior executives and company boards tell themselves, when business is entirely on the line? In this emergency, we have to risk failure in order to succeed!

But what if the business is in a critical service sector? Here, when upper management seeks to implement risk-taking changes, they rely on real-time reliability professionals, who, when they take risks, only do so in order to reduce the chances of failure. To reliability-seeking professionals in critical infrastructures, the risk-taking activities of their upper management look like a form of suicide for fear of death.

This has become an all-too-common phenomenon. When professionals are compelled to reverse practices they know to be reliable, the results are deadly. Famously in the Challenger accident, engineers had been required up to the day of that flight to show why the shuttle could launch; on that day, the decision rule was reversed to one showing why launch couldn’t take place.

Once it was good bank practice to hold capital as a cushion against unexpected losses; capital adequacy arrangements now mandate that banks hold capital against losses expected from their high-risk lending. Mortgage brokers traditionally made money on the performance and quality of mortgages they made; in the run-up to the 2008 financial crisis, their compensation changed to one based on the volume of loans originated but passed on.

Originally, the Deepwater Horizon rig had been drilling an exploration well; that status changed when on April 15, 2010 BP applied to the U.S. Minerals Management Service (MMS) to convert the site to a production well. The MMS approved the change. The explosion occurred five days later.

In brief, ample evidence exists that decision rule reversals that required professionals in high-stakes situations to turn inside out the way they managed for reliability have instead led to system failures: NASA was never the same; we are still trying (in 2024!) to get out of the 2008 financial mess and the Great Recession that followed; the MMS disappeared from the face of the earth.

“But, that’s a strawman,” you protest. “Of course, we wouldn’t deliberately push reliability professionals into unstudied conditions in critical support sectors, if we could avoid it.” Really? The oft-recommended approach, Be-Prepared-for-All-Hazards, looks like the counsel of wisdom. It is, however, dangerous if it flips mandates into requiring organizations to cooperate around new or far more numerous variables, using information they will not have or cannot obtain, across all manner of interconnected scenarios which, if treated with equal seriousness, produce considerable modeling and analytic uncertainties.

VI

Just as risk and uncertainty differ in critical infrastructures (probabilities and consequences of failure are variously known or not), so too reliability and safety are not one and the same. Just because you reduce risk doesn’t mean you thereby improve safety. It is true that risk and safety overlap as terms in ordinary language. Some seek to formalize the purported relationships—e.g., increasing safety barriers reduces risk of component or system failure.

In contrast, I come from a field, policy analysis and management, that treats safety and risk as very different. Indeed, one of the founders of my profession (Aaron Wildavsky) made a special point of distinguishing the two. The reasons are many for not assuming that “reduce risks and you increase safety” or “increase safety and you reduce risks.” In particular:

However it is estimated, risk is generally about a specified harm and its likelihood of occurrence. But safety is increasingly recognized, as it was by an international group of aviation regulators, to be about “more than the absence of risk; it requires specific systemic enablers of safety to be maintained at all times to cope with the known risks, [and] to be well prepared to cope with those risks that are not yet known.”. . .In this sense, risk analysis and risk mitigation do not actually define safety, and even the best and most modern efforts at risk assessment and risk management cannot deliver safety on their own. Psychologically and politically, risk and safety are also different concepts, and this distinction is important to regulatory agencies and the publics they serve. . . .Risk is about loss while safety is about assurance. These are two different states of mind.

C. Danner and P. Schulman (2019). Rethinking risk assessment for public utility safety regulation. Risk Analysis 39(5), 1044-1059

Once again, the differences come with the failure scenarios—risks with respect to this failure scenario’s set of granularities as distinct from safety with respect to a different set of granularities or even a different failure scenario altogether.

VII

That failure scenarios do differ is nowhere better demonstrated than in the fact that there are different fields of infrastructure studies. For a world where bureaucratic and authoritarian states exert infrastructural power to further their own interests—well, that is the failure of concern. But there are other schools of infrastructure studies. Here I focus on what a socio-cultural perspective has to say about infrastructure repair that a socio-technical perspective might wish to pursue further. Since my work is from the socio-technical perspective, it’s only fair that I not try to summarize positions from a socio-cultural perspective but quote from their work directly:

For all of their impressive heaviness, infrastructures are, at the end of the day, often remarkably light and fragile creatures—one or two missed inspections, suspect data points, or broken connectors from disaster. That spectacular failure is not continually engulfing the systems around us is a function of repair: the ongoing work by which “order and meaning in complex sociotechnical systems are maintained and transformed, human value is preserved and extended, and the complicated work of fitting to the varied circumstances of organizations, systems, and lives is accomplished” . . . .

It reminds us of the extent to which infrastructures are earned and re-earned on an ongoing, often daily, basis. It also reminds us (modernist obsessions notwithstanding) that staying power, and not just change, demands explanation. Even if we ignore this fact and the work that it indexes when we talk about infrastructure, the work nonetheless goes on. Where it does not, the ineluctable pull of decay and decline sets in and infrastructures enter the long or short spiral into entropy that—if untended—is their natural fate.

S. Jackson (2015) Repair. Theorizing the contemporary: The infrastructure toolbox. Cultural Anthropology website. Available at: https://culanth.org/fieldsights/repair (accessed 24 September 2015)

The nod to “sociotechnical systems” is welcome as is the recognition that these systems have to be managed–a great part of which is repair and maintenance–in order to operate. Added to routine and non-routine maintenance and repair are the just-in-time or just-for-now workarounds (software and hardware) that are necessitated by inevitable technology, design and regulatory glitches–inevitable because comprehensiveness in analysis and operations is impossible to achieve in complex large-scale systems.

For its part, socio-technical research on infrastructures calls into question any assumption that macro-designs control every important micro-operation, an assumption also very much questioned in this socio-cultural perspective, e.g., “approaching infrastructure from the standpoint of repair highlights actors, sites, and moments that have been absented or silenced by stories of design and origination, whether critical or heroic.” Here the test of efficacy isn’t “Have we designed a system that can be controlled?,” but rather “Is this a system we can manage to redesign as needed?”

Also from the socio-technical perspective, the “end of infrastructure operations” isn’t so much decay, decline or entropy, as in the socio-cultural perspective, as it is system failure and immediate emergency response, including seeking to restore, as quickly as possible, even if temporarily, water, electricity and telecoms to survivors. What to my knowledge has not been pursued in the socio-technical literature is the following from a socio-cultural focus on repair:

Attending to repair can also change how we approach questions of value and valuation as it pertains to the infrastructures around us. Repair reminds us that the loop between infrastructure, value, and meaning is never fully closed at points of design, but represents an ongoing and sometimes fragile accomplishment. While artifacts surely have politics (or can), those politics are rarely frozen at the moment of design, instead unfolding across the lifespan of the infrastructure in question: completed, tweaked, and sometimes transformed through repair. Thus, if there are values in design there are also values in repair—and good ethical and political reasons to attend not only to the birth of infrastructures, but also to their care and feeding over time.

That the values expressed through repair (we would say, expressed as the practices of actual repair) need to be understood as thoroughly as actual design reflects, I believe, a major research gap in the socio-technical literature with which I am familiar (the latter being much more concerned with the gap between designs-to-control and practices-for-managing/coping). Finally, I cannot over-stress the importance of infrastructure fragility, contrary to any sturdy-monolith imaginary of infrastructural power one might have gotten from elsewhere.


[1] Not only is societal dread important, but so is operator distrust. One reason infrastructure operators manage reliably is that they actively distrust that the future will be stable or predictable in the absence of the system’s vigilant real-time management. We of course must wonder at the perversity of this. But that is the function of this dread and distrust: to push all of us into probing further what it means to privilege reliability and safety over other societal values. We are meant to ask: What would it look like in a world where such reliability and safety are not so privileged? For the answer to that question is again obvious: Most of the planet already lives in that world of unreliability and little safety. We’re meant to ask, precisely because that answer is that clear.

The “future” in HRO Studies: the example of networked reliability as a form of reliability seeking

Introduction

One central insight from the literature and theory of High Reliability Organizations (HROs) is that past reliable performance of large socio-technical systems does not predict, let alone ensure, reliable performance into the future. In this sense, then, there are two futures of interest in this blog entry: the future of large socio-technical systems from the perspective of HRO Studies and the future of HRO Studies.

It’s to be recognized from the get-go that “HRO Studies” now houses many reliability-seeking rooms. A major trajectory for research and taxonomies has been the early work of Todd LaPorte, Gene Rochlin, Karlene Roberts and Paul Schulman around a set of hazardous organizations mandated to maintain reliable (safe and continuous) operations (LaPorte and Consolini 1991; Roberts 1993; Rochlin 1993; Schulman 1993). We need look no further for the durability of this trajectory than the many comparisons of HRO/HRT (high reliability theory) to Charles Perrow’s Normal Accidents Theory (more recently, Bamburger 2014 for cybersecurity; Min and Borch 2021 for financial markets).

Yet, it is also true that HRO Studies has been extended and changed in ways initially unforeseen with contributions to, inter alia, management practice (e.g., Weick and Sutcliffe 2001), safety science (e.g., Hopkins 2014; Amalberti 2013), resilience theory and practice (e.g., Hollnagel, Woods, and Leveson 2006; Boin and van Eeten 2013), networked reliability (e.g., de Bruijne 2006; Roe and Schulman 2016), and analyses of high reliability as a continuous quantifiable variable in the operations of health care, nuclear power, and other industries (Vogus and Sutcliffe 2007; Schöbel 2009; May 2013; O’Neil and Kriz 2013).

In addition, other fields, like development studies, have also drawn water from this wider trough (Scoones 2024). Most notably for our purposes here, this diversification of HRO Studies has witnessed a shift from identifying HRO properties or features at a point in time to identifying processes of high reliability organizing and managing over time (e.g., Ramanujam and Roberts 2018). Another piece of evidence for the continuing relevance of HROs, now writ large, is that the Google Ngram for the term “high reliability organizations” shows a steady climb in the literature.

That said, since there is something perverse in assuming past trends and growth are a reliable guide to the future of and in HRO Studies, how then to see better what’s ahead for the field and what’s ahead by way of large socio-technical systems?

Specific aim

My answer is to return to another, more methodological insight of LaPorte, Rochlin, Roberts and Schulman: The specific cases analyzed matter profoundly. In their set, they found reliable management at that point in time where theory would still tell you not to expect it.

But, of course, there were other reliability-seeking cases and unsurprisingly some of these did not and still do not exhibit HRO processes, let alone features. That indeed is the lesson I take away from the many published “HRO v NAT” comparisons: not that one theory is better “overall” than the other, but that there is still no substitute for attending to differences in the cases examined, in time and over time–especially if theory is to matter for practice.

Focus, rationale and roadmap

I want to suggest then that HRO Studies might benefit at this juncture from a comparative and longitudinal analysis involving similar or related cases. I am familiar with only one, that starting with the California Independent System Operator (CAISO), which operates most of the state’s electric transmission grid.

Here I focus on the aforementioned “network reliability” strand of HRO Studies and ask: What have we learned since Mark de Bruijne published his transmission grid study of CAISO in 2006? I start with this work for three reasons: His book preceded our own 2008 work on managing networked high reliability in electricity transmission; we aligned subsequent work with De Bruijne and others (Roe and Schulman 2016: 10); and I want to be very transparent upfront that I am not cherry-picking quotes from our own work to substantiate the implications drawn at the end of this blog entry.

Both Paul Schulman and I continued the CAISO research and analysis after our Dutch colleagues left, the results of which took the form of updates and framework extensions in Roe and Schulman (2008, 2016). More recently, Paul Schulman and I have investigated the notion of network reliability across interconnected critical infrastructures, including but not limited to electricity (Roe and Schulman 2018, 2023).

In what follows, I first present De Bruijne’s findings. I then update this earlier notion of networked reliability in light of our research to the present on interconnected critical infrastructures, including that of electricity. The conclusion focuses on what I take to be important implications for network reliability, both as a strand in HRO Studies and in the future(s) of large socio-technical systems. To telegraph ahead and in T.S. Eliot’s words, “the end of all our exploring will be to arrive where we started and know the place for the first time.”

Summary of Networked Reliability (de Bruijne 2006)

The full text of Mark de Bruijne’s Networked Reliability is well worth reviewing and can be accessed at https://www.researchgate.net/publication/306011428_Networked_Reliability_Institutional_fragmentation_and_the_reliability_of_service_provision_in_critical_infrastructures.

For those who do not have the time, the work is summarized in a 2007 article De Bruijne co-authored with his dissertation advisor and our early CAISO research colleague, Michel van Eeten, “Systems that Should Have Failed: Critical Infrastructure Protection in an Institutionally Fragmented Environment” (accessed at https://www.researchgate.net/publication/227701135_Systems_that_Should_Have_Failed_Critical_Infrastructure_Protection_in_an_Institutionally_Fragmented_Environment).

I quote at length from De Bruijne and Van Eeten in order to establish, for later purposes of comparison, a separate and uninterrupted benchmark for the networked reliability then under study in the early 2000s:

A key question that arises from these developments is: how do CI [critical infrastructure] industries, consisting of networks of organizations, many with competing goals and interests, provide reliable services in the absence of conventional forms of command and control? This raises another question that, logically, precedes it: are institutionally fragmented CIs in fact still reliable?

Does Institutional Fragmentation Affect the Reliability of Service Provision?

The exact relationship between institutional restructuring and the reliability of services and networks has so far remained largely obscured. The available empirical data on reliability – measured in terms of the frequency and length of disruptions to end-users – fail to provide an unequivocal answer. We were able, however, to draw upon extensive field research on reliability-related issues in large-scale water systems (Van Eeten and Roe, 2002; Roe and Van Eeten, 2002), electricity grids (Schulman et al, 2004; Roe et al, 2005) and telecommunication networks (Van Eeten et al, 2005; De Bruijne, 2006). Together, these field studies comprise over 130 interviews, extensive control room observations and literature reviews.

Without repeating previous discussions of our findings, we can draw out a number of implications, primarily based on our studies in electricity and telecommunications. First of all, while there are no conclusive data regarding the reliability of services and networks post-restructuring [post-deregulation, privatization and liberalization], the data that is available suggests that the network operators and service providers have managed to cope with these changes. The two focal organizations that we studied – the California Independent System Operator (ISO) and Dutch mobile telephony operator KPN Mobile – succeeded in maintaining a high reliability of service provision. The organizations displayed virtually unchanged levels of service provision before and after restructuring. The ISO’s reliability performance during California’s electricity crisis in 2000 and 2001 – one of the most turbulent periods in which any restructured critical infrastructure industry ever operated – did not lead to outage rates that differed significantly from those of the utilities before restructuring. In the end, the lights stayed on for most of the time, notwithstanding the popular images in the media of sweeping blackouts across the state. The reported rolling blackouts occurred on eight days for 27 hours, compared to the 125 days on which just 1.5 percent of operating reserves remained and stage 3 emergencies were declared. The aggregate amount of load shed during California’s electricity blackouts was quite small, adding up to no more than one hour’s worth of electricity to all residential homes in the state. This performance fell within the margins of the average annual reliability performance of the investor-owned utilities before restructuring. However, other key reliability indicators (e.g. the number of high-voltage transmission line overloads and the number of violations in the ISO’s control area) did show that the system was operating closer to the edge of failure – demonstrating the massive pressure under which the system was operating. In other words, although negative effects of restructuring could be identified, the organizations involved, most notably the ISO, managed to cope with these effects and maintain acceptable levels of service and network reliability.

Similarly, the Dutch mobile telephone operator KPN Mobile displayed a steady reliability performance from 1996 to 2001, notwithstanding seven-fold increase in customers, the rapid expansion and innovation of its mobile network and the six-fold increase in the number of services it provided over this network. From 1996 to 2001, the company displayed steadily rising call completion rates (CCR) and call setup success rates (CSSR), which in the telecommunication industry are considered key proxies for the reliability of service provision. In addition to these steadily improving reliability indicators, KPN experienced ‘only’ a 50 percent increase in the number of ‘calamities’ – which they define as incidents with an impact on customers.

While significant, this number pales in comparison to the growth rate of customers, network and services. KPN Mobile achieved this performance under cut-throat competition in the market which forced them to undertake drastic cost reductions in their operations.

Considering the effects of institutional fragmentation on how these CIs were organized and operated, the abovementioned performance of both the ISO and KPN Mobile may be considered an astonishing feat. Despite operating under conditions with significantly reduced resources time and again the organizations managed to maintain a reliable provision of CI services. These findings are all the more puzzling since the two dominant organizational theories that are used to assess the reliability, or lack thereof, of complex, large-scale technological systems would predict a negative impact on the ability of organizations to reliably manage these CIs.

The Normal Accident Theory (NAT) (Perrow, 1999a) and High-Reliability Theory (HRT) (Roberts, 1993) both expect that institutional fragmentation caused by restructuring negatively affects the ability to reliably manage these infrastructures and that reliability of service provision accordingly should have suffered. However, the case studies did not confirm the theoretically assumed negative relationship between the effects of institutional fragmentation and the ability to reliably manage these infrastructures, even though infrastructure operations did become more complex to manage and more volatile in their behavior (De Bruijne et al., 2006). Evidence did show that the infrastructures operated ‘closer to the edge’ than before restructuring. So how can we explain the performance record of restructured CIs and the more or less continued high reliability of the provided services in the researched cases?

Coping with institutional fragmentation

Based on these findings, it could be concluded that institutional fragmentation and restructuring not only negatively affected the ability of organizations that manage CIs to provide highly reliable services, but also offered new options that enabled organizations involved in the management of these systems to maintain reliability under extremely demanding conditions. The case studies revealed a large number of hitherto unknown or unrecognized conditions that enabled these organizations to cope with the effects of institutional fragmentation (De Bruijne, 2006). Examples include the increased use of real-time, on-line experimenting; the gradual redefinition of reliability norms and criteria to fit the new conditions; and the increased use of support staff and of informal wheeling and dealing in real-time in control rooms. These conditions, which many at first glance would consider detrimental to the provision of reliable services, were found to contribute to the ability of the organizations to maintain a reliable provision of services.

The research found both NAT and HRT flawed in their assumptions on the main relationships between the conditions that facilitate reliability and the levels of reliability achieved. The networked environment clearly emphasized different reliability-enhancing characteristics than those identified by NAT and HRT (cf. Grabowski & Roberts, 1996; Schulman et al., 2004). The implication is that NAT and HRT, which until now have been presented as generic organizational theories of (un)reliability, need to be modified in order to be valid under conditions of networked reliability (see also Schulman et al., 2004; De Bruijne, 2006). In general terms, we have identified three shifts of emphasis in organizational processes and resource allocation.

(i) From long-term planning to real-time management

Institutional fragmentation and the introduction of competition create more volatile and technologically more complex infrastructures. Many of the procedures and routines that had been designed to reliably operate the CIs no longer function. Infrastructure operations used to emphasize the importance of complete information, centralized planning and command and control. Institutional fragmentation confronted those in control of infrastructure operations with less than adequate information and control, leading to more surprises and reliability-threatening events. This in turn emphasizes a need for more flexible response capability to maintain reliable services. Real-time operations – typically focused in and around control centers – increase in importance, reducing the strong reliance on long-term, detailed planning that has characterized CIs (cf. De Bruijne et al., 2006; Van Eeten et al., 2006; Roe et al., 2002).

(ii) From design and analysis to improvisation and experience

More volatility and complexity also mean more unpredictability. As Demchak (1991, p. 3) has said, the chief manifestation of complexity is surprise. Operations move more often ‘outside analysis’, beyond the well-studied situations for which technology has been designed and procedures have been tested. Under these circumstances, relying on established procedures, routines and guidelines decreases rather than ensures reliability. In real-time, control room operators increasingly have to rely on their experience and improvisational abilities to deal with surprises and volatile events. Referential knowledge, improvisation, ‘instinct’ and experience gain precedence over detailed procedures and routines. It becomes more important to train operators to know when not to follow procedure and how to still maintain reliability.

(iii) From standardized and formal to real-time informal communication and coordination

The third shift moves infrastructure operations away from formal and hierarchical towards informal and ‘rich’ modes of communication and coordination. To put it differently: real-time resists formalization. Faced with surprises and threatening events, CI operations are constrained by hierarchical, unilateral, and formal modes of communication and coordination, whether legacies from the pre-restructuring days or arrangements installed after restructuring to ensure competition and level playing fields. Both types severely handicap operators’ abilities to improvise and provide reliable services. Especially when faced with reliability-threatening events, informal communication and coordination mechanisms take over or augment formal mechanisms. The need for real-time communication has already been identified in the literature on coordination in networks of organizations as well. In the absence of formal communication and coordination arrangements between organizations in networks, informal coordination and communication evolve and take over (cf. Chisholm, 1989). Powell (1990:304) finds that information passed through networks (of organizations) must be “thicker” than information obtained through markets and “freer” than information communicated through hierarchies.

Real-time, ‘rich’ informal communication and coordination has been identified as one of the most important sources of networked reliability: “[R]eal-time values and privileges the non-routine over the routine, the informal over the formal, and the relational over the representational” (Roe et al., 2002:9-5). In other words, the ability of system operators to engage in a rich exchange of information and informal deals enhances their knowledge of system conditions, stimulates creativity and increases their options for maintaining reliability. To be sure, under ‘normal’ operating regimes, the need for ‘rich’ and varied communication and coordination is constrained by the competitive environment in which CIs nowadays operate. However, when threats occur and move towards real-time, ‘rich’ and informal communication and coordination become increasingly important. Real-time informal infrastructure operations enable types of interventions and control that are typically unacceptable at any other time or place.

(endnotes deleted for readability; citations kept in order to date the findings in the quoted text)

How this picture has changed to the present

As Paul Schulman and I were never invited back to CAISO after 2008, nothing in what follows can be interpreted as remarks about that grid transmission manager today. It should, however, be noted that we did find further evidence of CAISO moving closer to the edge of reliability performance with the introduction of new systemwide marketing software (Roe and Schulman 2016).

Rather than CAISO specifically, what is of interest here is the notion of networked reliability sketched in the extended quote above, as compared with and updated in our subsequent work (most recently, Roe and Schulman 2023).

For me, the most striking contrast is this: Some of us in this networked reliability strand of HRO Studies are having to spend considerable time on parsing out the features and processes of the interconnectivities between and among the networked critical infrastructures.

Stay with electricity as the example. The network of primary interest is no longer the one connecting the then-fragmented, deregulated units for generation, transmission and distribution of the once integrated energy utilities. Today’s electricity network of interest revolves around how it is interconnected with other “lifeline” infrastructures, not least of which are the large socio-technical systems for water, telecommunications and transportation.

Further, the configurations of these interconnections are far more varied than originally studied for maintaining the continuous provision of a critical service, even during (especially during) turbulent periods. Serial dependencies and reciprocal interdependencies are matched by pooled and mediated interconnectivities in a wide variety of permutations and combinations. Empirically, many more versions of Interconnected Critical Infrastructure Systems (ICISs) can be identified and demonstrated than even system modelers acknowledge to date (Roe and Schulman 2016).

One of the ironies of having this now-wider understanding of network interconnectivities is that the picture has become more granular and detailed for purposes of operations and management than was the case in describing the restructured utilities. The centrality of human ingenuity under urgent circumstances moves beyond the control room and into the field in periods of system disruption, failure, immediate emergency response and initial service restoration. “Rich” informal communications and coordination take place between the staff of different infrastructures when their respective system control variables overlap or are shared (e.g., when the railroad bridge over a major shipping navigation way becomes stuck). In fact, there are cases where improvisations undertaken jointly by the different infrastructures’ field staff and/or control rooms are the real-time interconnectivities that matter for the respective operations.

Initial implications

Just as the extended quote of De Bruijne and Van Eeten is date-stamped by the then very live issue of energy deregulation, so my preceding update will be seen as date-stamped by what is today’s headline issue: emergency management in a world of interconnected critical infrastructures faced with all manner of crises.

But such dating is not the problem here. What remains a problem is that finding in De Bruijne and Van Eeten: “The research found both NAT and HRT flawed in their assumptions on the main relationships between the conditions that facilitate reliability and the levels of reliability achieved. The networked environment clearly emphasized different reliability-enhancing characteristics than those identified by NAT and HRT.” We–and I include myself here–are still being astonished by higher levels of reliability performance than current theories and expectations would lead us to believe. Aspirational high reliability seems to be transformed into high reliability management, at least in some cases, and without any guarantees of doing so in the future. To say this can’t go on forever is hardly the point; rather: How is this still happening?

How did CAISO survive the introduction of its disruptive, then-new systemwide marketing software that we studied? How has China’s high-speed rail system been as reliable as it has been, given its massive size and scale? Does the capacity to achieve reliable normal operations in digital platforms–not by precluding or avoiding certain events but by adapting to electronic component and subsystem failure most anywhere and most all of the time–offer a very different skill-set for “reliability management” in other digitized critical infrastructures? Are there in fact more “control rooms” and “reliability professionals” out there than those of us who study them acknowledge?

Note that in asking these questions I am reproducing the same level of astonishment and question-asking that motivated the earliest HRO researchers with respect to the systems they studied. If so, studies of networked reliability–and HRO Studies as a whole?–have always had a future in search of more answers.

References [to be provided]

The importance for 2026 IYRP of international legal change with respect to pastoralists worldwide

International legal change is an affair of societal and institutional practices about and around legal norms. Observe these practices, the social facts and not just the texts . . . and you will see the real dynamics of international legal change. Nico Krisch recently took the matter to heart and identified five paths of change in international law through social facts: the state action path (‘when states modify their behaviour and make corresponding statements’); the multilateral path (when ‘change is generated as a result of statements issued by many states within the framework of an international organization’); the bureaucratic path (through ‘decisions or statements produced by international organizations in contexts that do not involve the direct participation of states in the decision-making process’); the judicial path (change through ‘decisions and findings of courts and quasi-judicial bodies’); and the private authority path (where ‘change follows statements or reports by recognized authorities in a private capacity without a clear affiliation to or mandate from states or international organizations’, typically taking the form of ‘the production of technical manuals, standards, and regulations’).

https://www.cambridge.org/core/journals/leiden-journal-of-international-law/article/international-law-in-the-minds-on-the-ideational-basis-of-the-making-the-changing-and-the-unmaking-of-international-law/F7CE42451E97CCF68A87239E6E3485CF

If indeed multiple pathways are required to assess international legal change(s) with respect to pastoralists worldwide, then that topic, “international legal change in pastoralism,” is one ripe for study and action.

Some comparative studies of pastoralists across regions or in terms of World Bank, IMF and other IO programs; fewer comparisons of policy and management differences between relevant international NGOs; some analyses of cross-country court cases; and some analyses of international regulations governing the many aspects of livestock production and export: together these hardly constitute a coherent foundation for describing the relevant international legal changes.

I’m just as guilty as others in habitually collapsing the legal under the rubric of “policy and management.” In advance of the 2026 International Year of Rangelands and Pastoralists, however, we are better advised to take greater care in separating the legal out from the rest in the next steps ahead. This is especially true, I believe, where pastoralist systems are a dominant infrastructure for generating options variety in the face of high uncertainty and complexity (the legal becomes much more obvious and relevant in infrastructure studies).

The neoliberal status quo

Consider “the unimaginability of any alternative to the neoliberal status quo.” Surely that’s a glove pulled inside-out. Neoliberalism generates such contingency and uncertainty as to undermine any status quo. It’s the status quo that is unimaginable.

Then again, have status quos ever been in practice as they are in theory? To paraphrase the international relations theorist Hans Morgenthau: Excuse me, but just what status quo have the people committed themselves to? They haven’t, irrespective of what systems are said to do by virtue of their own structures. In situations where indefinite recovery is the new normal, what does the status quo ante even mean today?

What if. . .

. . .we knew the murderer in Edwin Drood because Dickens did actually tell his illustrator: “I must have the double necktie! It is necessary, for Jasper strangles Edwin Drood with it”;

. . .Henri Bergson were the sole example of a philosopher having an unprecedented impact on everyday life, as he’s said to have caused the first Broadway traffic jam in New York City;

. . .Shakespeare should be criticized because he failed to mention that poor people, not just kings, have trouble sleeping (Henry IV, Part 2, act III, scene 1);

. . .the 175 – 200 million workers in China’s factories, mines and construction industry weren’t the world’s largest proletariat;

. . .the only genuine political project were setting tax rates on the rich and emergency management were primarily a matter of “it can’t happen here”; and

. . .”don’t give a man a fish, but teach him how to fish” is now: If one has to fish, ensure the ecosystem bounces back nevertheless.


A reliability perspective on human rights

When someone asserts that each person has the same human rights as every other person, the assertion moves directly from a macro-design principle to the micro operations of personal behavior. Those making this leap of faith are then upset when macro principles—such as those in the United Nations’ International Covenant on Economic, Social, and Cultural Rights—are qualified by all manner of country-specific protocols and reservations.

But such reservations are not hypocritical. Rather, they must be expected if human rights are to be treated reliably. It has been left up to nation-states to enforce universalized values, and the only way we really know that human rights as macro principles are taken seriously is to see how they are applied through context-specific scenarios, contingent to each country when not to each case.

“Thou shalt not kill” is all well and good, but we do not know how seriously that principle is treated until we get to grappling with qualifications such as “except in cases of self-defense.” “Granted that I should love my neighbour,” wrote R. H. Tawney, the British economic historian, but “the questions which, under modern conditions of large-scale organization, remain for solution are, ‘Who precisely is my neighbour?’ and, ‘How exactly am I to make my love for him effective in practice?’”

If human rights exist only at the macro level, you counter, are we not all at risk as individuals at the micro level? Yes, but not in the way you may mean. Just because we doubt that human rights actually exist as overarching principles everywhere equally for everyone does not stop us from recognizing that we are at risk in terms of personal and system reliability when systems behave as if those rights did not exist, and that there may be better practices for dealing with such situations, practices adaptable to the context in which we find ourselves, here and now rather than then and there.

“Therefore, we argue that any attempt at reforming AI from within the same interlocking oppressive systems that created it is doomed to failure. . .”

Therefore, we argue that any attempt at reforming AI from within the same interlocking oppressive systems that created it is doomed to failure and, moreover, risks exacerbating existing harm. Instead, to advance justice, we must radically transform not just the technology itself, but our ideas about it, and develop it from the bottom up, from the perspectives of those who stand the most risk of being harmed.

https://journals.sagepub.com/doi/10.1177/20539517231219241

About those “risks”. . . Are you quite sure you want to define radical transformation in conventional terms of “risk reduction”?

A timely reminder now that the Nobel prizes are being awarded: One Nobel economist’s mea culpa

I am much more skeptical of the benefits of free trade to American workers and am even skeptical of the claim, which I and others have made in the past, that globalization was responsible for the vast reduction in global poverty over the past 30 years. I also no longer defend the idea that the harm done to working Americans by globalization was a reasonable price to pay for global poverty reduction because workers in America are so much better off than the global poor. I believe that the reduction in poverty in India had little to do with world trade. And poverty reduction in China could have happened with less damage to workers in rich countries if Chinese policies caused it to save less of its national income, allowing more of its manufacturing growth to be absorbed at home. I had also seriously underthought my ethical judgments about trade-offs between domestic and foreign workers. We certainly have a duty to aid those in distress, but we have additional obligations to our fellow citizens that we do not have to others.

I used to subscribe to the near consensus among economists that immigration to the United States was a good thing, with great benefits to the migrants and little or no cost to domestic low-skilled workers. I no longer think so. Economists’ beliefs are not unanimous on this but are shaped by econometric designs that may be credible but often rest on short-term outcomes. Longer-term analysis over the past century and a half tells a different story. Inequality was high when America was open, was much lower when the borders were closed, and rose again post Hart-Celler (the Immigration and Nationality Act of 1965) as the fraction of foreign-born people rose back to its levels in the Gilded Age. It has also been plausibly argued that the Great Migration of millions of African Americans from the rural South to the factories in the North would not have happened if factory owners had been able to hire the European migrants they preferred.

Angus Deaton, March 12, 2024 (accessed online at https://www.chronicle.com/article/in-economics-do-we-know-what-were-doing?sra=true)