Well, one thing we can say with certainty about ChatGPT is this: It’s not highly reliable.
If high reliability is defined as the safe and continuous provision of a critical service, even during (especially during) time-urgent conditions, then the last thing ChatGPT guarantees is safe and useful information delivered when needed, under conditions that matter most.
The problem isn’t just that it is trained on biased information and prompted with inadequate queries. There are at least two other high-reliability deficits that follow from its being a black box. First, the theory behind the AI software is, as the Scottish verdict has it, Not Proven:
For large language models (LLMs) like ChatGPT, we’ve gone from around 100 million parameters in 2018 to 500 billion in 2023 with Google’s PaLM model. The theory behind this growth is that models with more parameters should have better performance, even on tasks they were not initially trained on, although this hypothesis remains unproven.
https://arstechnica.com/gadgets/2023/04/generative-ai-is-cool-but-lets-not-forget-its-human-and-environmental-costs/
Second, you can’t be sure you’re getting information from the same black box from one query to the next:
It means it’s difficult to carry out external evaluations and audits of these models since you can’t even be sure that the underlying model is the same every time you query it. It also means that you can’t do scientific research on them, given that studies must be reproducible.
Op. cit.
So what if they aren’t reliable?
The promise, of course, is that they will become highly reliable once trained on a universal corpus and queried by a universe of questions.
Let’s say for the sake of argument that this happens. We are still left, however, with a major problem: in the absence of high reliability during the seriatim training, we, the various publics, serve as the laboratory for this social experimentation.
Now, much has been written on social experimentation. It’s the laboratory side I want to focus on here.
Donald T. Campbell, the experimental psychologist, advocated “the experimenting society” and talked about “the logic of the laboratory.” What he meant was primarily the use of randomized controlled trials (RCTs). It was only with the work of Latour and Woolgar that we understood much more was involved in the “life of the laboratory” than one or more logics.
What ChatGPT underscores is the extent to which the life of the laboratory has become laboratories as the infrastructure of everyday life. It’s as if policy and management are always rehearsing improv.