One of the joys of pandemic academic life has been a true feast of online events to attend, on a wide variety of topics, some of which are delightfully narrow and esoteric. Case in point was today’s Reflecting on Power and AI: The Case of GPT-3 which lived up to its title. I’ll try to keep an eye out for when the video posts, and update here.
The workshop was largely organized around an exploration of whether GPT-3, the largest known machine learning language model, changes anything for media studies theory, or if it amounts to just more of the same. So the discussion wasn’t focused so much on what games could be played with GPT-3, but rather if GPT-3 changes the rules of the game for media theory, at all. I’m not sure there was a conclusive answer at the end, but it sounded like the consensus was that current theorization around media is adequate for understanding GPT-3, but it matters greatly what theory or theories are deployed. The online discussion after the presentations indicated that attendees didn’t see this as merely a theoretical issue, but one that has direct social and political impacts on our lives.
James Steinhoff looked at GPT-3 using a Marxist media theory perspective where he told the story of GPT-3’s as a project of OpenAI and as a project of capital. OpenAI started with much fanfare in 2015 as a non-profit initiative where the technology, algorithms and models developed would would be kept openly licensed and freely available so that the world could understand the benefits and risks of AI technology. Steinhoff described how in 2019 the project’s needs for capital (compute power and staff) transitioned it from a non-profit into a capped-profit company, which is now owned, or at least controlled, by Microsoft.
The code for generating the model as well as the model itself are gated behind a token driven Web API run my Microsoft. You can get on a waiting list to use it, but apparently a lot of people have been waiting a while, so … Being a Microsoft employee probably helps. I grabbed a screenshot of the pricing page that Steinhoff shared during his presentation:
I’d be interested to hear more about how these tokens operate. Are they per-request, or are they measured according something else? I googled around a bit during the presentation to try to find some documentation for the Web API, and came up empty handed. I did find Shreya Shankar’s gpt3-sandbox project for interacting with the API in your browser (mostly for iteratively crafting text input in order to generate desired output). It depends on the openai Python package created by OpenAI themselves. The docs for openai then point at a page on the openai.com website which is behind a login. You can create an account, but you need to be pre-approved (made it through the waitlist) to be able to see the docs. There’s probably some sense that can be made from examining the python client though.
All of the presentations in some form or another touched on the 175 billion parameters that were used to generate the model. But the API to the model doesn’t have that many parameters. It allows you to enter text and get text back. But the API surface that the GPT-3 service provides could be interesting to examine a bit more closely, especially to track how it changes over time. In terms of how this model mediates knowledge and understanding it’ll be important watch.
Steinhoff’s message seemed to be that, despite the best of intentions, GPT-3 functions in the service of very large corporations with very particular interests. One dimension that he didn’t explore perhaps because of time, is how the GPT-3 model itself is fed massive amounts of content from the web, or the commons. Indeed 60% of the data came from the CommonCrawl project.
GPT-3 is an example of an extraction project that has been underway at large Internet companies for some time. I think the critique of these corporations has often been confined to seeing them in terms of surveillance capitalism rather than in terms of raw resource extraction, or the primitive accumulation of capital. The behavioral indicators of who clicked on what are certainly valuable, but GPT-3 and sister projects like CommonCrawl shows just the accumulation of data with modest amounts of metadata can be extremely valuable. This discussion really hit home for me since I’ve been working with Jess Ogden and Shawn Walker using CommonCrawl as a dataset for talking about the use of web archives, while also reflecting on the use of web archives as data. CommonCrawl provides a unique glimpse into some of the data operations that are at work in the accumulation of web archives. I worry that the window is closing and the CommonCrawl itself will be absorbed into Microsoft.
Following Steinhoff Olya Kudina and Bas de Boer jointly presented some compelling thoughts about how its important to understand GPT-3 in terms of sociotechnical theory, using ideas drawn from Foucault and Arendt. I actually want to watch their presentation again because it followed a very specific path that I can’t do justice to here. But their main argument seemed to be that GPT-3 is an expression of power and that where there is power there is always resistance to power. GPT-3 can and will be subverted and used to achieve particular political ends of our own choosing.
Because of my own dissertation research I’m partial to Foucault’s idea of governmentality, especially as it relates to ideas of legibility (Scott, 1998)–the who, what and why of legibility projects, aka archives. GPT-3 presents some interesting challenges in terms of legibility because the model is so complex, the results it generates defy deductive logic and auditing. In some ways GPT-3 obscures more than it makes a population legible, as Foucault moved from disciplinary analysis of the subject, to the ways in which populations are described and governed through the practices of pastoral power, of open datasets. Again the significance of CommonCrawl as an archival project, as a web legibility project, jumps to the fore. I’m not as up on Arendt as I should be, so one outcome of their presentation is that I’m going to read her The Human Condition which they had in a slide. I’m long overdue.
Scott, J. C. (1998). Seeing like a state: How certain schemes to improve the human condition have failed. Yale University Press.