Appraisal in Web Archives

Two days ago I defended my dissertation proposal in the College of Information Studies at the University of Maryland, where I have been in the PhD program for the last four years. As I enter my fifth year it feels like a long haul, and I’ve still got a significant amount of work ahead of me. But I feel good about how my ideas have been refined since I wrote my personal statement. That refinement is almost entirely the work of the people I have been working with at UMD, both in the iSchool and at MITH. So thanks to them for their time, especially my committee members: Ricky Punzalan, Kari Kraus, Wayne Lutters, Katrina Fenlon and Matt Kirschenbaum.

I’ve been trying to write here about my research process and have largely failed at that, although you will find some traces of my thinking in this blog. I thought I would share my proposal abstract, and also point out that you can follow along with my writing process on GitHub if you really want to see more.

I do think that I need to find a catchier title for the project. Hopefully that will come in the writing, but if you have any ideas about that, or anything else, please get in touch.

Appraisal in Web Archives

The web is a site of constant breakdown in the form of broken links, failed business models, unsustainable infrastructure, obsolescence and general neglect. Some estimate that about a quarter of all links break every 7 years, and even within highly curated regions of the web, such as scholarly publishing, rates of link rot can be as high as 50%. Over the past twenty years, web archiving projects at cultural heritage organizations have worked to stem this tide of loss. However, we still understand very little about the diversity of actors involved in web archiving, and about how content is selected for web archives. This is due in large part to the fact that web archiving projects operate out of sight, as complex sociotechnical assemblages at the boundary between human and automated processes.

This dissertation explores appraisal practices in web archives from the perspective of Science and Technology Studies in order to answer two motivating research questions: (1) How is appraisal currently being enacted in web archives? (2) How do definitions of what constitutes a web archive relate to the practice of appraisal? Answering these questions will help information studies researchers and practitioners better understand the dynamics that shape our web-mediated memory of the past, and will also inform archival studies pedagogy. Critical Algorithm and Data Studies provide a theoretical framework for examining how web archiving systems function as both computational and cultural objects that participate in a wide variety of social and political projects. As machine learners increasingly become readers of web archives, the stakes for understanding the dynamics by which these collections are built could not be higher.

Interviews with web archives practitioners, and a year-long field study at a government agency involved in archiving data from the web, will provide rich descriptive material for an ethnographic analysis of current practices and of how these practices are shaped by the ontological dimensions of web archives. Critical Discourse Analysis will be used to analyze interview transcripts and field notes from participant observation in meetings and in the work of the archive. In addition, methods drawn from Trace Ethnography will be used to analyze the version control histories and ticketing systems used to coordinate and assemble a web archive that challenges our notions of what constitutes a web archive.