I recently reviewed two books about web archiving for American Archivist and thought it was worth mentioning the review here so I could 1) link to a nice free PDF version, and 2) situate it a bit more since I’m off social media (e.g. Twitter) at the moment.

The review is of Ian Milligan’s History in the Age of Abundance? and The Web as History edited by Niels Brügger and Ralph Schroeder. I was asked to review both of them together, and no arm twisting was involved because I wanted to read both of these books for my dissertation research. I should say that I’ve met the authors before, and was already a fan of their work, particularly because of how they have been instrumental in developing web archives as a practice. I kind of wish they had me review Brügger’s The Archived Web, but that’s for another time.

The review turned into a bit of a complicated little dance because I wanted readers to get the sense of web archives not only as useful for historians wanting to do their work in understanding the past, but also as technical constructions that create our present. I invoked Hugh Taylor’s still highly relevant criticism of archivy as not concerned enough with how archives shape our lived experience, online and offline. Consider for example the current debate around facial recognition technologies that are built upon archives of images collected from the web.

But most of all, the thing that emerged while writing this brief review was a recognition of how strange it is that the records of web archives (WARC files) are almost entirely unavailable for study by the public. You need to know a secret handshake to get access to them, and sometimes they are simply not available. This is even the case for the most public of public web archives, the Internet Archive. The contents of WARC files are made available piecemeal via interfaces such as the Wayback Machine, which render what a particular URL looked like at a given time. But most of the analyses that Milligan, Brügger and Schroeder discuss are related to understanding web archives as data, or Collections as Data if you will.

Web archives, like our non-web archives, are technical constructions for achieving particular goals. The absence of WARC data from the service offerings of web archives is a curious phenomenon. Imagine going to visit an archive, reading a finding aid, and locating a box in the inventory, but not being able to request the folders within it. Perhaps that’s not the right analogy, and perhaps it’s more like requesting an entire record series, or group? That would not be workable in most archives. But when we situate distant reading practices in web archives we must consider who can do that reading, and why.

I’m not sure I articulated this very well in the review, but I’d be interested to hear what you think, either here or at ehs@pobox.com.