Why?

I tuned into the virtual talk at Berkeley last Friday evening (for me) from Clifford Lynch Why “Web Archiving” is No Longer a Useful Concept or Phrase. Alas, the event was not recorded. As one participant pointed out, this was perhaps a great illustration of Lynch’s main point, that the concept of “web archiving” means different things to different people. Should the video from a desktop application (Zoom) that is pulling data from a web server using a URL be considered part of the web? I think Lynch was suggesting that it wasn’t, or that it was at best unclear. I think he is wrong.

I didn’t take good notes, and almost felt encouraged not to share anything publicly about the talk. But you can’t really put a talk title like this one out in the public eye and expect that. Lynch was mostly previewing a paper that he is writing for Against the Grain, and getting some feedback from attendees. So I think you can expect to see his argument in print, perhaps in some modulated form, at some point in the future.

His basic argument seemed to be that “Web Archiving” is no longer useful because:

the web is unfathomably large
web documents are no longer static, and are assembled dynamically based on user behavior
web documents are not canonical and are customized for particular users based on their browsing history, and countless other inscrutable algorithmic machinations
the web is run by corporations who actively make their content difficult to archive (so people have to come to them for it)
some web content is designed to disappear (e.g Snapchat, Insta Stories, etc)
all sorts of bespoke apps that traffic in JSON have supplanted browsers as clients of the web

I’m probably missing, and misstating, parts of his argument. But for me there was an implicit all that was missing from his title, and his argument, which (for clarity) should have read Why “Archiving All the Web” is No Longer a Useful Concept or Phrase. Of course this would hardly have been eye catching anymore, because most archivists, especially ones who are directly involved in archiving the web, already know this…all too well.

The problem with the original title is that it seems to suggest that collecting and storing content from the web is no longer a useful concept or practice. But if there is anything that’s no longer useful here, it is our antiquated idea that web archives have a singular, distinct architecture: harvesting data from the web (e.g. Heritrix), storing the data (WARC), and playing it back for users (e.g. Wayback Machine).

Clearly this is a type of web archive, but limiting our discussion of web archives to only this particular form ignores the fact that there is abundant practice by a diversity of actors who are actively collecting data from the web and using it for a wide range of purposes, sometimes at great cost. These are web archives too, and talking about them as web archives is important because it helps us put these practices in historical context.

One participant commented in the Zoom chat that perhaps a better phrase than “Web Archiving” was “Archiving from the Web”. I tend to agree since it better illustrates a particular mode of collecting and storing data that is prevalent in web archiving activities. But to Lynch’s point, I think he is onto something if instead he said that the practice of Web Archiving needs more specificity and nuance. Hopefully that’s the direction that his article is spun in.

The argument that the web is no longer static, which therefore means that web archiving is no longer possible is simply just not the case. The web hasn’t been (only) static for a very long time, and that hasn’t prevented web archiving technology from evolving. The architecture of the web and the HTTP protocol still mean that web clients receive bitstreams of data with an envelope of metadata (aka representations), and these can be stored and retrieved. Furthermore the implication that static publishing to the web is no longer practiced is just not true. Much to the contrary, static web publishing has been going through a renaissance over the past decade.

It seems to me that this talk was primarily an extension of Lynch’s previous writing about the difficulties that algorithms present to archives (Lynch, 2017). I thought that presentation was pretty insightful (rather than inciteful) because of how it highlighted the important roles that social scientists and humanists play in understanding and documenting the algorithms that shape and govern what and how we see on the web. The slippery, elusive, and perennially important context. Archives have always been only a sliver of a window into process (Harris, 2002), and it remains the case when archiving from the web.

Ok, ok. Last year I finished writing a dissertation about web archives, so I’ll admit I’m kind of touchy on this subject :-) What do you mean this thing I spent 6 years researching is no longer a useful concept?!? As a way to vent some of my angst I created a little web app. I hope you enjoy it. It’s a static website.

References

Harris, V. (2002). The archival sliver: power, memory, and archives in South Africa. Archival Science, 2(1-2), 63–86.

Lynch, C. (2017). Stewardship in the age of algorithms. First Monday, 22(12). Retrieved from http://firstmonday.org/ojs/index.php/fm/article/view/8097