On Snowden and Archival Ethics

Much like you, I’ve been watching the evolving NSA surveillance story following the whistle-blowing by former government contractor Edward Snowden. Watching isn’t really the right word…I’ve been glued to it. I don’t have a particularly unique opinion or observation to make about the leak, or the ensuing dialogue – but I suppose calling it “whistle-blowing” best summarizes where I stand. I just wanted to share a thought I had on the train to work, after reading Ethan Zuckerman’s excellent Me and My Metadata - Thoughts on Online Surveillance. I tried to fit it into 140 characters, but it didn’t quite work.

Zuckerman’s post is basically about the value of metadata in research. He opened up his Gmail archive to his students, and they created Immersion, which lets you visualize his network of correspondence using only email metadata (From, To, Cc and Date). Zuckerman goes on to demonstrate what this visualization says about him. The first comment in the post by Jonathan O’Donnell has a nice list of related research on the importance of metadata to discovery. Zuckerman’s work immediately reminded me of Sudheendra Hangal’s work on MUSE at Stanford, which he and his team have written about extensively. MUSE is a tool that enables scholarly research using email archives. It was then that I realized why I’ve been so fascinated with the Snowden/NSA story.
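To make the "metadata only" point concrete, here is a minimal sketch of how a correspondence network could be tallied from nothing but the From, To and Cc headers. This is not how Immersion or MUSE actually work; the function name correspondence_edges, the SAMPLE messages and the example.org addresses are all made up for illustration, and the only library used is Python's standard email package.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical sample messages; only the headers matter for this sketch.
SAMPLE = [
    "From: Alice <alice@example.org>\n"
    "To: bob@example.org\n"
    "Cc: carol@example.org\n"
    "Date: Mon, 1 Jul 2013 09:00:00 -0400\n"
    "\n"
    "body is never read",
    "From: alice@example.org\n"
    "To: Bob <bob@example.org>\n"
    "Date: Tue, 2 Jul 2013 09:00:00 -0400\n"
    "\n"
    "body is never read",
]


def correspondence_edges(raw_messages):
    """Count (sender, recipient) pairs using only From, To and Cc headers."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # getaddresses() returns (display name, address) pairs; keep the address.
        senders = [addr.lower() for _, addr in getaddresses(msg.get_all("From", []))]
        recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
        recipients = [addr.lower() for _, addr in getaddresses(recipient_headers)]
        for sender in senders:
            for recipient in recipients:
                if sender and recipient:
                    edges[(sender, recipient)] += 1
    return edges


if __name__ == "__main__":
    for (sender, recipient), count in correspondence_edges(SAMPLE).most_common():
        print(sender, "->", recipient, count)
```

Even this toy version never touches a message body, which is the unsettling part: who writes to whom, and how often, falls out of the headers alone.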

Over the past few years there has been increasing awareness in the archival community about the role of forensics tools in digital preservation, curation and research. Matt Kirschenbaum’s Mechanisms had a big role in documenting, and spreading the word about how forensics tools can be (and are) used in the digital humanities. The CLIR report Digital Forensics and Born-Digital Content in Cultural Heritage Collections (co-authored by Kirschenbaum) brought the topic directly to cultural heritage organizations, as did the AIMS report. If you’re not convinced, a search in Google Scholar shows just how prevalent and timely the topic is. The introduction to the CLIR report has a nice summary of why forensics tools are of interest to archives that are dealing with born digital content:

The same forensics software that indexes a criminal suspect’s hard drive allows the archivist to prepare a comprehensive manifest of the electronic files a donor has turned over for accession; the same hardware that allows the forensics investigator to create an algorithmically authenticated “image” of a file system allows the archivist to ensure the integrity of digital content once captured from its source media; the same data-recovery procedures that allow the specialist to discover, recover, and present as trial evidence an “erased” file may allow a scholar to reconstruct a lost or inadvertently deleted version of an electronic manuscript—and do so with enough confidence to stake reputation and career.

Digital forensics therefore offers archivists, as well as an archive’s patrons, new tools, new methodologies, and new capabilities. Yet as even this brief description must suggest, digital forensics does not affect archivists’ practices solely at the level of procedures and tools. Its methods and outcomes raise important legal, ethical, and hermeneutical questions about the nature of the cultural record, the boundaries between public and private knowledge, and the roles and responsibilities of donor, archivist, and the public in a new technological era.

When collections are donated to an archive, there is usually a gift agreement between the donor and the archival organization, which documents how the collection of material can be used. For example, it is fairly common for there to be a period where portions (or all) of the archive are kept dark. Much less often, a gift agreement stipulates that the collection must be made open on the Web, and sometimes money changes hands. Born digital content in archives is new enough that cultural heritage organizations are still grappling with the best way to talk to their donors about donating it.

There has been a bit of attention to sharing best practices about born digital content between organizations, and rising awareness about the sorts of issues that need to be considered. As a software developer tasked with building applications that can be used across these archival collections, I’ve found the special-snowflake nature of these gift agreements to be a bit of an annoyance. If every collection of born digital content has slightly different stipulations about what, when and how content can be used, it makes building access applications difficult. The situation is compounded somewhat because the gift agreements themselves aren’t shared publicly (at least at my place of work), so you don’t even know what you can and can’t do. I’ve observed that this has a tendency to derail conversations about access to born digital content – and access is an essential ingredient in ensuring the long-term preservation of digital content. It’s not like you can take a digital file and put it on a server and come back in 25 or even 5 years and expect to open it, and use it.

So, what does this have to do with Zuckerman’s post, and the intrinsic value of metadata to the NSA? When Zuckerman provided his students with access to his email archive he did it in the context of a particular trust scenario. A gift agreement in an archive serves the same purpose, by documenting a trust scenario between the donor and the institution that is receiving the gift. The NSA allegedly has been collecting information from Verizon, Facebook, Google, et al. outside of the trust scenario provided by the Fourth Amendment to the Constitution. After looking at things this way, the special-snowflakism of gift agreements doesn’t seem so annoying any more. It is through these agreements that cultural heritage organizations establish their authenticity and trust, and it is through them that they become a desirable place to deposit born digital content. If the agreements have to be unique per donor, and this hampers unified access to born digital collections, that seems like a price worth paying. Ideally there would be a standard set of considerations to use when putting a gift agreement together. But if we can’t fit everyone into the same framework, maybe that’s not such a bad thing.

The other commonplace thing that strikes me is that the same technology that can be used for good, say digital humanities research, or forensics discovery, can also be used for ill. Having a strong sense of ethics, as a professional, as a citizen, and as a human being, is extremely important to establishing the context in which technology is used – and negotiating between the three can sometimes require finesse, and in the case of Snowden, courage.