As you can see, I’ve recently changed things around here. Yeah, it’s looking quite spartan at the moment, although I’m hoping that will change in the coming year. I really wanted to optimize this space for writing in my favorite editor, and for making it easy to publish and preserve the content. WordPress has served me well over the last 10 years and up till now I’ve resisted the urge to switch over to a static site. But yesterday I converted the 394 posts, archived the WordPress site and database, and am now using Jekyll. I haven’t been using Ruby as much in the past few years, but the tooling around Jekyll feels very solid, especially given GitHub’s investment in it.

Honestly, there was something that pushed me over the edge to do the switch. Next week I’m starting in the University of Maryland iSchool, where I will be pursuing a doctoral degree. I’m specifically hoping to examine some of the ideas I dredged up while preparing for my talk at NDF in New Zealand a couple years ago. I was given almost a year to think about what I wanted to talk about – so it was a great opportunity for me to reflect on my professional career so far, and examine where I wanted to go.

After I got back I happened across a paper by Steven Jackson called Rethinking Repair, which introduced me to what felt like a very new and exciting approach to information technology design and innovation that he calls Broken World Thinking. In hindsight I can see that both of these things conspired to make returning to school at 46 years of age look like a logical thing to do. If all goes as planned I’m going to be doing this part-time while also working at the Maryland Institute for Technology in the Humanities, so it’s going to take a while. But I’m in a good spot, and am not in any rush … so it’s all good as far as I’m concerned.

I’m planning to use this space for notes about what I’m reading, papers, reflections etc. I thought about putting my citations and notes into Evernote, Zotero, Mendeley etc., and I may still do that. But I’m going to try to keep it relatively simple and use this space as best I can to start. My blog has always had a navel gazey kind of feel to it, so I doubt it’s going to matter much. I will be tagging all this content with study.

To get things started I thought I’d share the personal statement I wrote for admission to the iSchool. I’m already feeling more focused than when I wrote it almost a year ago, so it will be interesting to return to it periodically. The thing that has become clearer to me in the intervening year is that I’m increasingly interested in examining the role that broken world thinking has played in both the design and evolution of the Web.

So here’s the personal statement. Hopefully it’s not too personal :-)

For close to twenty years I have been working as a software developer in the field of libraries and archives. As I was completing my Master’s degree in the mid-1990s, the Web was going through a period of rapid growth and evolution. The computer labs at Rutgers University provided me with what felt like a front row seat to the development of this new medium of the World Wide Web. My classes on hypermedia and information seeking behavior gave me a critical foundation for engaging with the emerging Web. When I graduated I was well positioned to build a career around the development of software applications for making library and archival material available on the Web. Now, after working in the field, I would like to pursue a PhD in the UMD iSchool to better understand the role that the Web plays as an information platform in our society, with a particular focus on how archival theory and practice can inform it. I am specifically interested in archives of born-digital Web content, but also in what it means to create a website that gets called an archive. As the use of the Web continues to accelerate and proliferate it is more and more important to have a better understanding of its archival properties.

My interest in how computing (specifically the World Wide Web) can be informed by archival theory developed while working in the Repository Development Center under Babak Hamidzadeh at the Library of Congress. During my eight years at LC I designed and built both internally focused digital curation tools and access systems intended for researchers and the public. For example, I designed a Web-based quality assurance tool that was used by curators to approve millions of images that were delivered as part of our various digital conversion projects. I also designed the National Digital Newspaper Program’s delivery application, Chronicling America, which provides thousands of researchers access to over 8 million pages of historic American newspapers every day. In addition, I implemented the data management application that transfers and inventories 500 million tweets a day to the Library of Congress. I also prototyped the Library of Congress Linked Data Service, which makes millions of authority records available using Linked Data technologies.

These projects gave me hands-on, practical experience using the Web to manage and deliver Library of Congress data assets. Since I like to use agile methodologies to develop software, this work necessarily brought me into direct contact with the people who needed the tools built, namely archivists. It was through these interactions over the years that I began to recognize that my Master’s work at Rutgers University was in fact quite biased towards libraries, and lacked depth when it came to the theory and praxis of archives. I remedied this by spending about two years in personal study, reading about archival theory and practice with a focus on appraisal, provenance, ethics, preservation and access. I also began participating as a member of the Society of American Archivists.

During this period of study I became particularly interested in the More Product Less Process (MPLP) approach to archival work. I found that MPLP had a positive impact on the design of archival processing software since it oriented the work around making content available, rather than on often time-consuming preservation activities. The importance of access to digital material is particularly evident since copies are easy to make, but rendering can often prove challenging. In this regard I observed that requirements for digital preservation metadata and file formats can paradoxically hamper preservation efforts. I found that making content available sooner rather than later can serve as an excellent test of whether digital preservation processing has been sufficient. While working with Trevor Owens on the processing of the Carl Sagan collection, we developed an experimental system for processing born-digital content using lightweight preservation standards such as BagIt in combination with automated, topic model driven description tools that could be used by archivists. This work also leveraged the Web and the browser for access by automatically converting formats such as WordPerfect to HTML, so they could be viewable and indexable, while keeping the original file for preservation.

Another strand of archival theory that captured my interest was the work of Terry Cook, Verne Harris, Frank Upward and Sue McKemmish on post-custodial thinking and the archival enterprise. It was specifically my work with the Web archiving team at the Library of Congress that highlighted how important it is for records management practices to be pushed outwards onto the Web. I gained experience in seeing what makes a particular web page or website easier to harvest, and how impractical it is to collect the entire Web. I gained an appreciation for how innovation in the area of Web archiving was driven by real problems such as dynamic content and social media. For example, I worked with the Internet Archive to archive Web content related to the killing of Michael Brown in Ferguson, Missouri, by creating an archive of 13 million tweets, which I used as an appraisal tool to help the Internet Archive identify Web content that needed archiving. In general I also saw how traditional, monolithic approaches to system building needed to be replaced with distributed processing architectures and the application of cloud computing technologies to easily and efficiently build up and tear down such systems on demand.

Around this time I also began to see parallels between the work of Matthew Kirschenbaum on the forensic and formal materiality of disk based media and my interests in the Web as a medium. Archivists usually think of Web content as volatile and unstable, where turning off a web server can result in links breaking, and content disappearing forever. However, it is also the case that Web content is easily copied, and the Internet itself was designed to route around damage. I began to notice how technologies such as distributed revision control systems, Web caches, and peer-to-peer distribution technologies like BitTorrent can make Web content extremely resilient. It was this emerging interest in the materiality of the Web that drew me to a position in the Maryland Institute for Technology in the Humanities where Kirschenbaum is the Associate Director.

There are several iSchool faculty who I would potentially like to work with in developing my research. I am interested in the ethical dimensions of Web archiving and how technical architectures embody social values, which is one of Katie Shilton’s areas of research. Brian Butler’s work studying online community development and open data is also highly relevant to the study of collaborative and cooperative models for Web archiving. Ricky Punzalan’s work on virtual reunification in Web archives is also of interest because of its parallels with post-custodial archival theory, and the role of access in preservation. And Richard Marciano’s work on digital curation, in particular his recent work with the NSF on Brown Dog, would be an opportunity for me to further my experience building tools for digital preservation.

If admitted to the program I would focus my research on how Web archives are constructed and made accessible. This would include a historical analysis of the development of Web archiving technologies and organizations. I plan to look specifically at the evolution and deployment of Web standards and their relationship to notions of impermanence, and change over time. I will systematically examine current technical architectures for harvesting and providing access to Web archives. Based on user behavior studies I would also like to reimagine what some of the tools for building and providing access to Web archives might look like. I expect that I would spend a portion of my time prototyping and using my skills as a software developer to build, test and evaluate these ideas. Of course, I would expect to adapt much of this plan based on the things I learn during my course of study in the iSchool, and the opportunities presented by working with faculty.

Upon completion of the PhD program I plan to continue working on digital humanities and preservation projects at MITH. I think the PhD program could also qualify me to help build the iSchool’s new Digital Curation Lab at UMD, or similar centers at other institutions. My hope is that my academic work will not only theoretically ground my work at MITH, but will also be a source of fruitful collaboration with the iSchool, the Library and larger community at the University of Maryland. I look forward to helping educate a new generation of archivists in the theory and practice of Web archiving.