A few days ago I asked folks on Twitter if they had any born-digital archival collections that were Internet-accessible (web, ftp, etc). I’m testing a little prototype application called fondz (which I will hopefully write more about later if it proves to be headed in a useful direction) and wanted a collection I could actually talk about in a blog post (as opposed to the content I’m testing with and cannot). I specifically wanted born-digital archival content with well-defined provenance, because fondz assumes the content isn’t just a random assortment of things, but forms a thematic unit of some kind: e.g. content that is donated to an archive as part of a personal or organizational collection.

Call me crazy, but if you squint right, the Web looks like it’s full of born-digital archives–they are called websites! But fondz is oriented around files that have been collected offline: collections of word processing or image files that may have accumulated on your hard drive, perhaps gotten backed up to a disk of some kind, and then ultimately been gifted to an archive. So, like a lot of archival content, they also have access rights associated with them.

I received several helpful responses, and thought I would jot them down here in case you go looking for born-digital collections too. If you have one you would like me to add, just tweet it at me and I’ll add it.

  • Alberto Accomazzi suggested arXiv, which contains lots of scientific material in PDF and LaTeX. Depending on the scope you could get content from a particular author, organization or discipline. If you squint right I guess any digital repository that has a strong sense of an author identity and/or the subject of the content could work as a source of born-digital archival content. There are lots of so-called “institutional repositories” on the Web. But pre-print repositories are particularly interesting because they often represent work in progress, not the finished, polished thing that people often think of as “published”. Pre-prints are more like the documents you have lying around on your computer, that you happen to have pushed out to let people know what you are working on, and to share research that is still underway.
  • Mark Matienzo pointed me at the Richard Rorty born-digital files at the University of California at Irvine. In order to download the files you need to apply for an account in their UCISpace application. I filled out the form, and was pleasantly surprised when I received an email the next day granting me access–way to go UCI! Are there many other examples of this sort of Web-enabled interaction with researchers? Or have I been asleep for a while and this is the new normal for archives on the Web? I particularly enjoyed Mark’s suggestion because I’m a big fan of Rorty’s work, so it will be fun to look at the content. Subsequently Matthew followed up on Twitter to let me know that UCISpace has other born-digital collections available. Aaron Brenner also pointed me at some slides about UCI’s virtual reading room.
  • Erin O’Meara pointed me to Jill Sexton and Meg Tuomala at the University of North Carolina at Chapel Hill, who have some born-digital collections. Erin then pointed me to the Carolina Digital Repository, which seems to have a fair bit of born-digital material: for example this Word document from a folder named Dad’s laptop in the John Chapman collection. This content is available without having to log in, which is nice. I haven’t poked around to see how much more is available yet.
  • Mark Jordan referred me to Nick Ruest of York University. Nick has been kind enough to bag up some content from the Allan Fleming collection and put it on the Web for me to download. I noticed that the system at York provides a way to log in, so maybe someday they could offer a similar service to UC Irvine’s.
  • Trevor Owens suggested I get in touch with the Maryland Institute for Technology in the Humanities to see if they might have some of their born-digital content online. MITH is local for me, so I could conceivably head over there with a thumb drive.

If you have other ideas I’d still be interested to hear about them, and will add them if you comment here or tweet them at me. I’m especially interested in collections that fit the UC Irvine model of making born-digital collections available on the Web via a researcher request step, or that are simply publicly available. It’s great to see traditional archives moving onto the Web this way to make born-digital collections available.

As an aside, it’s interesting to me how the category of content we call born-digital is beginning to be coterminous with Web content at a particular point in time – especially as more and more of our born-digital content lives on the Web in cloud services of some ilk.