Digital Curation

I’m teaching an undergraduate course in digital curation this fall at the University of Maryland in the College of Information Studies. It’s the first time I’ve taught digital curation, but I have the good fortune to be following in the footsteps of previous instructors like Katrina Fenlon, Adam Kriesberg and Ricky Punzalan.

I’ll admit it has been a challenge for me to design this class for an undergraduate audience. From teaching computer programming to undergraduates in the iSchool I know that, unlike master’s-level students, they are pursuing a degree in information studies because they are interested in careers that involve data processing, data visualization, security and design in a variety of contexts, and aren’t specifically interested in the cultural heritage sector. Data curation often gets framed as something that only libraries, museums and archives care about. But I think most everyone recognizes that the description and preservation of data is a major concern for all sorts of communities of practice, especially with the increasing use of machine learning technologies.

I’ve linked my syllabus here, which is a little bit of a work in progress as I move chunks of it into Canvas. One thing I wanted to do, especially as we are all remote this semester, is to give students enough space to explore this content without feeling rushed. There are going to be seven two-week modules, where the first week is time for them to read and discuss with me and each other, and the second week is time to try out some of the concepts in coding exercises in Jupyter notebooks.

The students will already have been exposed to Python programming and the fundamentals of information organization. My overarching idea is to get them thinking early on about the social and political aspects of data, and to build from fundamental concepts like files, file formats and metadata up to concepts like platforms, community and infrastructure. Hopefully somewhere along the way they will get bitten by the digital curation bug.
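
To make the "files, file formats, metadata" starting point concrete, here is a minimal sketch of the kind of notebook exercise that could sit at that level. It is not taken from the course materials: the function names describe_file and inventory are hypothetical, the choice of SHA-256 for fixity is an assumption, and the media type is only guessed from the file extension rather than detected from the file's contents.

```python
import hashlib
import mimetypes
from pathlib import Path


def describe_file(path):
    """Return basic descriptive and fixity metadata for one file."""
    path = Path(path)
    digest = hashlib.sha256()
    # Read in chunks so large files do not have to fit in memory.
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    # guess_type only looks at the file name, not the bytes inside.
    media_type, _ = mimetypes.guess_type(path.name)
    return {
        "name": path.name,
        "size_bytes": path.stat().st_size,
        "media_type": media_type or "application/octet-stream",
        "sha256": digest.hexdigest(),
    }


def inventory(directory):
    """Describe every regular file under a directory, in sorted order."""
    return [
        describe_file(p)
        for p in sorted(Path(directory).rglob("*"))
        if p.is_file()
    ]
```

Calling inventory on a folder returns a list of dictionaries, one per file, which can be printed or loaded into a table. Comparing the checksums from two runs is one simple way to notice that a file has changed.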

I spent a bit of time thinking about whether to have students use their own computers to do these exercises, and also whether I should ask them to work at the command line. I found some good resources for doing digital curation work at the command line. But the trouble is, not everyone has the same command line, and I could spend a lot of time getting people set up and helping them throughout the semester (I know this from experience teaching the programming class). I also know that some students have more resources to get high-end computers, and that students’ machines are in different states of repair.

After taking a look at the compute resources available at the university, I decided to give Colab a try this semester, since it offers significant resources on demand (12GB RAM, 100GB disk), all available in a pristine VM that you can pip install into. If you are interested in the Jupyter notebooks I’ll be using, stay tuned to the GitHub repository, where I will be dropping them over the course of the semester.