Over the past few years I’ve been trying to deepen my understanding of the literature of and about archives. My own MLIS education was heavy on libraries and light on archives, so I was quite unaware of how rich the thinking about archives is…and how much more relevant it is for the work of digital preservation.

After not being a member of any professional organization for over ten years, I joined the Society of American Archivists two years ago. I really enjoyed it when the SAA’s quarterly, American Archivist, started showing up in my mailbox. Incidentally, they have put all their content online for the public, but the most recent 3 years are embargoed and available only to SAA members.

Since I have so much catching up to do, I thought it would be interesting to harvest some of the article metadata that could be gleaned from the website, to see if I could get my computer to teach me something about the 76 years of content. If you are interested, you can find the code I wrote to do this, and the resulting metadata about the 42,432 articles, on GitHub.

As a quick test I thought it would be interesting to throw the first names of authors through genderator to see if the gender of authors has changed over time. My first pass just displays the number of authors per year by their gender.
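A minimal sketch of that first tally, assuming each harvested record boils down to a (year, first name) pair. The `guess_gender` helper and its tiny lookup table are hypothetical stand-ins for illustration only; they are not genderator's API, and the sample records are made up.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for a name-to-gender guesser (NOT genderator's API).
KNOWN_NAMES = {"margaret": "female", "ernst": "male", "helen": "female"}


def guess_gender(first_name):
    """Return 'male', 'female', or 'unknown' for a first name."""
    return KNOWN_NAMES.get(first_name.strip().lower(), "unknown")


def count_by_year(authors):
    """Tally authors per year by guessed gender.

    authors: iterable of (year, first_name) pairs.
    """
    counts = defaultdict(Counter)
    for year, first_name in authors:
        counts[year][guess_gender(first_name)] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [(1938, "Ernst"), (1938, "Margaret"), (1938, "T."), (1975, "Helen")]
counts = count_by_year(sample)
```

Initials such as "T." fall through to "unknown" here, which is one reason a name-based guess will always leave some authors unclassified.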

Since the number of authors per article isn’t constant, and the number of articles per year also varies, the graph is a bit noisy. But if you calculate the percentage of authors per year that were male, female or unknown, you get a much smoother graph.
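The normalization is just each gender's count divided by that year's total. Here is a rough sketch, assuming the per-year counts are already in a dictionary keyed by year; the function name and the numbers are invented for illustration and are not taken from the actual data.

```python
def percentages_by_year(counts):
    """Convert per-year gender counts into per-year percentages."""
    result = {}
    for year, by_gender in counts.items():
        total = sum(by_gender.values())
        if total == 0:
            # Skip years with no authors to avoid dividing by zero.
            continue
        result[year] = {
            gender: 100.0 * by_gender.get(gender, 0) / total
            for gender in ("male", "female", "unknown")
        }
    return result


# Made-up counts, for illustration only.
counts = {
    1950: {"male": 8, "female": 1, "unknown": 1},
    2010: {"male": 9, "female": 12, "unknown": 4},
}
shares = percentages_by_year(counts)
```

Because every year's percentages sum to 100, years with many articles and years with few end up on the same scale, which is what smooths the graph.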

As you can see, genderator isn’t perfect: in some years it can’t guess the gender of as many as 20% of the authors. But even with that noise, there is a clear, gradual increase in the share of women authors, which begins in the 1970s and continues today, when women seem to be represented more than men … although it’s a bit too choppy to really tell.

If you are interested in using this data, let me know. I have the publicly available PDF content in an S3 bucket if you have research you’d like to do on it.