Seminar Week 3

In this week’s class we discussed three readings: (Bates, 1999), (Dillon & Norris, 2005) and (Manzari, 2013). This was our last general set of readings about information science before diving into some of the specialized areas. I wrote about my reaction to Bates over in Red Thread.

Bates, M. (1999). The invisible substrate of information science. Journal of the American Society for Information Science, 50(12), 1043–1050.

Dillon, A., & Norris, A. (2005). Crying wolf: An examination and reconsideration of the perception of crisis in LIS education. Journal of Education for Library and Information Science.

Manzari, L. (2013). Library and information science journal prestige as assessed by library and information science faculty. Library Quarterly, 83, 42–60.


Zombie Information Science

One of the readings for INST800 this week was Bates (2007). It’s a short piece that outlines how she and Mary Maack organized their work on the third edition of the Encyclopedia of Library and Information Sciences. When creating an encyclopedia it immediately becomes necessary to define the scope, so that you have some hope of finishing the work. As she points out, information science is contested territory now because of all the money and power concentrated in Silicon Valley. Everyone wants a piece of it, whereas before people started billion-dollar companies in their garages it struggled for recognition as a discipline.

Bates, M. J. (2007). Defining the information disciplines in encyclopedia development. Information Research, 12. Retrieved from http://www.informationr.net/ir/12-4/colis/colis29.html


Red Thread

Bates, M. (1999). The invisible substrate of information science. Journal of the American Society for Information Science, 50(12), 1043–1050.


Seminar Week 2

Here are my working notes for the readings this week in INST888. At some point I will try to integrate the annotations I am making using Hypothes.is, perhaps via some JavaScript that pulls them into this page.


Seminar Week 1

These are some notes for the readings from my first Seminar class. It’s really just a test to see if my BibTeX/Jekyll/Pandoc integration is working. More about that in a future post hopefully…


iSchool

As you can see, I’ve recently changed things around here at inkdroid.org. Yeah, it’s looking quite spartan at the moment, although I’m hoping that will change in the coming year. I really wanted to optimize this space for writing in my favorite editor, and for making it easy to publish and preserve the content. WordPress has served me well over the last 10 years, and until now I’ve resisted the urge to switch over to a static site. But yesterday I converted the 394 posts, archived the WordPress site and database, and am now using Jekyll. I haven’t been using Ruby as much in the past few years, but the tooling around Jekyll feels very solid, especially given GitHub’s investment in it.


Links in Obergefell v. Hodges

Last week’s landmark ruling from the Supreme Court on same-sex marriage was, as usual, published on the Web as a PDF. Given the past history of URL use in Supreme Court opinions I thought I would take a quick look to see what URLs were present. There are two, both in Justice Alito’s dissenting opinion, and one is already broken … just four days after the PDF was published. You can see it yourself at the bottom of page 100 in the PDF.

If you point your browser at

http://www.cdc.gov/nchs/data/databrief/db18.pdf

you will get a page not found error.

Sadly even the Internet Archive doesn’t have a snapshot of the page available.

But notice that the Internet Archive thinks it can still get a copy. That’s because the Centers for Disease Control’s website is responding with a 200 OK instead of a 404 Not Found:

zen:~ ed$ curl -I http://www.cdc.gov/nchs/data/databrief/db18.pdf
HTTP/1.1 200 OK
Content-Type: text/html
X-Powered-By: ASP.NET
X-UA-Compatible: IE=edge,chrome=1
Date: Tue, 30 Jun 2015 16:22:18 GMT
Connection: keep-alive

At any rate, it’s not the Internet Archive’s fault that they haven’t archived the document, originally published in 2009, because the URL is actually a typo. Instead it should be

http://www.cdc.gov/nchs/data/databriefs/db18.pdf

which does resolve correctly.

So between the broken URL and the 200 OK for something not found, we’ve got issues of link rot and reference rot all rolled up into a one-character typo. Sigh.

I think a couple of lessons for Web publishers can be distilled from this little story:

  • when publishing on the Web, include link checking as part of your editorial process (a rough sketch of what that could look like follows this list)
  • if you are going to publish links on the Web, use a format that’s easy to check … like HTML.


SKOS and Wikidata

For #DayOfDH yesterday I created a quick video about some data normalization work I have been doing using Wikidata entities. I may write more about this work later, but the short version is that I have a bunch of spreadsheets with author names in a variety of formats and transliterations, which I need to collapse into unique identifiers so that I can provide a unified display of the data per author. So, for example, my spreadsheets have information for Fyodor Dostoyevsky under the following variants:

  • Dostoeieffsky, Feodor
  • Dostoevski
  • Dostoevski, F. M.
  • Dostoevski, Fedor
  • Dostoevski, Feodor Mikailovitch
  • Dostoevskii
  • Dostoevsky
  • Dostoevsky, Fiodor Mihailovich
  • Dostoevsky, Fyodor
  • Dostoevsky, Fyodor Michailovitch
  • Dostoieffsky
  • Dostoieffsky, Feodor
  • Dostoievski
  • Dostoievski, Feodor Mikhailovitch
  • Dostoievski, Feodore M.
  • Dostoievski, Thedor Mikhailovitch
  • Dostoievsky
  • Dostoievsky, Feodor Mikhailovitch
  • Dostoievsky, Fyodor
  • Dostojevski, Feodor
  • Dostoyeffsky
  • Dostoyefsky
  • Dostoyefsky, Theodor Mikhailovitch
  • Dostoyevski, Feodor
  • Dostoyevsky
  • Dostoyevsky, Fyodor
  • Dostoyevsky, F. M.
  • Dostoyevsky, Feodor Michailovitch
  • Dostoyevsky, Feodor Mikhailovich

So, obviously, I wanted to normalize these. But I also wanted to link each name to an identifier that could be useful for obtaining other information, such as an image of the author, a description of their work, possibly links to works by the author, etc. I’m going to try to map the authors to Wikidata, largely because Wikidata links out to other places like the Virtual International Authority File and Freebase, but also because there are images on Wikimedia Commons and nice descriptive text for the people. As an example, here is the Wikidata page for Dostoyevsky.

To aid in this process I created a very simple command line tool and library called wikidata_suggest, which uses Wikidata’s suggest API to interactively match a string of text to a Wikidata entity. If Wikidata doesn’t have any suggestions, the utility falls back to looking through a page of Google search results for a Wikipedia page, and then optionally lets you use that text.

SKOS

Soon after tweeting about the utility and the video I made about it, I heard from Alberto, who works on the NASA Astrophysics Data System and was interested in using wikidata_suggest to try to link the Unified Astronomy Thesaurus up to Wikidata.

Fortunately the UAT is made available as a SKOS RDF file. So I wrote a little proof-of-concept script named skos_wikidata.py that loads a SKOS file, walks through each skos:Concept, and asks you to match the skos:prefLabel to Wikidata using wikidata_suggest. Here’s a quick video I made of what this process looks like.

I guess this is similar to what you might do in OpenRefine, but I wanted a bit more control over how the data was read in, modified and matched up. I’d be interested in your ideas on how to improve it if you have any.

It’s kind of funny how Day of Digital Humanities quickly morphed into Day of Astrophysics…


A Personal Panopticon

Here is how you can use your Google Search History and jq to create a top-10 list of the things you’ve googled the most.

First, download your data from your Google Search History. Yeah, creepy. Then install jq. Wait for the email from Google saying that your archive is ready, then download and unzip it. Open a terminal window in the Searches directory, and run this:

jq --raw-output '.event[].query.query_text' *.json \
  | sort | uniq -c | sort -rn | head -10

Here’s what I see for the 75,687 queries I’ve typed into Google since July 2005.

 309 google analytics
 130 hacker news
 116 this is my jam
  83 site:chroniclingamerica.loc.gov
  68 jquery
  54 bagit
  48 twitter api
  44 google translate
  37 wikistream
  37 opds

These are (mostly) things that I hadn’t bothered to bookmark, but visited regularly. I suspect there is something more compelling and interesting that could be done with the data. A personal panopticon perhaps.

Oh, and I’d delete the archive from your Google Drive after you’ve downloaded it. If you ever grant other apps the ability to read from your Drive, they could read your search history. Actually, maybe this whole exercise is fraught with peril. You should just ignore it.


VirtualEnv Builds in Sublime Text 3

Back in 1999 I was a relatively happy Emacs user, and was beginning work at a startup where I was one of the first employees after the founders. As at many startups, the founders, in addition to owning the company, were hackers, and were routinely working on the servers. When I asked if Emacs could be installed on one of the machines I was told to learn Vi … which I proceeded to do. I needed the job.

Here I am 15 years later, finally starting to use Sublime Text 3 a bit more in my work. I may not be a cool kid anymore, but I can still pretend to be one, eh? The Vintageous plugin lets my fingers feel like they are in Vim, while still letting me take advantage of other packages for editing Markdown and interacting with Git, as well as the lovely eye-pleasing themes that are available. I still feel a bit dirty because, unlike Vim, Sublime is not open source; but at the same time it does feel good to support a small software publisher who is doing good work. Maybe I’ll end up switching back to Vim and supporting it.

Anyway, as a Python developer one thing I immediately wanted to be able to do was use my project’s VirtualEnv during development, and run the test suite from inside Sublime. The Virtualenv package makes creating, activating, deactivating, and deleting a virtualenv a snap. But I couldn’t seem to get the build to work properly with the virtualenv, even after setting the Build System to Python - Virtualenv.

Sublime Text 3 - VirtualEnv

After what felt like a lot of googling around (it was probably just 20 minutes), I didn’t find an answer until I discovered in the Project documentation that I could save my project, then go to Project -> Edit Project and add a build_systems stanza like this:

{
 "folders":
 [
  {
   "path": "."
  }
 ],
 "virtualenv": "/Users/ed/.virtualenvs/curio",
 "build_systems": [
  {
   "name": "Test",
   "shell_cmd": "/Users/ed/.virtualenvs/curio/bin/python setup.py test"
  }
 ] 
}

Notice how the shell_cmd is using the Python executable in my VirtualEnv? After saving that I was able to go into Tools -> Build System and set the build system to Test, which matches the name of the build system added in the JSON. Now command-B will run my test suite with the VirtualEnv.

Sublime Text 3 - VirtualEnv w/ Build

I guess it would be nice if the VirtualEnv plugin for Sublime did something to make this easier. But rather than go down that rabbit hole I decided to write it down here for the benefit of my future self (and perhaps you).

If you know of a better way to do this please let me know.