rda/frbr and the semantic web

I was interested to learn from Alistair Miles that folks in the library community are starting to look at expressing models such as RDA and FRBR using the semantic web technology stack, including things like the Dublin Core Abstract Model.

It’s exciting and timely to see the luminaries in the field get together to talk about this sort of convergence. I must admit I’m still a little bit hazy about why we need DCAM when we already have RDF, RDFS and OWL … but I think only good things can come from this interaction.

It’s particularly heartening that the library community is exploring what RDA and FRBR look like when the rubber meets the road of data representation. That said, Ian Davis, Richard Newman and Bruce D’Arcus have arguably already done this for FRBR.

Update: official announcement from the British Library.


late easter present

I finally took the time to make pymarc setuptools friendly. This basically means that if you’ve got easy_install handy you can:

sudo easy_install pymarc

If you haven’t looked at eggs yet, they are pretty much the de facto standard for distributing Python code. PyPI (the Python Package Index, aka the Python Cheese Shop) allows easy_install to locate and download packages, which are then unpacked and installed.

pymarc was basically an experiment to make sure I understood how eggs worked with PyPI. Next up, Rob Sanderson has sent me some code he and a colleague wrote for parsing Library of Congress Classification Numbers, which I’m going to bundle up as an egg as well. Stay tuned.


nekkid

Yeah, today is CSS Naked Day. I just hope I remember to re-enable CSS tomorrow :-)


theory

The second book I checked out of the Library of Congress with my shiny new borrowing card was Alistair Cockburn’s Agile Software Development: The Cooperative Game (which happened to just win this year’s Jolt Award). Early on Cockburn recommends jumping to an appendix to read Peter Naur’s article “Programming as Theory Building” (thanks ksclarke).

This is my second time reading the article, but this time it is really resonating with me–the idea of writing programs as building theories. Partly I think this is because I was reading it while I attended a recent Haskell tutorial by coworker Adam Turoff here in DC (which I will write about shortly).

On the ride to work this morning a particular quote stood out, and I’m just writing it here so I don’t forget it:

… the problems of program modification arise from acting on the assumption that programming consists of program text production, instead of recognizing programming as an activity of theory building.

It seems obvious at first I guess. But it’s a powerful statement about what the activity of software development ought to be–instead of a string of hacks that eventually brings a piece of software to its knees.


US open access petition

As announced on the jisc-repositories list there is now a US counterpart to the EU Petition calling for Open Access.

We, the undersigned, believe that broad dissemination of research results is fundamental to the advancement of knowledge. For America’s taxpayers to obtain an optimal return on their investment in science, publicly funded research must be shared as broadly as possible. Yet too often, research results are not available to researchers, scientists, or the members of the public. Today, the Internet and digital technologies give us a powerful means of addressing this problem by removing access barriers and enabling new, expanded, and accelerated uses of research findings.

The petition was put together by the Alliance for Taxpayer Access in response to the 28,000-odd enlightened folks who signed the EU petition. I was encouraged to see prominent sponsor icons for the American Library Association and the Association of College & Research Libraries on the US petition.

I haven’t been tracking the Open Access movement as well as I should have–but I did take a few seconds while drinking coffee at the breakfast table this morning to sign the petition. The movement seems to be really making a lot of progress recently.

Via a bit of synchronicity, Caroline Arms sent a message around at $work about the recent Emerging Libraries conference at Rice. Apparently Brewster Kahle and Paul Ginsparg had a meeting of like minds. I guess it’s not surprising considering their roles in bringing libraries and archives into the computing age with The Internet Archive and arXiv. What is surprising is that it took this long. These two projects are wildly successful, living and breathing examples of Open Access projects.

The audio for all the conference presentations is available from Rice…including the very listenable Universal Access to Human Knowledge (Kahle) and Read as We May (Ginsparg).


oclc registry

So OCLC’s WorldCat Registry is a nice new addition to OCLC’s growing list of web services. Do a search for your library and take a look at the URL: aye, that’s right, it’s SRU. In fact, do a view source on the results page and you’ll see an SRU response in XML; the HTML is being rendered with client-side XSLT.

If you drill into a particular institution you’ll see a pleasantly cool uri:

http://worldcat.org/registry/Institutions/89073

…which would serve nicely as an identifier for the Browne Popular Culture Library. The institution pages are HTML instead of XML–however there is a link to an XML representation:

http://worldcat.org/webservices/registry/content/Institutions/89073

This URL isn’t bad but it would be rather nice if the former could return XML if the Accept: header had text/xml slotted before text/html. Yeah, I did check:

  curl -I -H "Accept: text/xml" http://worldcat.org/registry/Institutions/89073
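
To make the wished-for behaviour concrete, here is a minimal sketch of how a server might pick between HTML and XML from an Accept header. It is not how OCLC's service works. The function name preferred_type and the tie-breaking rule (earlier in the header wins when q-values are equal) are assumptions chosen to match the "slotted before" idea above, and wildcards such as */* are ignored for brevity.

```python
def preferred_type(accept_header, offered=("text/html", "text/xml")):
    """Return the offered media type the client prefers, or None.

    Hypothetical helper: honours q-values, and uses position in the
    header as a tie-breaker (an assumption, not something HTTP requires).
    """
    candidates = []
    for position, part in enumerate(accept_header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0  # default quality when no q parameter is present
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        if media_type in offered and q > 0:
            # Highest q sorts first, then earliest position in the header.
            candidates.append((-q, position, media_type))
    return min(candidates)[2] if candidates else None


print(preferred_type("text/xml, text/html"))
print(preferred_type("text/html, text/xml"))
```

With a rule like this, the institution URI could answer with XML when text/xml comes first and fall back to HTML otherwise.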

It’s inspiring to see OCLC going the extra mile to make their new services have web friendly machine APIs.

Update: for deeper analysis check out Pete Johnston’s WorldCat Institution Registry and Identifiers. He has some great points on the use of identifiers in the xml responses.


exhibit

If you haven’t tried Exhibit out yet, give it a look: the simile folks have created a truly wonderful data publishing framework which runs entirely in your browser with a bit of JavaScript, HTML and CSS.

The remarkable part is that it requires no backend database, but simply operates on a stream of JSON. If you have a couple of minutes, take a look at their Getting Started Tutorial, which shows you how to create an exhibit of MIT-related Nobel laureates with a tiny bit of HTML, CSS and JavaScript.

Just as an experiment I tried pointing it at my delicious json feed for metadata. It turns out that exhibit wants json data to be a hash with a key ‘items’ that points to a list of items. It also wants each item to have a ‘label’ key. I quickly reformatted the delicious json with simplejson, and got this.
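
The reshaping described above can be sketched roughly as follows. This is not the original script: it uses the standard library json module (whose dumps/loads interface mirrors simplejson), the function name to_exhibit is made up for illustration, and the sample records with their 'u', 'd' and 't' keys are hypothetical stand-ins for whatever the bookmark feed actually returns.

```python
import json


def to_exhibit(posts, label_key="d"):
    """Wrap a list of dicts in the {'items': [...]} shape and give every
    item a 'label' key copied from label_key (an assumed field name)."""
    items = []
    for post in posts:
        item = dict(post)  # copy so the input records are left untouched
        item["label"] = post.get(label_key, "untitled")
        items.append(item)
    return {"items": items}


# Hypothetical sample data; the real feed's keys may differ.
sample = [
    {"u": "http://example.org/a", "d": "First bookmark", "t": ["metadata"]},
    {"u": "http://example.org/b", "d": "Second bookmark", "t": ["metadata", "rdf"]},
]

exhibit_data = to_exhibit(sample)
print(json.dumps(exhibit_data, indent=2))
```

The printed JSON is a single hash with an 'items' list, and each item carries a 'label', the two requirements mentioned above.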

A few minutes later I prodded the simile folks to see if there is a way of filtering json data on the way into exhibit so that it can be normalized…time passes (like maybe an hour) and then I hear from Johan Sundström that the latest/greatest exhibit code has this sort of filtering built in!

Tangential to the exhibit code, there has been an interesting discussion recently about how to expose exhibit content to indexing services like google. Since exhibit content is generated with pure javascript, and google (as far as we know) primarily indexes html content–the exhibit content is rendered invisible. This is a problem that digital library applications and repositories have to deal with as well, so it may be of interest.


75 minutes

The worst news so far in 2007 after the surge. Can anyone else recommend a good podcast for independent music? I’m going to suffer…


uri-templates

I’ve been playing with uri-templates a little bit at $work to help formulate clean urls for a newspaper application. The goal is to provide urls such as:

  • http://example.gov/issn/0362-4331
  • http://example.gov/issn/0362-4331/1969-05-28
  • http://example.gov/issn/0362-4331/1969-05-28/1
  • http://example.gov/issn/0362-4331/1969-05-28/1/31

I was hoping something like this would work:

  • http://example.gov/issn/{issn}/{date}/{edition}/{page}

But I’d like to indicate that the date, edition and page parameters are optional. After reading the spec and some discussion it becomes clear that there is no way to indicate that part of the path is optional. OpenSearch addresses the issue to some extent by making parameters optional with ‘?’:

  • http://example.gov/issn/{issn}/{date?}/{edition?}/{page?}

Which seems to be what I want. But there are some wrinkles such as when a page is included without a date. But perhaps these details could be application specific?

The discussion seemed to indicate that the template could be bundled with a written description of how the parameters are to be used. Or instead an additional template specification for optionality could be created which references the URI Template spec. There were also some nods towards WADL, which apparently has some richer conventions for this sort of thing.

I guess for the moment using

  • http://example.gov/issn/{issn}/{date}/{edition}/{page}

with some descriptive text will work well enough. But I think it would be useful if the uri-template draft commented on the issue somehow…since it’s bound to come up again.
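
One way the "descriptive text" could be made application specific is to do the expansion in code. The sketch below is not part of any URI Template implementation: the function name issue_url and the rule that a later segment may not appear without the earlier ones (for example a page without a date) are assumptions drawn from the wrinkle mentioned above.

```python
def issue_url(issn, date=None, edition=None, page=None,
              base="http://example.gov/issn"):
    """Expand /issn/{issn}/{date}/{edition}/{page} left to right,
    treating date, edition and page as optional trailing segments."""
    segments = [issn]
    missing = None  # name of the first optional value that was left out
    for name, value in (("date", date), ("edition", edition), ("page", page)):
        if value is None:
            if missing is None:
                missing = name
        elif missing is not None:
            # Assumed application rule: no gaps allowed in the path.
            raise ValueError(f"{name} given without {missing}")
        else:
            segments.append(str(value))
    return base + "/" + "/".join(segments)


print(issue_url("0362-4331"))
print(issue_url("0362-4331", "1969-05-28", 1, 31))
```

Calling issue_url("0362-4331", page=31) raises a ValueError, which is one possible answer to the page-without-a-date question; an application could just as well choose a different rule.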