Archive for August, 2006

set your data free … with unapi

Monday, August 28th, 2006

Dan, Jeremy, Peter, Michael, Mike, Ross and I wrote an article in the latest Ariadne introducing the lightweight web protocol unAPI. Essentially unAPI is an easy way to include references to digital objects in your HTML which can then be predictably retrieved by a machine…yes ‘machine’ includes JavaScript running in a browser :-) Dan and a really nice cross section of developers around the world have been working on this spec for over a year now and I think it could be poised to play an important role in the emerging open data movement.

Imagine you have a citation database which is searchable via the web. The search results include hits. Wouldn’t it be nice to align your human viewable results with machine readable representations so that people could write browser hacks and the like to remix your application data?

As far as I can tell there are a few options available to help you do this (apart from doing something ad-hoc).

  1. use a citation microformat and mark up your HTML predictably so that it can be recognized and parsed
  2. use GRDDL to map your HTML to RDF via an XSLT profile.
  3. embed RDF in your HTML essentially using an RDF microformat.
  4. use OpenURL and/or COinS to link in-page IDs to OpenURL servers.
  5. use unAPI and include a unapi server url (familiar autodiscovery like RSS/Atom), and identifiers (simple element attributes) and write a simple server side script that emits xml for a given identifier.
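To make option 5 a bit more concrete, here is a minimal Python sketch of the client side. It assumes the conventions of the unAPI spec as I understand them: a `<link rel="unapi-server">` element for autodiscovery, and `<abbr class="unapi-id">` elements whose `title` attribute carries the identifier. The sample page, the example.org URL, and the `UnapiScanner` and `unapi_urls` names are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class UnapiScanner(HTMLParser):
    """Collect the unAPI server URL and the identifiers found in a page."""

    def __init__(self):
        super().__init__()
        self.server = None
        self.identifiers = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Autodiscovery, assumed form: <link rel="unapi-server" href="...">
        if tag == "link" and attrs.get("rel") == "unapi-server":
            self.server = attrs.get("href")
        # Identifiers, assumed form: <abbr class="unapi-id" title="...">
        classes = (attrs.get("class") or "").split()
        if tag == "abbr" and "unapi-id" in classes:
            self.identifiers.append(attrs.get("title"))


def unapi_urls(html):
    """Return one 'list the formats for this id' URL per identifier."""
    scanner = UnapiScanner()
    scanner.feed(html)
    if scanner.server is None:
        return []
    return [scanner.server + "?" + urlencode({"id": i})
            for i in scanner.identifiers]


# Hypothetical page, not taken from any real site.
SAMPLE_PAGE = """
<html><head>
<link rel="unapi-server" type="application/xml" title="unAPI"
      href="http://example.org/unapi">
</head><body>
<p>Some citation <abbr class="unapi-id" title="record:42"></abbr></p>
</body></html>
"""

if __name__ == "__main__":
    for url in unapi_urls(SAMPLE_PAGE):
        print(url)
```

The same scan is just as short in browser JavaScript, which is really the point: the page only needs one link element and some attributes for a script to find the data.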

I like microformats a lot and I think a citation format will eventually get done. But it’s been a long time coming and there’s no indication it’s going to get done any time soon. What’s more unAPI is bigger than just citation data–and it allows you to publish all kinds of rich data objects without waiting for a community to ratify a particular representation in HTML.

Options 2 and 3 use RDF, which I actually like quite a bit as well. GRDDL implies a GRDDL-aware browser, which would be cool but is a bit heavyweight. XSLT will require clean XHTML–or pipelines to clean it. Embedding RDF in HTML using microformat techniques is compelling because you can theoretically process the RDF data similarly–whereas unAPI doesn’t require any particular kind of machine readable format (apart from HTML). Actually there’s nothing stopping you from using unAPI to link human viewable objects with RDF representations. The advantage unAPI has here is you can learn RDF if you want to, but you don’t have to learn RDF to get going with unAPI today.

Option 4 leverages work done in the library community on citation linking. OpenURL routers are widely deployed in libraries around the world and COinS is a quasi-microformat for putting OpenURL context objects into your HTML so that they can be extracted and fired off at an OpenURL server. OpenURL is a relatively complex and subtle standard which can do a lot more than just citation linking. Compared to OpenURL/COinS unAPI allows for ease of implementation in languages like JavaScript and provides a simple introspection mechanism for discovering what formats a particular resource is available in. AFAIK this can’t be done simply using OpenURL/COinS. If I’m wrong, comments should be open. I would argue that the sheer power and flexibility of OpenURL paradoxically make it hard to understand…and that unAPI in Dan’s adherence to a one-page-spec is more limited and simple. Less is more…
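The introspection step mentioned above might look something like this in Python. It's only a sketch: I'm assuming the server answers an `id`-only request with a `<formats>` document containing `<format name="..." type="...">` elements, and that adding a `format` parameter fetches the object itself. The sample response, the example.org URL, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical response to http://example.org/unapi?id=record:42
SAMPLE_FORMATS = """<formats id="record:42">
  <format name="oai_dc" type="application/xml"/>
  <format name="mods" type="application/xml"/>
</formats>"""


def available_formats(formats_xml):
    """Map each advertised format name to its media type."""
    root = ET.fromstring(formats_xml)
    return {f.get("name"): f.get("type") for f in root.findall("format")}


def object_url(server, identifier, format_name):
    """Build the URL that asks for one identifier in one format."""
    return server + "?" + urlencode({"id": identifier, "format": format_name})


if __name__ == "__main__":
    for name, media_type in available_formats(SAMPLE_FORMATS).items():
        print(name, media_type,
              object_url("http://example.org/unapi", "record:42", name))
```

So a client needs two requests and no prior knowledge of the site: ask what formats exist, then pick one.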

So if this piques your interest read the article. It does a much better job of describing the origins of the work, where it’s headed, has examples and links out to sites/tools that use unAPI today. I must admit I wrote very little of the article, and mostly contributed text snippets and screenshots of the unAPI validator I wrote, which uses my unapi ruby gem.

oai2rdf

Thursday, August 24th, 2006

Amidst the flurry of commit messages and the like on the Simile development discussion list, I happened to see that the Simile Project includes an RDFizer project, which has a component called oai2rdf.

oai2rdf is a command line program that happens to use Jeff Young’s OAIHarvester2 and some XSLT magic to harvest an entire oai-pmh archive and convert it to rdf.

  % oai2rdf.sh http://cogprints.ecs.soton.ac.uk/perl/oai2 cogprints

This will harvest the entire cogprints eprint archive and convert it on the fly to rdf which is saved in a directory called cogprints. Just in case you are wondering–yes it handles resumption tokens. In fact you can also give it date ranges to harvest, and tell it to only harvest particular metadata formats. By default it actually grabs all possible metadata formats.
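If you're wondering what handling resumption tokens involves, here is a rough Python sketch of an OAI-PMH harvesting loop. This is not how oai2rdf or OAIHarvester2 do it internally (I haven't read that code closely); it's just the general pattern from the OAI-PMH protocol, and the `harvest` and `fetch` names are my own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch(url):
    """Default fetcher: return the response body for a URL."""
    with urlopen(url) as response:
        return response.read()


def harvest(base_url, metadata_prefix="oai_dc", fetch=fetch):
    """Yield every <record> element, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for record in root.iter(OAI_NS + "record"):
            yield record
        token = root.find(".//" + OAI_NS + "resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords",
                  "resumptionToken": token.text.strip()}
```

Something like `for record in harvest("http://cogprints.ecs.soton.ac.uk/perl/oai2"): ...` would then walk the whole archive one batch at a time, assuming the repository is reachable. The `fetch` parameter is only there so the loop can be exercised without a network.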

As part of my day job I’ve been looking at some rdf technologies like jena, and while there are lots of chunks of rdf around on the web to play with, oai2rdf suddenly opens up the possibilities quite a bit.

Getting oai2rdf up and running is pretty easy. First get the oai2rdf code:

  svn co http://simile.mit.edu/repository/RDFizers/oai2rdf/ oai2rdf

Next make sure you have maven. If you don’t have it, maven is very easy to install. Just download, unpack, and make sure the maven/bin directory is in your path. Then you can:

  mvn package

The magic of maven will pull down dependencies and compile the code. Then you should be able to run oai2rdf. Art Rhyno has been talking about the work the Simile folks are doing for quite a while now, and only recently have I started to see what a rich set of tools they are developing.

gems…on ice

Saturday, August 12th, 2006


When developing and deploying RubyOnRails applications you’ve often got to think about the gem dependencies your project might have. It’s particularly useful to freeze a version of rails in your vendor directory so that your app uses that version of rails rather than a globally installed (or not installed) one. It’s easy to do this by simply invoking:

  rake freeze_gems

Which will unpack all the rails gems into vendor, and your application will magically use these instead of the globally installed rails gems.

The cool thing is that with a little bit of plugin help you can freeze your other gems in vendor as well. Simply install Rick Olson’s elegantly simple gem plugin into vendor/plugins. Then assuming you are using let’s say my oai-pmh gem you can simply:

  rake gems:freeze GEM=oai

and the gem will be unpacked in vendor, and the $LOAD_PATH for your application will automatically include the library path for the new gem. Very useful, thanks Rick!

the librarian’s store

Tuesday, August 1st, 2006

While working at Follett I always thought it was just a matter of time till Amazon turned its eye on the library market. Much of the web development that went on at Follett was done with an eye towards what Amazon was doing…while tailoring the experience for librarians and library book ordering/processing. The management I expressed this idea to seemed to think that Amazon wouldn’t be interested in Follett’s business. It was my opinion at the time that it would be better to have Amazon as a partner than a competitor. This is really just common sense, right? No leap of intuition there.

…time passes…

Now it looks like (thanks eby) that Follett has some company. When a web savvy company like Amazon notices your niche in the ecosystem it’s definitely important to pay attention. Amazon has decided to partner with TLC and Marcive for MARC data and with OCLC to automatically update holdings. This is big news.

Somewhat related, and even more interesting in some ways, rsinger and eby report in #code4lib that they’ve seen Library of Congress Subject Headings and Dewey Decimal Classification Numbers in Amazon Web Service responses. For an example, splice your Amazon Token in here:

http://webservices.amazon.com/onca/xml
?Service=AWSECommerceService
&Version=2006-06-28
&Operation=ItemLookup
&ContentType=text%2Fxml
&SubscriptionId=YOUR_TOKEN_HERE
&ItemId=097669400X
&IdType=ASIN
&ResponseGroup=ItemAttributes,Large,Subjects

scan for:

Ruby (Computer program language)
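If you'd rather not paste that URL together by hand, the request above can be assembled with a few lines of Python. This is just a sketch that rebuilds the same query string from the parameters listed; the `item_lookup_url` name is made up, and nothing here goes beyond the parameters shown above.

```python
from urllib.parse import urlencode

ENDPOINT = "http://webservices.amazon.com/onca/xml"


def item_lookup_url(subscription_id, asin):
    """Build the ItemLookup request URL with the parameters shown above."""
    params = [
        ("Service", "AWSECommerceService"),
        ("Version", "2006-06-28"),
        ("Operation", "ItemLookup"),
        ("ContentType", "text/xml"),
        ("SubscriptionId", subscription_id),
        ("ItemId", asin),
        ("IdType", "ASIN"),
        ("ResponseGroup", "ItemAttributes,Large,Subjects"),
    ]
    # urlencode percent-escapes the slash and the commas for us.
    return ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(item_lookup_url("YOUR_TOKEN_HERE", "097669400X"))
```

Note that `urlencode` also escapes the commas in `ResponseGroup` as `%2C`, which is equivalent to the literal commas in the hand-written URL.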