Archive for July, 2006

more on web identifiers

Tuesday, July 11th, 2006

I monitor the www-tag discussion list, but more than half of it goes right over my head, so I was pleased when a colleague forwarded URNs, Namespaces and Registries to me. Don’t let the 2001 in the URL fool you: it has been updated quite recently. This finding provides an interesting counterpoint to RFC 4452, which I wrote about earlier.

Essentially the authors examine the reasons why folks want URNs (persistence) and info-uris (non-dereferenceability), and show how http URIs actually satisfy the requirements of these two communities.

I have to admit, it sure would be nice if (for example) LCCNs and OCLCNUMs resolved using the existing infrastructure of http and dns. Let’s say I run across an info-uri in an XML document identifying tbl as info:lccn/no9910609. What does that really tell me? Wouldn’t it be nice if instead it was http://lccn.info/no9910609 and I could use my net/http library of choice to fetch tbl’s MADS record? Amusingly Henry Thompson (one of the authors of the finding) is holding http://lccn.info and http://oclcnum.info for ransom :-)
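To make that wish concrete, here is a rough Ruby sketch of the rewrite I have in mind. It only builds the http URI; nothing actually serves MADS records at these hosts today, so the host table and the info_to_http helper are assumptions of mine and not part of any registry or spec.

```ruby
require 'uri'

# Hypothetical mapping from info-uri namespaces to http hosts.
# These hosts are the ones mentioned above; they do not resolve records today.
INFO_HOSTS = {
  'lccn'    => 'lccn.info',
  'oclcnum' => 'oclcnum.info'
}.freeze

# Turn an info-uri such as info:lccn/no9910609 into an http URI object.
def info_to_http(info_uri)
  match = %r{\Ainfo:([a-z0-9]+)/(.+)\z}.match(info_uri)
  raise ArgumentError, "not an info-uri: #{info_uri}" unless match

  namespace, identifier = match.captures
  host = INFO_HOSTS.fetch(namespace) do
    raise ArgumentError, "unknown namespace: #{namespace}"
  end

  URI::HTTP.build(host: host, path: "/#{identifier}")
end

uri = info_to_http('info:lccn/no9910609')
puts uri
# If the host ever served records, Net::HTTP.get(uri) would be the fetch.
```

The point is how little is needed: once the identifier is an http URI, any stock HTTP client can take it from there.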

Instead, in the case of info-uri, OCLC is tasked with building a registry of these namespaces, and even when this is built the identifiers won’t necessarily be resolvable in any fashion. This is the intent behind info-uris of course–that they need not be resolvable or persistent. But this finding raises some practical issues that are worth taking a look at, which seem to point to the ultimate power of the web-we-already-have.

iraq

Monday, July 10th, 2006

I saw this in yesterday’s Washington Post and just learned it won the Best of Photo Journalism Award for 2006. The picture says it all, but the story is just as harrowing. What a sad mess.

repositories and domain-specific-languages

Thursday, July 6th, 2006

At work I’ve been doing some experiments with the Fedora repository software. One of the strengths of Fedora is that it is fundamentally designed as a set of extensible web services. At first I set about becoming familiar with the set of web services and decided that Ruby would be a useful and lightweight language to do this from. Sure enough, Ruby was plenty capable of putting stuff into Fedora and getting stuff back out again.

As time went on it became clear that what was really needed was a layer of abstraction around this Fedora web services API that would allow it (or another repository framework) to be used in a programmatic way without having to make SOAP calls and build FOXML all over the place. In software pattern lingo this is typically referred to as a facade.

So I worked on creating a facade, and ended up with something I half-jokingly called ‘bitbucket’ which looks something like this:

  require 'bitbucket'
 
  # create a repository
  repository = BitBucket::Repository.new
 
  # create a new repository object
  o = BitBucket::RepositoryObject.new
  o.dc.title = 'Automatic for the People'
  o.dc.creator = 'REM'
 
  # add a datastream to the object
  o << BitBucket::DataStream.new_from_file('The Sidewinder Sleeps Tonight.mp3')
 
  # ingest it!
  id = repository.ingest(o)

Now this code is pretty basic: it creates an object for a CD, associates an mp3 with it, and then adds it to a repository. This is the typical ‘ingest’ process, but notice that the ingest format, the SOAP requests, mime-types, and the actual type of the repository are unspecified. In truth even more could be hidden, such as the Dublin Core. Some things could use better names: ‘Resource’ instead of ‘RepositoryObject’ perhaps. If you have interest in using this code (yes it works!) let me know. I imagine it could be liberated from a private subversion repository.
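To show what hiding the repository type can look like, here is a minimal sketch of the facade idea. It is not the bitbucket code: the Sketch module, the MemoryBackend class and the id format are all made up for illustration, and a Hash stands in for Fedora so that no SOAP or FOXML is involved.

```ruby
# Hypothetical sketch of a repository facade; none of these names come from bitbucket.
module Sketch
  # A named chunk of content attached to an object.
  DataStream = Struct.new(:name, :content)

  # An object headed for the repository: some metadata plus datastreams.
  class RepositoryObject
    attr_reader :metadata, :datastreams

    def initialize
      @metadata = {}
      @datastreams = []
    end

    # Attach a datastream; returns self so calls can be chained.
    def <<(datastream)
      @datastreams << datastream
      self
    end
  end

  # Stand-in backend that keeps objects in a Hash.
  # A real backend would talk to Fedora (or something else) here.
  class MemoryBackend
    def initialize
      @objects = {}
    end

    def store(object)
      id = "sketch:#{@objects.size + 1}"
      @objects[id] = object
      id
    end

    def fetch(id)
      @objects.fetch(id)
    end
  end

  # The facade: callers only ever see ingest and find.
  class Repository
    def initialize(backend = MemoryBackend.new)
      @backend = backend
    end

    def ingest(object)
      @backend.store(object)
    end

    def find(id)
      @backend.fetch(id)
    end
  end
end

repository = Sketch::Repository.new
o = Sketch::RepositoryObject.new
o.metadata[:title] = 'Automatic for the People'
o << Sketch::DataStream.new('track.mp3', 'not really audio')
id = repository.ingest(o)
puts id
```

Swapping the backend is then a matter of handing Repository.new any object that responds to store and fetch, which is the property the facade is after.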

Just after finishing this up it struck me that while I was trying to build a facade around Fedora I was at the same time striving for a domain specific language for repositories.

The basic idea of a domain specific language (DSL) is a computer language that’s targeted to a particular kind of problem, rather than a general purpose language that’s aimed at any kind of software problem.

As Martin Fowler goes on to describe, there are two different types of DSLs: external and internal. External DSLs are custom languages such as regular expressions, PostScript, Ant configuration files, etc. Typically a syntax for the mini-language is determined and a small (hopefully) interpreter is written which parses and processes the DSL. Internal DSLs, on the other hand, use the constructs of a host programming language to define the DSL. There is a strong tradition of using DSLs in Lisp and Smalltalk…and it seems to be a growing tradition in the Ruby community as well.

So a DSL for repositories would provide a mini-language, if you will, for interacting with a repository. I think that the efforts underway to build models for interoperability across scholarly repositories are in a way groping after this same thing: an unambiguous language for interacting with repositories.
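For a flavor of what an internal DSL looks like in Ruby, here is a small sketch using a block evaluated against a builder object. The RepoDSL module and its title, creator and file words are invented for this example; they are not from bitbucket or from any interoperability model.

```ruby
# Hypothetical internal DSL for describing a repository resource.
module RepoDSL
  Record = Struct.new(:title, :creator, :files)

  # The methods defined here become the "words" of the mini-language.
  class Builder
    def initialize
      @record = Record.new(nil, nil, [])
    end

    def title(value)
      @record.title = value
    end

    def creator(value)
      @record.creator = value
    end

    def file(path)
      @record.files << path
    end

    def result
      @record
    end
  end

  # Evaluate the block with a Builder as self, so bare words resolve to its methods.
  def self.resource(&block)
    builder = Builder.new
    builder.instance_eval(&block)
    builder.result
  end
end

album = RepoDSL.resource do
  title   'Automatic for the People'
  creator 'REM'
  file    'The Sidewinder Sleeps Tonight.mp3'
end

puts album.title
```

No parser was written: Ruby’s optional parentheses and instance_eval do the work, which is what makes this an internal rather than an external DSL.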

The Pathways Core poster session at JCDL was very exciting. While the ideas were compelling enough, Jeroen Bekaert created some absolutely beautiful diagrams which really sold some of the concepts. I wish I could find some to put here. I got a chance to pick Xiaoming Liu’s brain a lot at the conference and over beers and I am really looking forward to their upcoming papers on this topic.

What I’d like to see is how easy it would be to use this emerging pathways model to create a Ruby DSL that uses the Atom Publishing Protocol as a backend. I’d also like to take a look at JSR 170. My main purpose in this is to see how well the aims of the scholarly community map to the content management solutions being developed outside the digital library community.

code4libcon 2007

Sunday, July 2nd, 2006

Here’s to making sure that code4libcon 2007 is a watershed moment for women library technologists.

code4libcon 2006 in Corvallis wasn’t all male, but it was largely…and I can only remember two women speaking to the audience. To a large extent code4libcon was modeled after technology conferences like yapc, pycon, oscon, barcamp, etc–which have much the same sort of ratio. But libraries are different because the majority of people who work in libraries are women. So it was a bit surprising that more women didn’t end up at code4libcon 2006.

2006 did get organized practically overnight by a very small (male) clique in an IRC room (one that’s not always well behaved, but means well; hey, it’s IRC). When people actually started signing up and sending in papers to the more formal discussion list I think we were all kind of surprised. I seriously thought we were just going to be hanging out in some random space with free wifi, and it turned into this really successful event.

Some folks like Dan Chudnov, Art Rhyno, Jeremy Frumkin and Roy Tennant started thinking and talking early about making the conference appeal to women library technologists. But it seems that the voting (open to all, but all men for some reason) somehow subconsciously counteracted this.

AFAIK the keynote voting is still going on, and I imagine you can still suggest speakers. There will only be more voting to do as we get into selecting presenters. If you’d like to participate just email Brad LaJeunesse and he’ll hook you up with a backpack login. Also, sign up for the code4lib and code4libcon discussion lists. Luckily Dorothea Salo is involved and vocal and I’m hoping that other women technologists will get involved too. This is a grassroots thing after all, not some sort of LITA top-tech trends panel. It’ll become whatever we want it to be.