repositories and domain-specific-languages

At work I’ve been doing some experiments with the Fedora repository software. One of the strengths of Fedora is that it is fundamentally designed as a set of extensible web services. At first I set about becoming familiar with the set of web services and decided that Ruby would be a useful and lightweight language to do this from. Sure enough, Ruby was plenty capable of putting stuff into Fedora and getting stuff back out again.

As time went on it became clear that what was really needed was a layer of abstraction around the Fedora web services API, one that would allow it (or another repository framework) to be used programmatically without having to make SOAP calls and build FOXML all over the place. In software pattern lingo this is typically referred to as a facade.

So I worked on creating a facade, and ended up with something I half-jokingly called ‘bitbucket’ which looks something like this:

  require 'bitbucket'

  # create a repository
  repository = BitBucket::Repository.new

  # create a new repository object
  o = BitBucket::RepositoryObject.new
  o.dc.title = 'Automatic for the People'
  o.dc.creator = 'REM'

  # add a datastream to the object
  o << BitBucket::DataStream.new_from_file('The Sidewinder Sleeps Tonight.mp3')

  # ingest it!
  id = repository.ingest(o)

Now this code is pretty basic: it creates an object for a CD, associates an mp3 with it, and then adds it to a repository. This is the typical ‘ingest’ process, but notice that the ingest format, the SOAP requests, the MIME types, and the actual type of the repository are all unspecified. In truth even more could be hidden, such as the Dublin Core. Some things could use better names: ‘Resource’ instead of ‘RepositoryObject’ perhaps. If you have interest in using this code (yes it works!) let me know–I imagine it could be liberated from a private subversion repository.
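One way the type of the repository can stay unspecified is for the facade to delegate to a pluggable backend. What follows is only a rough sketch under that assumption, not the actual bitbucket code: the names `MemoryBackend`, `store`, `fetch`, and `find` are made up for illustration, and the backend just keeps objects in a Hash where a Fedora backend would build FOXML and make SOAP calls.

```ruby
require 'securerandom'

# Hypothetical backend that keeps objects in a Hash. A Fedora backend would
# expose the same store/fetch methods but talk to the web services inside them.
class MemoryBackend
  def initialize
    @objects = {}
  end

  # Saves the attributes and returns a newly generated identifier.
  def store(attributes)
    id = SecureRandom.uuid
    @objects[id] = attributes
    id
  end

  def fetch(id)
    @objects.fetch(id)
  end
end

# Facade: callers only see ingest and find; the backend can be swapped out.
class Repository
  def initialize(backend = MemoryBackend.new)
    @backend = backend
  end

  def ingest(attributes)
    @backend.store(attributes)
  end

  def find(id)
    @backend.fetch(id)
  end
end

repository = Repository.new
id = repository.ingest(title: 'Automatic for the People', creator: 'REM')
puts repository.find(id)[:title]
```

Code that calls `ingest` and `find` never learns which backend is in play, which is the whole point of the facade.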

Just after finishing this up it struck me that while I was trying to build a facade around Fedora I was at the same time striving for a domain specific language for repositories.

The basic idea of a domain specific language (DSL) is a computer language that’s targeted to a particular kind of problem, rather than a general purpose language that’s aimed at any kind of software problem.

As Martin Fowler goes on to describe, there are two different types of DSLs: external and internal. External DSLs are custom languages such as regular expressions, PostScript, Ant configuration files, etc. Typically a syntax for the mini-language is determined and a small (hopefully) interpreter is written which parses and processes the DSL. Internal DSLs, on the other hand, use the constructs of a host programming language to define the DSL. There is a strong tradition of using DSLs in Lisp and Smalltalk…and it seems to be a growing tradition in the Ruby community as well.
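To make the internal flavor concrete, here is a small sketch of what an internal DSL can look like in Ruby. It is an illustration only: `ResourceBuilder`, `resource`, and the `title`, `creator`, and `file` keywords are hypothetical names and not part of bitbucket. The trick is that the "keywords" are ordinary methods, and `instance_eval` runs the block with the builder as `self` so they can be called bare.

```ruby
# Hypothetical internal DSL; none of these names come from bitbucket.
class ResourceBuilder
  attr_reader :metadata, :files

  def initialize
    @metadata = {}
    @files = []
  end

  # DSL "keywords" are plain Ruby methods on the builder.
  def title(value)
    @metadata[:title] = value
  end

  def creator(value)
    @metadata[:creator] = value
  end

  def file(path)
    @files << path
  end
end

# Evaluates the block with a fresh builder as self, so bare calls such as
# title '...' inside the block resolve to the builder's methods.
def resource(&block)
  builder = ResourceBuilder.new
  builder.instance_eval(&block)
  builder
end

album = resource do
  title   'Automatic for the People'
  creator 'REM'
  file    'The Sidewinder Sleeps Tonight.mp3'
end

puts album.metadata[:title]
```

No parser or interpreter had to be written here; Ruby's own syntax is doing all the work, which is what separates this from an external DSL.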

So a DSL for repositories would provide a mini-language, if you will, for interacting with a repository. I think that the efforts underway to build models for interoperability across scholarly repositories are in a way groping after this same thing–an unambiguous language for interacting with repositories.

The Pathways Core poster session at JCDL was very exciting. While the ideas were compelling enough, Jeroen Bekaert created some absolutely beautiful diagrams which really sold some of the concepts. I wish I could find some to put here. I got a chance to pick Xiaoming Liu’s brain a lot at the conference and over beers and I am really looking forward to their upcoming papers on this topic.

What I’d like to see is how easy it would be to use this emerging Pathways model to create a Ruby DSL that uses the Atom Publishing Protocol as a backend. I’d also like to take a look at JSR 170. My main purpose in this is to see how well the aims of the scholarly community map to the content management solutions being developed outside the digital library community.