app and repositories

Pete Johnston blogged recently about a very nice use of the Atom Publishing Protocol (APP) to provide digital library repository functionality. The project is supported by UKOLN at the University of Bath and is called Simple Web-service Offering Repository Deposit (SWORD).

If you are interested in digital repositories and web services take a look at their APP profile. It’s a great example of how APP encourages the use of the Atom XML format and RESTful practices, which can then be extended to suit the particular needs of a community of practice.

To understand APP you really only need to grok a handful of concepts from the data model and REST. The data model is basically made up of a service document, which describes a set of collections, which aggregates member entries, which can in turn point to a media entry. All of these types of resources are identified with URLs. Since they are URLs you can interact with the objects with plain old HTTP–just like your web browser. For example you can list the entries in a collection by issuing a GET to the collection URL. Or you can create a member resource by doing a POST to the collection URL. Similarly you can delete a member entry by issuing a DELETE to the member entry. The full details are available in the latest draft of the RFC–and also in a wide variety of articles including this one.

So to perform a SWORD deposit a program would have to:

get the service document for the repository (GET http://www.myrepository.ac.uk/app/servicedocument
see what collections it can add objects to
create some IMS, METS or DIDL metadata to describe your repository object and ZIP it up with any of the objects datastreams
POST the zip file to the appropriate collection URL with the appropriate X-Format-Namespace to identify the format of the submitted object
check that you got a 201 Created status code and record the Location of the newly created resource
profit!

1 and 2 are perhaps not even necessary if the URL for the target collection is already known. Some notable things about the SWORD profile of APP:

two levels of conformance (one really minimalistic one)
the idea that collections imply particular treatments or workflows associated with how the object is ingested
service documents dynamically change to describe only the collections that a particular user can see
no ability to edit resources
no ability to delete resources
no ability to list collections
repository objects are POSTed as ZIP files to collections
HTTP Basic Authentication + TLS for security
the use of DublinCore to describe collections and their respective policies.
collections can support mediated deposit which means deposits can include the X-On-Behalf-Of HTTP header to identify the user to create the resource for.
the use of X-Format-Namespace HTTP header to explicitly identify the format of the submission package that is zipped up: for example IMS, METS or DIDL.

While I understand why update and delete would be disabled for deposited packages I don’t really understand why the listing of collections would be disabled. An atom feed for a collection would essentially enable harvesting of a repository, much like ListRecords in OAI-PMH.

I’m not quite sure I completely understand X-On-Behalf-Of and sword:mediation either. I could understand X-On-Behalf-Of in an environment where there is no authentication. But if a user is authenticated couldn’t their username be used to identify who is doing the deposit? Perhaps there are cases (as the doc suggests) where a deposit is done for another user?

All in all this is really wonderful work. Of particular value for me was seeing the list of SWORD extensions and also the use of HTTP status codes. If I have the time I’d like to throw together a sample repository server and client to see just how easy it is to implement SWORD. I did try some experiments along these lines for my presentation back in February…but they never got as well defined as SWORD.