
oai-pmh and xmpp

As an experiment to learn more about xmpp, I created a little utility that polls an oai-pmh server and sends new records as chunks of xml over xmpp. The idea wasn't necessarily to watch all the xml arrive in my jabber client (although you can do that). I wanted to let downstream applications have records pushed to them, instead of having to constantly poll for updates. So you could write a client that archives metadata (and potentially articles) as they are found, or a current awareness tool that listens for articles matching a particular user's research profile, etc.
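The core of the push idea is turning each oai-pmh ListRecords response into one xml chunk per record. Here is a minimal sketch of that step. It is not the code in oai2xmpp.py: it uses the standard library's ElementTree instead of lxml, and the function name `split_records` is hypothetical. It returns each record serialized on its own, plus the resumptionToken (if any) needed to fetch the next page.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def split_records(response_body):
    """Split a ListRecords response into standalone xml chunks.

    Returns (records, resumption_token). Each record is serialized on its
    own, the sort of payload that could be sent as a single xmpp message.
    """
    root = ET.fromstring(response_body)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        rec.tail = None  # drop whitespace between records so each chunk stands alone
        records.append(ET.tostring(rec, encoding="unicode"))
    # A missing or empty resumptionToken means the result list is complete.
    # A noRecordsMatch <error> response has no ListRecords element at all,
    # so it simply yields an empty list of records.
    token = root.findtext("oai:ListRecords/oai:resumptionToken",
                          default="", namespaces=NS)
    return records, (token.strip() or None)
```

A poller would keep requesting pages with the returned token until it comes back as `None`, sending each chunk as it goes.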

Here’s how you start it up:

oai2xmpp.py http://www.doaj.org/oai.article from@example.com to@example.org

which would poll the Directory of Open Access Journals for new articles every 10 minutes and send them via xmpp to to@example.org. You can adjust the poll interval (in seconds) with the --pollinterval option, and limit harvesting to records within a particular set with the --set option, e.g.:

oai2xmpp.py http://export.arxiv.org/oai2 currents@jabber.org ehs@jabber.org --set cs --pollinterval 86400
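Each poll only needs the records added or changed since the previous one, which oai-pmh expresses with the `from` argument, optionally narrowed by `set`. The sketch below shows how such a request URL could be built. It is an illustration under stated assumptions: the function name `list_records_url` is hypothetical, the `oai_dc` metadata prefix is assumed (the tool may request a different format), and `day_granularity` stands in for checking the repository's advertised datestamp granularity.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def list_records_url(base_url, since=None, set_spec=None, resumption_token=None,
                     metadata_prefix="oai_dc", day_granularity=False):
    """Build an OAI-PMH ListRecords URL for one incremental poll.

    `since` should be a timezone-aware datetime for the previous poll, so
    that only records added or changed after it are requested.
    """
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if since is not None:
            # OAI-PMH datestamps are UTC; repositories with day granularity
            # reject the full seconds form.
            fmt = "%Y-%m-%d" if day_granularity else "%Y-%m-%dT%H:%M:%SZ"
            params["from"] = since.astimezone(timezone.utc).strftime(fmt)
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # e.g. the arxiv.org "cs" set, polled once a day (--pollinterval 86400)
    last_poll = datetime.now(timezone.utc) - timedelta(seconds=86400)
    print(list_records_url("http://export.arxiv.org/oai2", since=last_poll, set_spec="cs"))
```

Recording the time just before each request and passing it as `since` on the next poll keeps consecutive harvests from skipping records.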

It's a one-file python hack in the spirit of Thom Hickey's 2PageOAI, with a few dependencies documented in the file (lxml, xmpppy, httplib2). I've run it for about a week against DOAJ and arxiv.org without incident (it does respect HTTP 503 status codes telling it to slow down). You can find it here; it's actually a bazaar repository that you can branch if you like:

bzr branch http://inkdroid.org/bzr/currents
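Respecting a 503 means reading the `Retry-After` header, which oai-pmh repositories use for flow control and which may carry either a number of seconds or an HTTP date. Here is a rough sketch of computing the wait. It is not the tool's actual code: the function name `retry_delay` and the 60-second fallback are assumptions chosen for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(status, headers, default=60, now=None):
    """Return seconds to wait before retrying, or None if no wait is needed.

    Retry-After may hold a number of seconds or an HTTP date.
    """
    if status != 503:
        return None
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("retry-after", "").strip()
    if value.isdigit():
        return int(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Missing or unparseable header: fall back to a fixed back-off.
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date already in the past means "retry now".
    return max(0, int((when - now).total_seconds()))
```

The poll loop would call `time.sleep()` with the returned value and reissue the same request, rather than advancing to the next one.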

If you try it out, or have any improvements or ideas, let me know; or better yet, bzr send me a patch. There's a short TODO section in the file's documentation that describes some things that could be done.

One Comment

  1. Thom wrote:

    Nice Ed, but all that white space in the Python code makes me feel woozy.

    –Th

    Friday, September 25, 2009 at 11:46 am
