oai-pmh and xmpp

As an experiment to learn more about xmpp, I created a little utility that polls an oai-pmh server and sends new records as chunks of xml over xmpp. The idea wasn’t necessarily to see all the xml coming into my jabber client (although you can do that). I wanted to enable downstream applications to have records pushed to them, instead of having to poll constantly for updates. So you could write a client that archives metadata, and potentially articles, as they are found, or a current awareness tool that listens for articles matching a particular user’s research profile.

Here’s how you start it up:

oai2xmpp.py http://www.doaj.org/oai.article from@example.com to@example.org

which would poll the Directory of Open Access Journals for new articles every 10 minutes and send them via xmpp to to@example.org. You can adjust the poll interval, and limit the harvest to records within a particular set, with the --pollinterval and --set options, e.g.:

oai2xmpp.py http://export.arxiv.org/oai2 currents@jabber.org ehs@jabber.org --set cs --pollinterval 86400
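To give a feel for what each poll involves, here is a minimal sketch of the oai-pmh side. It is not the code from oai2xmpp.py: it uses only the standard library rather than lxml and httplib2, and the function names, the `oai_dc` metadata prefix, and the use of a `from` date to pick up only new records are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by every element in an OAI-PMH 2.0 response.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, from_date=None, set_spec=None,
                     metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: it replaces the others.
        params = [("verb", "ListRecords"),
                  ("resumptionToken", resumption_token)]
    else:
        params = [("verb", "ListRecords"),
                  ("metadataPrefix", metadata_prefix)]
        if from_date:
            params.append(("from", from_date))
        if set_spec:
            params.append(("set", set_spec))
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response.

    Each record is returned as a serialized chunk of xml, ready to be
    used as a message payload. The token is None when the list is complete.
    """
    root = ET.fromstring(xml_text)
    records = [ET.tostring(rec, encoding="unicode")
               for rec in root.iter(OAI_NS + "record")]
    token_el = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    has_token = token_el is not None and token_el.text
    token = token_el.text.strip() if has_token else None
    return records, token
```

A poller would fetch the first URL, hand each record to the xmpp side, and keep requesting with the returned token until it comes back as None, then sleep for the poll interval.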

It’s a one-file python hack in the spirit of Thom Hickey’s 2PageOAI, with a few dependencies documented in the file (lxml, xmpppy, httplib2). I’ve run it for about a week against DOAJ and arxiv.org without incident (it does respect 503 HTTP status codes telling it to slow down). You can find it here.
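Respecting a 503 mostly comes down to reading the Retry-After header before the next request. The sketch below shows one way that decision could look. The function name and the 600 second fallback are assumptions rather than what oai2xmpp.py actually does, and it only understands Retry-After given as a number of seconds (an HTTP date falls back to the default).

```python
def retry_delay(status, headers, default_delay=600):
    """Seconds to wait before retrying, or 0 if no backoff is needed.

    Hypothetical helper: `headers` is any dict-like mapping of response
    headers, and `default_delay` is an assumed fallback.
    """
    if status != 503:
        return 0
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("retry-after", "").strip()
    if value.isdigit():
        return int(value)
    # Missing or non-numeric Retry-After: fall back to the default.
    return default_delay
```

A polling loop could call this after every response and sleep for the returned number of seconds before issuing the same request again.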

If you try it out, or have any improvements or ideas, let me know.
