oai-pmh and xmpp
As an experiment to learn more about xmpp, I created a little utility
that will poll an oai-pmh server and send new records as a chunk of xml
over xmpp. The idea wasn’t necessarily to see all the xml coming into my
jabber client (although you can do that). I wanted to enable downstream
applications to have records pushed to them, instead of them having to
constantly poll for updates. So you could write a client that archives
metadata (and potentially articles) as they are found, or a current
awareness tool that listens for articles matching a particular user’s
research profile, etc.
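To make the current awareness idea a bit more concrete, here is a minimal sketch of the matching step such a downstream client might run on each record it receives. It assumes the xml chunk is an oai-pmh record carrying Dublin Core (oai_dc) metadata; the function name `matches_profile`, the keyword list, and the sample record are all made up for illustration and are not part of the utility.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace used by oai_dc metadata.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def matches_profile(record_xml, keywords):
    """Return True if any keyword occurs in the record's dc:title,
    dc:subject or dc:description (case-insensitive substring match).

    Hypothetical helper: assumes record_xml is a single record
    serialized as a string with oai_dc metadata inside it.
    """
    root = ET.fromstring(record_xml)
    parts = []
    for tag in ("title", "subject", "description"):
        for element in root.iter(DC_NS + tag):
            if element.text:
                parts.append(element.text.lower())
    haystack = " ".join(parts)
    return any(keyword.lower() in haystack for keyword in keywords)


# Made-up sample record, only for trying the function out.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Push notifications for metadata harvesting</dc:title>
      <dc:subject>Digital libraries</dc:subject>
    </oai_dc:dc>
  </metadata>
</record>"""

if __name__ == "__main__":
    print(matches_profile(SAMPLE_RECORD, ["harvesting", "genomics"]))
```

A real client would call something like this from its xmpp message handler and decide what to do with the hits (mail them, store them, and so on).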
Here’s how you start it up:
oai2xmpp.py http://www.doaj.org/oai.article from@example.com to@example.org
which would poll the Directory of Open Access Journals for new articles every 10 minutes, and send them via xmpp to to@example.org. You can adjust the poll interval, and limit the harvest to records within a particular set, with the --pollinterval and --set options, e.g.:
oai2xmpp.py http://export.arxiv.org/oai2 currents@jabber.org ehs@jabber.org --set cs --pollinterval 86400
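For the curious, each poll boils down to an oai-pmh ListRecords request. The sketch below shows roughly how such a request url could be built from a base url, an optional set, and the date of the last poll. The `verb`, `metadataPrefix`, `from` and `set` parameters come from the oai-pmh protocol; the function name `list_records_url` and the example date are assumptions for illustration, not necessarily how oai2xmpp.py does it.

```python
from urllib.parse import urlencode


def list_records_url(base_url, from_date=None, set_spec=None,
                     metadata_prefix="oai_dc"):
    """Build an oai-pmh ListRecords url (hypothetical helper).

    from_date limits the harvest to records changed since that date,
    set_spec limits it to one set, e.g. "cs".
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # The date is an arbitrary example value.
    print(list_records_url("http://export.arxiv.org/oai2",
                           from_date="2008-01-01", set_spec="cs"))
```

Remembering the time of the previous poll and passing it as `from` is what keeps each request limited to new records.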
It’s a one-file python hack in the spirit of Thom Hickey’s 2PageOAI, with a few dependencies documented in the file (lxml, xmpppy, httplib2). I’ve run it for about a week against DOAJ and arxiv.org without incident (it does respect 503 HTTP status codes telling it to slow down). You can find it here.
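As a rough illustration of what respecting a 503 can look like, here is a small sketch that decides how long to back off before retrying. It assumes the response headers are available as a dict with lowercase keys (which is how httplib2 hands them back) and that the server sends Retry-After as a number of seconds. The function name `seconds_to_wait` and the 600 second fallback are my own choices for the example, not lifted from the script.

```python
def seconds_to_wait(status, headers, default=600):
    """Return how many seconds to pause before retrying a request.

    Hypothetical helper: 0 unless the server answered 503. On a 503
    the Retry-After header is used when it is a plain number of
    seconds; otherwise (missing, or an HTTP date) fall back to default.
    """
    if status != 503:
        return 0
    value = headers.get("retry-after", "")
    try:
        return max(0, int(value))
    except ValueError:
        return default


if __name__ == "__main__":
    print(seconds_to_wait(503, {"retry-after": "120"}))
```

The poller would then sleep for that many seconds and repeat the same request.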
If you try it out, or have any improvements or ideas, let me know.