polling and pushing with the Times Newswire API

I’ve been continuing to play around with Node.js some. Not because it’s the only game in town, but mainly because of the renaissance that’s going in the JavaScript community, which is kind of fun and slightly addictive. Ok, I guess that makes me a fan-boy, whatever…

So the latest in my experiments is nytimestream, which is a visualization (ok, it’s just a list) of New York Times headlines using the Times Newswire API. When I saw Derek Willis recently put some work into a Ruby library for the API I got to thinking what it might be like to use Node.js and Socket.IO to provide a push stream of updates. It didn’t take too long. I actually highly doubt anyone is going to use nytimestream much. So you might be wondering why I bothered to create it at all. I guess it was kind more of an academic exercise than anything to reinforce some things that Node.js has been teaching me.

Normally if you wanted a web page to dynamically update based on events elsewhere you’d have some code running in the browser routinely poll a webservice for updates. In this scenario our clients (c1, c2 and c3) poll the Times Newswire directly:

But what happens if lots of people start using your application? Yup, you get lots of requests going to the web service…which may not be a good thing, particularly if you are limited to a certain number of requests per day.

So a logical next step is to create a proxy for the webservice, which will reduce hits on the Times Newswire API.

But still, the client code needs to poll for updates. This can result in the proxy web service needing to field lots of requests as the number of clients increases. You can poll less, but that will diminish the real time nature of your app. If you are interested in having the real time updates in your app in the first place this probably won’t seem like a great solution.

So what if you could have the proxy web service push updates to the clients when it discovers an update?

This is basically what an event-driven webservice application allows you to do (labelled NodeJS in the diagram above). Node’s Socket.IO provides a really nice abstraction around streaming updates in the browser. If you view source on nytimestream you’ll see a bit of code like this:

var socket = io.connect();
socket.on('connect', function() {
  socket.on('story', function(story) {
    addStory(story);
    removeOld();
    fadeList();
  });
});

story is a JavaScript object that comes directly from the proxy webservice as a chunk of JSON. I’ve got the app running on Heroku, which currently recommends Socket.IO be configured to only do long polling (xhr-polling). Socket.IO actually supports a bunch of other transports suitable for streaming, including web sockets. xhr-polling basically means the browser keeps a connection open to the server until an update comes down, after which it quickly reconnects to wait for the next update. This is still preferable to constant polling, especially for the NYTimes API which often sees 15 minutes or more go by without an update. Keeping connections open like this can be expensive in more typical web stacks where each connection translates into a thread or process. But this is what Node’s non-blocking IO programming environment fixes up for you.

Just because I could, I added a little easter egg view in nytimestream, which allows you to see new stories come across the wire as JSON when nytimestream discovers them. It’s similar to Twitter’s stream API in that you can call it with curl. It’s different in that, well, there’s hardly the same amount of updates. Try it out with:

curl http://nytimestream.herokuapp.com/stream/

The occasional newlines are there to prevent the connection from timing out.