NoDB

Last week Liam Wyatt emailed me asking if I could add The National Museum of Australia to Linkypedia, which tracks external links from Wikipedia articles to specific websites. Specifically Liam was interested in seeing the list of articles that reference the National Museum, sorted by how much they are viewed at Wikipedia. This presented me with two problems:

  1. I turned Linkypedia off a few months ago, since the site hadn’t been seeing much traffic, and I have not yet figured out how to keep the site going on the paltry Linode VPS I’m using for other things like this blog.
  2. I hadn’t incorporated Wikipedia page view statistics into Linkypedia, because I didn’t know they were available, and even if I had I didn’t have Liam’s idea of using them in this way.


2011 Musics

2011 was the year of streaming music for me–specifically using Rdio. Being able to follow what friends and folks I admire are listening to, easily listen along, and then build my own online collection in the cloud was a revelation. Being able to easily do it from my desktop at home, or at work, or on my mobile device for $5/month was just astounding. The world ain’t perfect, but this is damn near close.

Anyhow, here’s some of my favorite music from 2011, in no particular order … a lot of which I probably wouldn’t have listened to if it wasn’t streamable on the Web. You might have to wait a few seconds while the YouTube clips load.


genealogy of a typo

I got a Kindle Touch today for Christmas–thanks Kesa! Admittedly I’m pretty late to this party. As I made ready to purchase my first ebook I hopped over to my GoodReads to-read list, to pick something out. I scanned the list quickly, and my eye came to rest on Stephen Ramsay’s recent book Reading Machines. But I got hung up on something irrelevant: the subtitle was Toward and Algorithmic Criticism instead of Toward an Algorithmic Criticism, the latter of which is clearly correct based on the cover image.

Having recently looked at API services for book data I got curious about how the title appeared on other popular web properties, such as Amazon:

GoogleBooks:

Barnes & Noble:

and LibraryThing

I wasn’t terribly surprised not to find it on OpenLibrary. But it does seem interesting that the exact same typo is present on all these book websites as well, while the title appears correct on the publisher’s website:

and at OCLC:

It’s hard to tell for sure, but my guess is that Amazon, Barnes & Noble, and GoogleBooks got the error from BowkerLink (the Books in Print data service), and that LibraryThing then picked up the data from Amazon, and similarly GoodReads picked up the data from GoogleBooks. LibraryThing can pull data from a variety of sources, including Amazon; and I’m not entirely sure where GoodReads gets their data from, but it seems likely that it comes from the GoogleBooks API given other tie-ins with Google.

If you know more about the lineage of data in these services I would be interested to hear it. Specifically if you have a subscription to BowkerLink it would be great if you could check the title. It would be nice to live in a world where these sorts of data provenance issues were easier to read.


polling and pushing with the Times Newswire API

I’ve been continuing to play around with Node.js some. Not because it’s the only game in town, but mainly because of the renaissance that’s going on in the JavaScript community, which is kind of fun and slightly addictive. Ok, I guess that makes me a fan-boy, whatever…

So the latest in my experiments is nytimestream, which is a visualization (ok, it’s just a list) of New York Times headlines using the Times Newswire API. When I saw Derek Willis recently put some work into a Ruby library for the API I got to thinking what it might be like to use Node.js and Socket.IO to provide a push stream of updates. It didn’t take too long. I actually highly doubt anyone is going to use nytimestream much. So you might be wondering why I bothered to create it at all. I guess it was more of an academic exercise than anything, to reinforce some things that Node.js has been teaching me.

Normally if you wanted a web page to dynamically update based on events elsewhere you’d have some code running in the browser routinely poll a webservice for updates. In this scenario our clients (c1, c2 and c3) poll the Times Newswire directly:

But what happens if lots of people start using your application? Yup, you get lots of requests going to the web service…which may not be a good thing, particularly if you are limited to a certain number of requests per day.

So a logical next step is to create a proxy for the webservice, which will reduce hits on the Times Newswire API.

But still, the client code needs to poll for updates. This can result in the proxy web service needing to field lots of requests as the number of clients increases. You can poll less, but that will diminish the real time nature of your app. If you are interested in having the real time updates in your app in the first place this probably won’t seem like a great solution.

So what if you could have the proxy web service push updates to the clients when it discovers an update?

This is basically what an event-driven webservice application allows you to do (labelled NodeJS in the diagram above). Node’s Socket.IO provides a really nice abstraction around streaming updates in the browser. If you view source on nytimestream you’ll see a bit of code like this:

var socket = io.connect();
socket.on('connect', function() {
  socket.on('story', function(story) {
    addStory(story);
    removeOld();
    fadeList();
  });
});

story is a JavaScript object that comes directly from the proxy webservice as a chunk of JSON. I’ve got the app running on Heroku, which currently recommends Socket.IO be configured to only do long polling (xhr-polling). Socket.IO actually supports a bunch of other transports suitable for streaming, including web sockets. xhr-polling basically means the browser keeps a connection open to the server until an update comes down, after which it quickly reconnects to wait for the next update. This is still preferable to constant polling, especially for the NYTimes API which often sees 15 minutes or more go by without an update. Keeping connections open like this can be expensive in more typical web stacks where each connection translates into a thread or process. But this is what Node’s non-blocking IO programming environment fixes up for you.
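
The server side of this push arrangement can be sketched in a few lines. This is only an illustration of the idea, not the actual nytimestream code: the StoryProxy class, its ingest method, and the url and title fields on a story are all assumptions made for the example, and Node's built-in EventEmitter stands in for the Socket.IO connections so that it runs without a network.

```javascript
const { EventEmitter } = require('events');

// Hypothetical proxy: it is fed the latest batch from the upstream API and
// emits a 'story' event only for items it has not seen before.
class StoryProxy extends EventEmitter {
  constructor() {
    super();
    this.seen = new Set(); // URLs already pushed to clients
  }

  // Returns the number of new stories pushed.
  ingest(stories) {
    let pushed = 0;
    for (const story of stories) {
      if (this.seen.has(story.url)) continue;
      this.seen.add(story.url);
      this.emit('story', story);
      pushed += 1;
    }
    return pushed;
  }
}

// Two stand-in clients; in a real app each listener would forward the
// story over an open connection instead of logging it.
const proxy = new StoryProxy();
proxy.on('story', (story) => console.log('c1 got', story.title));
proxy.on('story', (story) => console.log('c2 got', story.title));

// Simulate two polling cycles; the second repeats the first story.
proxy.ingest([{ url: 'http://example.com/a', title: 'A' }]);
proxy.ingest([
  { url: 'http://example.com/a', title: 'A' },
  { url: 'http://example.com/b', title: 'B' },
]);
```

The point of the design is that only the proxy talks to the upstream API, once per cycle, no matter how many clients are listening.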

Just because I could, I added a little easter egg view in nytimestream, which allows you to see new stories come across the wire as JSON when nytimestream discovers them. It’s similar to Twitter’s stream API in that you can call it with curl. It’s different in that, well, there are hardly as many updates. Try it out with:

curl http://nytimestream.herokuapp.com/stream/

The occasional newlines are there to prevent the connection from timing out.
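
If you wanted to consume a stream like this from code rather than curl, a client has to cope with those blank keep-alive lines and with JSON that arrives split across chunks. Here is a rough sketch, assuming the stream sends one JSON object per line (that framing is an assumption, as are the createLineParser name and the title field):

```javascript
// Buffer incoming text, hand each complete line to JSON.parse, and skip
// the blank keep-alive lines.
function createLineParser(onStory) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) {
      if (line.trim() === '') continue; // keep-alive newline
      onStory(JSON.parse(line));
    }
  };
}

// Simulated chunks: a story, a keep-alive newline, then a story that is
// split across two chunks.
const stories = [];
const feed = createLineParser((story) => stories.push(story));
feed('{"title": "First"}\n\n{"tit');
feed('le": "Second"}\n');
console.log(stories.map((s) => s.title)); // [ 'First', 'Second' ]
```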


fascinating, hypnotic, inspirational, appalling, irrelevant

Thanks to everyone that noticed the Wikistream coverage in the NextWeb article and elsewhere. If you happen to have tweeted about Wikistream in the last 2 days you should see your avatar to the left. Click on it to make it bigger. I’m in there somewhere too :-)

Before Sunday the site hadn’t seen more than 180 unique visitors per day, and on Monday it saw almost 30,000. The site is kind of different since it streams all the Wikipedia updates to the browser as JSON, where it is then displayed. I had some nail-biting moments as I watched Node frequently streaming up to 300 concurrent connections. It was a wild ride for my little Linode VPS with 512MB of RAM, where WordPress and a Django website were also running…but it seemed to weather the storm OK. Mostly I think I could have used more RAM during peak usage when Node and Redis were wanting enough memory to cause the system to swap. Thanks to Gabe and Chris for helping me get the cache headers set right in Express.

I thought briefly about upgrading to a larger Linode instance, putting the app on EC2, or maybe asking Wikimedia if they wanted to host it. But Wikistream is really more a piece of performance art than it is a useful website. I’m expecting people that have looked at Wikistream once will have seen how much Wikipedia is actively edited, and not feel compelled to look at it again. After a few days I expect the usage to plummet and it can go back to running comfortably on my little Linode VPS to serve as a live prop in presentations about Wikipedia, crowd-sourcing, Web culture, etc.

One of my favorite mentions of Wikistream came from Nat Torkington’s Four Short Links on O’Reilly Radar. Nat described Wikistream as

fascinating and hypnotic and inspirational and appalling and irrelevant all at once

I took this as high praise of course. I could only get the last two days out of Twitter’s search API, which misses the day that the NextWeb article appeared, followed by it getting picked up on Hacker News. But it was 226 tweets, and provided for a fun little data set to look at. I wrote a little script to look for URLs in the tweets, unshorten them and come up with a list of web pages that mentioned Wikistream in the past few days. One thing that was really interesting to me was the predominance of non-English websites. Here’s a list of some of them if you are interested.
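
The URL-hunting part of a script like that might look something like the sketch below. It is not the script I actually ran: extractUrls and its regular expression are made up for illustration, and the unshortening step (following each short link's redirects over HTTP) is left out so the example runs offline.

```javascript
// Collect the distinct URLs mentioned in a list of tweet texts.
function extractUrls(tweets) {
  const pattern = /https?:\/\/[^\s"'<>]+/g;
  const urls = new Set();
  for (const text of tweets) {
    for (const match of text.match(pattern) || []) {
      // Trim punctuation that commonly trails a URL in running text.
      urls.add(match.replace(/[.,;:!?)]+$/, ''));
    }
  }
  return [...urls];
}

console.log(extractUrls([
  'hypnotic: http://example.com/wikistream, wow',
  'RT http://example.com/wikistream and http://example.org/post',
]));
```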

OK, I’m finished with the narcissistic navel gazing for a bit. But seriously, thanks for the attention. I’ve never experienced anything like that before. Wow.


thanks Wikipedia

The fields of software development, the Web and libraries are changing so fast that there is no way to know everything I need to know to do my job well. Wikipedia continues to be an essential resource for learning about technologies, algorithms, people, and history related to my work. It’s hard to imagine what the world would be like without it. Thanks for another year of awesome Wikipedia! The check is in the mail; well OK it’s actually coming from PayPal … you know the drill.


Visualizing FRBR Worksets


The model behind the Functional Requirements for Bibliographic Records (FRBR) was published over 10 years ago, and has been simmering in library land ever since. Bit by bit, FRBR has been finding its way into library systems and software, sometimes in a slightly modified form. But it has been slow going because FRBR offers a more nuanced view of bibliographic data than what is available in our legacy MARC data. So the FRBR relationships we want largely have to be teased out of the data we have.


One of the primary things that FRBR offers is the notion of a Work that groups together Expressions and Manifestations. For example, William Gibson wrote a book Neuromancer, which has been translated into many languages, and is available from multiple publishers. Collectors are sometimes interested in specific editions of a book, say a first edition printing; but readers are often interested in any edition of a work, because they don’t particularly care what’s on the cover, or what pagination or typeface is used. FRBR provides a conceptual model for working with books in this way. For the software developer FRBR also holds out the promise of a normalized view of book data, where some things, such as the author and subject of the book can be expressed in one place (as attributes of the Work) rather than repeated for all the Expressions and Manifestations.
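
As a rough illustration of that normalized view, here is a sketch of a Work that holds the author and subjects once, with Expressions and Manifestations nested beneath it. The field names and the listEditions helper are hypothetical, and grouping Expressions by language is a simplification (each distinct translation would really be its own Expression). The ISBNs, publishers and years are ones that appear elsewhere in this post.

```javascript
// Hypothetical FRBR-style structure; field names are illustrative only.
const neuromancer = {
  title: 'Neuromancer',
  author: 'William Gibson', // stated once, on the Work
  subjects: ['Computer hackers -- Fiction'],
  expressions: [
    {
      language: 'eng',
      manifestations: [
        { isbn: '0441569595', publisher: 'Ace Books', year: 1995 },
        { isbn: '0441012035', publisher: 'Ace Books', year: 2004 },
      ],
    },
    {
      language: 'spa',
      manifestations: [
        { isbn: '8445072897', publisher: 'Minotauro', year: 1997 },
      ],
    },
  ],
};

// Flatten the hierarchy into one record per edition, copying the
// Work-level author and the Expression-level language down to each.
function listEditions(work) {
  return work.expressions.flatMap((expression) =>
    expression.manifestations.map((manifestation) => ({
      ...manifestation,
      language: expression.language,
      author: work.author,
    }))
  );
}

console.log(listEditions(neuromancer));
```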

If you are a bibliographic data aficionado, you are probably already familiar with FRBR-ization Web services like xISBN and ThingISBN that make it possible to determine other related editions, or the workset, for a given ISBN. So to look up the 1995 Ace Books printing of Neuromancer (0441569595) at xISBN you can GET a URL like http://xisbn.worldcat.org/webservices/xid/isbn/0441569595?method=getEditions&format=xml and get back some XML like:

<?xml version="1.0" encoding="UTF-8"?>
<rsp xmlns="http://worldcat.org/xid/isbn/" stat="ok">
  <isbn>0441569595</isbn>
  <isbn>0441569579</isbn>
  <isbn>0441012035</isbn>
  <isbn>0006480411</isbn>
  <isbn>1570420599</isbn>
  <isbn>0007119585</isbn>
  <isbn>0736638369</isbn>
  <isbn>0441569587</isbn>
  <isbn>1570421560</isbn>
  <isbn>9029042478</isbn>
  <isbn>229000619X</isbn>
  <isbn>415010672X</isbn>
  <isbn>0307969940</isbn>
  <isbn>0441569560</isbn>
  <isbn>569700124X</isbn>
  <isbn>5792101205</isbn>
  <isbn>2707115622</isbn>
  <isbn>7542818732</isbn>
  <isbn>229030820X</isbn>
  <isbn>2744139157</isbn>
  <isbn>0932096417</isbn>
  <isbn>3453313895</isbn>
  <isbn>1616577843</isbn>
  <isbn>9607002504</isbn>
  <isbn>8445072897</isbn>
  <isbn>0002252325</isbn>
  <isbn>8842907464</isbn>
  <isbn>9029049367</isbn>
  <isbn>8445075950</isbn>
  <isbn>9029050748</isbn>
  <isbn>8071930482</isbn>
  <isbn>0586066454</isbn>
  <isbn>7542824139</isbn>
  <isbn>9119027818</isbn>
  <isbn>8085601273</isbn>
  <isbn>0441000681</isbn>
  <isbn>8445070843</isbn>
  <isbn>8385784012</isbn>
  <isbn>8982738851</isbn>
  <isbn>3893111387</isbn>
  <isbn>807193318X</isbn>
  <isbn>5170198892</isbn>
  <isbn>8371500432</isbn>
  <isbn>8467426373</isbn>
  <isbn>0441007465</isbn>
  <isbn>057503470X</isbn>
  <isbn>8585887907</isbn>
  <isbn>3893111379</isbn>
  <isbn>911300347X</isbn>
  <isbn>8422672596</isbn>
  <isbn>9118721826</isbn>
  <isbn>3453056655</isbn>
  <isbn>3807703098</isbn>
  <isbn>8390021439</isbn>
  <isbn>8203203329</isbn>
  <isbn>8789586735</isbn>
  <isbn>8485752414</isbn>
  <isbn>9612310203</isbn>
  <isbn>8445074059</isbn>
  <isbn>8445076620</isbn>
  <isbn>8974271419</isbn>
  <isbn>3453403851</isbn>
  <isbn>9510172049</isbn>
  <isbn>8758804110</isbn>
  <isbn>9510193062</isbn>
  <isbn>2277223255</isbn>
  <isbn>9637632050</isbn>
  <isbn>9755760326</isbn>
  <isbn>3898132595</isbn>
  <isbn>8790136292</isbn>
  <isbn>8804516445</isbn>
  <isbn>8842910686</isbn>
</rsp>
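
Pulling the ISBNs out of a response like this takes very little code. The sketch below uses a regular expression only because Node has no XML parser in its standard library; a real XML parser would be the safer choice. The parseIsbns name is an assumption for the example, and the sample is a shortened copy of the response above.

```javascript
// Extract the text of every <isbn> element from an XML string.
function parseIsbns(xml) {
  const pattern = /<isbn(?:\s[^>]*)?>([^<]+)<\/isbn>/g;
  return Array.from(xml.matchAll(pattern), (match) => match[1].trim());
}

const sample = `<?xml version="1.0" encoding="UTF-8"?>
<rsp xmlns="http://worldcat.org/xid/isbn/" stat="ok">
  <isbn>0441569595</isbn>
  <isbn>0441569579</isbn>
  <isbn>229000619X</isbn>
</rsp>`;

console.log(parseIsbns(sample)); // [ '0441569595', '0441569579', '229000619X' ]
```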

LibraryThing has a similar API call which allows you to splice the ISBN into a URL like so http://www.librarything.com/api/thingISBN/0441569595, and get:

<?xml version="1.0" encoding="utf-8"?>
<idlist>
  <isbn>0441569595</isbn>
  <isbn>0441012035</isbn>
  <isbn>0006480411</isbn>
  <isbn>0586066454</isbn>
  <isbn>0441007465</isbn>
  <isbn>0441000681</isbn>
  <isbn>8585887907</isbn>
  <isbn>0002252325</isbn>
  <isbn>0441569560</isbn>
  <isbn>3453056655</isbn>
  <isbn>0441569579</isbn>
  <isbn>0932096417</isbn>
  <isbn>0441569587</isbn>
  <isbn>057503470X</isbn>
  <isbn>229030820X</isbn>
  <isbn>8445070843</isbn>
  <isbn>2277223255</isbn>
  <isbn>3453313895</isbn>
  <isbn>8804516445</isbn>
  <isbn>9510193062</isbn>
  <isbn>0007119585</isbn>
  <isbn>8445075950</isbn>
  <isbn>9119027818</isbn>
  <isbn>9510172049</isbn>
  <isbn>8842907464</isbn>
  <isbn>1570420599</isbn>
  <isbn>9637632050</isbn>
  <isbn>9029042478</isbn>
  <isbn>415010672X</isbn>
  <isbn>9634970982</isbn>
  <isbn>8085601273</isbn>
  <isbn>0613922514</isbn>
  <isbn>2707115622</isbn>
  <isbn>8445074059</isbn>
  <isbn>8842913529</isbn>
  <isbn>1569564116</isbn>
  <isbn>9118721826</isbn>
  <isbn>8842910686</isbn>
  <isbn>3898132595</isbn>
  <isbn>1570421560</isbn>
  <isbn>229000619X</isbn>
  <isbn>3893111387</isbn>
  <isbn>8071930482</isbn>
  <isbn>2744139157</isbn>
  <isbn>8445072897</isbn>
  <isbn>8371500432</isbn>
  <isbn>8576570491</isbn>
  <isbn>8789586735</isbn>
  <isbn>9639238023</isbn>
  <isbn>3453074203</isbn>
  <isbn>3893111379</isbn>
  <isbn>0307969940</isbn>
  <isbn>8203203329</isbn>
  <isbn>8842906808</isbn>
  <isbn>9752103677</isbn>
  <isbn>0736638369</isbn>
  <isbn>8324577750</isbn>
  <isbn>8790136292</isbn>
  <isbn>8778803438</isbn>
  <isbn>807193318X</isbn>
</idlist>

I don’t actually know the mechanics of ThingISBN and xISBN in detail, but it’s my understanding that xISBN uses an algorithm to unify works, whereas LibraryThing relies on people to connect things up.
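
One simple way to see how the two approaches differ is to compare the worksets they return for the same ISBN. Here is a minimal sketch: compareWorksets is a made-up helper, and the arrays hold just a handful of ISBNs copied from the two responses above rather than the full lists.

```javascript
// Split two lists of ISBNs into those shared and those unique to each.
function compareWorksets(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  return {
    both: a.filter((isbn) => setB.has(isbn)),
    onlyA: a.filter((isbn) => !setB.has(isbn)),
    onlyB: b.filter((isbn) => !setA.has(isbn)),
  };
}

// A few ISBNs taken from the xISBN and ThingISBN responses above.
const xisbn = ['0441569595', '0441012035', '569700124X'];
const thingisbn = ['0441569595', '0441012035', '0613922514'];

console.log(compareWorksets(xisbn, thingisbn));
```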

A newer player in this space is the OpenLibrary API. Instead of providing an ISBN -> ISBNs function, OpenLibrary makes the editions for a given Work available using a URL like http://openlibrary.org/works/OL27258W/editions.json?limit=50&offset=0. This requires you to know the OpenLibrary Work identifier (e.g. OL27258W). Fortunately you can look up their Work identifier with another REST call using the ISBN: http://openlibrary.org/api/books?bibkeys=ISBN:0441569595&jscmd=details&format=json. The OpenLibrary response includes a lot more information than the LibraryThing or xISBN results, which is why you are required to page through the results with the API, rather than getting all the results back at once:

{
  "size": 19, 
  "links": {
    "self": "/works/OL27258W/editions.json?limit=50&offset=0", 
    "work": "/works/OL27258W"
  }, 
  "entries": [
    {
      "number_of_pages": 322, 
      "subtitle": "roman", 
      "series": [
        "Cyberspace trilogien", 
        "Gibsons Cyberspace trilogi -- 1"
      ], 
      "latest_revision": 3, 
      "edition_name": "2. udg./1. opl.", 
      "source_records": [
        "marc:marc_university_of_toronto/uoft.marc:1618994437:773"
      ], 
      "title": "Neuromantiker", 
      "work_titles": [
        "Neuromancer."
      ], 
      "languages": [
        {
          "key": "/languages/dan"
        }
      ], 
      "publish_country": "dk ", 
      "by_statement": "William Gibson ; p\u00e5 dansk ved Arne Herl\u00f8v Petersen.", 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 3, 
      "publishers": [
        "Per Kof"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-08-18T09:06:00.229423"
      }, 
      "key": "/books/OL17987798M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "Copenhagen"
      ], 
      "pagination": "322 p.", 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-10-08T22:54:50.763681"
      }, 
      "notes": {
        "type": "/type/text", 
        "value": "Translation of: Neuromancer."
      }, 
      "identifiers": {
        "librarything": [
          "609"
        ]
      }, 
      "isbn_10": [
        "8790136292"
      ], 
      "publish_date": "1995", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "identifiers": {
        "librarything": [
          "609"
        ], 
        "goodreads": [
          "14771"
        ]
      }, 
      "subject_place": [
        "Japan"
      ], 
      "lc_classifications": [
        "PR9199.3.G514 N4x 1986"
      ], 
      "latest_revision": 4, 
      "edition_name": "1st Phantasia Press ed.", 
      "genres": [
        "Fiction."
      ], 
      "source_records": [
        "marc:marc_records_scriblio_net/part20.dat:107059645:825"
      ], 
      "title": "Neuromancer", 
      "languages": [
        {
          "key": "/languages/eng"
        }
      ], 
      "subjects": [
        "Computer hackers -- Fiction", 
        "Business intelligence -- Fiction", 
        "Information superhighway -- Fiction", 
        "Nervous system -- Wounds and injuries -- Fiction", 
        "Conspiracies -- Fiction", 
        "Japan -- Fiction"
      ], 
      "publish_country": "miu", 
      "by_statement": "William Gibson.", 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 4, 
      "publishers": [
        "Phantasia Press"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-07-31T08:19:43.878905"
      }, 
      "key": "/books/OL2154100M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "West Bloomfield, Mich"
      ], 
      "pagination": "vi, 231 p. ;", 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-04-01T03:28:50.625462"
      }, 
      "lccn": [
        "88672297"
      ], 
      "number_of_pages": 231, 
      "isbn_10": [
        "0932096417"
      ], 
      "publish_date": "1986", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "identifiers": {
        "librarything": [
          "609"
        ], 
        "goodreads": [
          "826097"
        ]
      }, 
      "latest_revision": 4, 
      "source_records": [
        "marc:talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:1128979384:559"
      ], 
      "title": "Neuromancer", 
      "languages": [
        {
          "key": "/languages/eng"
        }
      ], 
      "publish_country": "xxk", 
      "by_statement": "William Gibson.", 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 4, 
      "publishers": [
        "Voyager"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-08-19T09:28:46.010665"
      }, 
      "key": "/books/OL22822383M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "London"
      ], 
      "pagination": "317p. ;", 
      "created": {
        "type": "/type/datetime", 
        "value": "2009-01-04T10:04:32.718474"
      }, 
      "dewey_decimal_class": [
        "813.54"
      ], 
      "notes": {
        "type": "/type/text", 
        "value": "Originally published: [London]: Gollancz; 1984."
      }, 
      "number_of_pages": 317, 
      "isbn_10": [
        "0006480411"
      ], 
      "publish_date": "1995", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "number_of_pages": 316, 
      "latest_revision": 4, 
      "edition_name": "1a ed. en bolsillo.", 
      "source_records": [
        "marc:SanFranPL10/SanFranPL10.out:61656066:1111"
      ], 
      "title": "Neuromante", 
      "work_titles": [
        "Neuromancer."
      ], 
      "languages": [
        {
          "key": "/languages/spa"
        }
      ], 
      "subjects": [
        "Ciencia-ficci\u00f3n"
      ], 
      "publish_country": "sp ", 
      "by_statement": "William Gibson ; [traducci\u00f3n de Jos\u00e9 Arconada Rodr\u00edguez y Javier Ferreira Ramos].", 
      "oclc_numbers": [
        "50083763"
      ], 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 4, 
      "publishers": [
        "Minotauro"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-08-19T10:44:18.483562"
      }, 
      "key": "/books/OL23054075M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "Barcelona"
      ], 
      "pagination": "316 p. ;", 
      "created": {
        "type": "/type/datetime", 
        "value": "2009-02-18T07:02:41.481991"
      }, 
      "notes": {
        "type": "/type/text", 
        "value": "Translation of: Neuromancer.\n\nPremio Hugo.\n\nPremio Nebula.\n\nPremio Philip K. Dick."
      }, 
      "identifiers": {
        "librarything": [
          "609"
        ]
      }, 
      "isbn_10": [
        "8445072897"
      ], 
      "publish_date": "1997", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "identifiers": {
        "librarything": [
          "609"
        ], 
        "goodreads": [
          "1163291"
        ]
      }, 
      "latest_revision": 4, 
      "source_records": [
        "marc:talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:1449506617:614"
      ], 
      "title": "Neuromancer", 
      "languages": [
        {
          "key": "/languages/eng"
        }
      ], 
      "publish_country": "enk", 
      "by_statement": "William Gibson.", 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 4, 
      "publishers": [
        "HarperCollins"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-08-19T09:38:30.187012"
      }, 
      "key": "/books/OL22849249M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "London"
      ], 
      "pagination": "277p.", 
      "created": {
        "type": "/type/datetime", 
        "value": "2009-01-07T20:05:13.391858"
      }, 
      "dewey_decimal_class": [
        "813.54"
      ], 
      "notes": {
        "type": "/type/text", 
        "value": "Originally published in Great Britain by Gollancz, 1984."
      }, 
      "number_of_pages": 277, 
      "isbn_10": [
        "0002252325"
      ], 
      "publish_date": "1994", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "publishers": [
        "Harper Collins"
      ], 
      "pagination": "317p. ;", 
      "source_records": [
        "marc:talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2979159053:556"
      ], 
      "title": "Neuromancer", 
      "dewey_decimal_class": [
        "813/.54"
      ], 
      "notes": {
        "type": "/type/text", 
        "value": "Originally published, London , Gollancz, 1984."
      }, 
      "number_of_pages": 317, 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-10-25T02:27:53.587823"
      }, 
      "languages": [
        {
          "key": "/languages/eng"
        }
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-10-15T15:26:45.512262"
      }, 
      "latest_revision": 3, 
      "publish_country": "xxk", 
      "key": "/books/OL19969875M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_date": "1993", 
      "publish_places": [
        "London"
      ], 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ], 
      "type": {
        "key": "/type/edition"
      }, 
      "by_statement": "William Gibson.", 
      "revision": 3
    }, 
    {
      "identifiers": {
        "librarything": [
          "609"
        ], 
        "goodreads": [
          "2292560"
        ]
      }, 
      "subtitle": "Science Fiction Roman", 
      "series": [
        "Heyne science fiction & fantasy -- Bd. 06/4400"
      ], 
      "latest_revision": 4, 
      "edition_name": "3. Aufl.", 
      "source_records": [
        "marc:marc_university_of_toronto/uoft.marc:716896905:827"
      ], 
      "title": "Neuromancer", 
      "work_titles": [
        "Neuromancer."
      ], 
      "languages": [
        {
          "key": "/languages/ger"
        }
      ], 
      "publish_country": "gw ", 
      "by_statement": "William Gibson ; Deutsche \u00dcbersetzund von Reinhard Heinz.", 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 4, 
      "publishers": [
        "W. Heyne"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-08-18T03:53:57.235299"
      }, 
      "key": "/books/OL16064340M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "M\u00fcnchen"
      ], 
      "pagination": "363 p. :", 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-09-22T02:36:53.194997"
      }, 
      "notes": {
        "type": "/type/text", 
        "value": "\"Deutsche Erstver\u00f6ffentlichung.\"\n\nTranslation of: Neuromancer."
      }, 
      "number_of_pages": 363, 
      "isbn_10": [
        "3453313895"
      ], 
      "publish_date": "1989", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "number_of_pages": 371, 
      "subject_place": [
        "Japan"
      ], 
      "covers": [
        284192
      ], 
      "lc_classifications": [
        "PS3557.I2264 N48 2004"
      ], 
      "latest_revision": 6, 
      "edition_name": "20th anniversary ed.", 
      "genres": [
        "Fiction."
      ], 
      "source_records": [
        "marc:marc_records_scriblio_net/part15.dat:26112823:924", 
        "marc:marc_loc_updates/v35.i20.records.utf8:16403653:1145"
      ], 
      "title": "Neuromancer", 
      "languages": [
        {
          "key": "/languages/eng"
        }
      ], 
      "subjects": [
        "Computer hackers -- Fiction", 
        "Business intelligence -- Fiction", 
        "Information superhighway -- Fiction", 
        "Nervous system -- Wounds and injuries -- Fiction", 
        "Conspiracies -- Fiction", 
        "Japan -- Fiction"
      ], 
      "publish_country": "nyu", 
      "by_statement": "William Gibson ; with a new introduction by the author ; with an afterword by Jack Womack.", 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 6, 
      "publishers": [
        "Ace Books"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-07-31T14:51:42.931650"
      }, 
      "key": "/books/OL3305354M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "New York"
      ], 
      "pagination": "xi, 371 p. ;", 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-04-01T03:28:50.625462"
      }, 
      "dewey_decimal_class": [
        "813/.54"
      ], 
      "identifiers": {
        "goodreads": [
          "14770"
        ], 
        "librarything": [
          "609"
        ]
      }, 
      "lccn": [
        "2004048718"
      ], 
      "isbn_10": [
        "0441012035"
      ], 
      "publish_date": "2004", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "identifiers": {
        "librarything": [
          "609"
        ], 
        "goodreads": [
          "888628"
        ]
      }, 
      "subject_place": [
        "Japan"
      ], 
      "covers": [
        283860
      ], 
      "lc_classifications": [
        "PS3557.I2264 N48 2000"
      ], 
      "latest_revision": 5, 
      "edition_name": "Ace trade ed.", 
      "genres": [
        "Fiction."
      ], 
      "source_records": [
        "marc:marc_records_scriblio_net/part13.dat:153635745:885"
      ], 
      "title": "Neuromancer", 
      "languages": [
        {
          "key": "/languages/eng"
        }
      ], 
      "subjects": [
        "Computer hackers -- Fiction", 
        "Business intelligence -- Fiction", 
        "Information superhighway -- Fiction", 
        "Nervous system -- Wounds and injuries -- Fiction", 
        "Conspiracies -- Fiction", 
        "Japan -- Fiction"
      ], 
      "publish_country": "nyu", 
      "series": [
        "Ace science fiction"
      ], 
      "by_statement": "William Gibson ; with an afterword by Jack Womack.", 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 5, 
      "publishers": [
        "Ace Books"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-08-03T20:25:35.114363"
      }, 
      "key": "/books/OL3963678M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "New York"
      ], 
      "pagination": "276 p. ;", 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-04-01T03:28:50.625462"
      }, 
      "dewey_decimal_class": [
        "813/.54"
      ], 
      "number_of_pages": 276, 
      "lccn": [
        "2001268016"
      ], 
      "isbn_10": [
        "0441007465"
      ], 
      "publish_date": "2000", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "identifiers": {
        "librarything": [
          "609"
        ], 
        "goodreads": [
          "122395"
        ]
      }, 
      "latest_revision": 4, 
      "source_records": [
        "marc:marc_university_of_toronto/uoft.marc:219836701:673"
      ], 
      "title": "Neuromancien", 
      "work_titles": [
        "Neuromancer."
      ], 
      "languages": [
        {
          "key": "/languages/fre"
        }
      ], 
      "publish_country": "fr ", 
      "by_statement": "William Gibson ; traduit de l'am\u00e9ricain par Jean Bonnefoy.", 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 4, 
      "publishers": [
        "\u00c9ditions J'ai lu"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-08-18T21:33:39.583788"
      }, 
      "key": "/books/OL21395048M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "Paris"
      ], 
      "pagination": "318 p.", 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-11-02T11:15:35.318748"
      }, 
      "notes": {
        "type": "/type/text", 
        "value": "Translation of: Neuromancer."
      }, 
      "number_of_pages": 318, 
      "isbn_10": [
        "2277223255"
      ], 
      "publish_date": "1988", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "subtitle": "en sp\u00e6ndingsroman", 
      "latest_revision": 3, 
      "contributions": [
        "Mortensen, Hans Palle"
      ], 
      "source_records": [
        "marc:marc_university_of_toronto/uoft.marc:848159064:705"
      ], 
      "title": "Neuromantiker", 
      "work_titles": [
        "Neuromancer."
      ], 
      "languages": [
        {
          "key": "/languages/dan"
        }
      ], 
      "publish_country": "de ", 
      "by_statement": "William Gibson ; p\u00e5 dansk ved Hans Palle Mortensen.", 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 3, 
      "publishers": [
        "Vega"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-10-15T15:26:45.512262"
      }, 
      "key": "/books/OL16541408M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "[K\u00f8obenhavn]"
      ], 
      "pagination": "329 p.", 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-09-24T15:45:30.569311"
      }, 
      "notes": {
        "type": "/type/text", 
        "value": "Translation of: Neuromancer."
      }, 
      "number_of_pages": 329, 
      "isbn_10": [
        "8758804110"
      ], 
      "publish_date": "1989", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "identifiers": {
        "librarything": [
          "609"
        ], 
        "goodreads": [
          "1163292"
        ]
      }, 
      "lc_classifications": [
        "PS3513.I2824"
      ], 
      "latest_revision": 4, 
      "source_records": [
        "marc:marc_university_of_toronto/uoft.marc:2715222992:644"
      ], 
      "title": "Neuromancer", 
      "languages": [
        {
          "key": "/languages/eng"
        }
      ], 
      "publish_country": "enk", 
      "by_statement": "by William Gibson.", 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 4, 
      "publishers": [
        "Gollancz"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-08-18T12:15:40.027146"
      }, 
      "key": "/books/OL19160947M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "London"
      ], 
      "pagination": "251 p. ;", 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-10-21T06:38:01.937259"
      }, 
      "dewey_decimal_class": [
        "823/.914"
      ], 
      "number_of_pages": 251, 
      "isbn_10": [
        "057503470X"
      ], 
      "publish_date": "1984", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "number_of_pages": 273, 
      "latest_revision": 6, 
      "contributions": [
        "Cuijpers, Peter"
      ], 
      "edition_name": "1. druk.", 
      "source_records": [
        "marc:marc_university_of_toronto/uoft.marc:848175223:692"
      ], 
      "title": "Zenumagi\u00ebr", 
      "work_titles": [
        "Neuromancer."
      ], 
      "languages": [
        {
          "key": "/languages/dut"
        }
      ], 
      "publish_country": "ne ", 
      "by_statement": "William Gibson ; vertaling Peter Cuijpers.", 
      "oclc_numbers": [
        "64599048"
      ], 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 6, 
      "publishers": [
        "Meulenhoff"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2011-04-28T07:26:35.438655"
      }, 
      "key": "/books/OL16541422M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "Amsterdam"
      ], 
      "pagination": "273 p.", 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-09-24T15:45:41.892954"
      }, 
      "notes": {
        "type": "/type/text", 
        "value": "Translation of: Neuromancer."
      }, 
      "identifiers": {
        "librarything": [
          "609"
        ]
      }, 
      "isbn_10": [
        "9029042478"
      ], 
      "publish_date": "1989", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "number_of_pages": 295, 
      "latest_revision": 6, 
      "contributions": [
        "Eggen, Torgrim, 1958-"
      ], 
      "source_records": [
        "marc:marc_university_of_toronto/uoft.marc:4064625723:790"
      ], 
      "title": "Nevromantiker", 
      "work_titles": [
        "Neuromancer."
      ], 
      "languages": [
        {
          "key": "/languages/nor"
        }
      ], 
      "publish_country": "no ", 
      "by_statement": "William Gibson ; oversatt av og med etterord av Torgrim Eggen.", 
      "oclc_numbers": [
        "224937105"
      ], 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 6, 
      "publishers": [
        "Aschehoug"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2011-04-25T21:45:39.581918"
      }, 
      "key": "/books/OL19726291M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "Oslo"
      ], 
      "pagination": "295 p.", 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-10-23T17:52:44.936450"
      }, 
      "notes": {
        "type": "/type/text", 
        "value": "Translation of: Neuromancer."
      }, 
      "identifiers": {
        "librarything": [
          "609"
        ]
      }, 
      "isbn_10": [
        "8203203329"
      ], 
      "publish_date": "1999", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "publishers": [
        "Editrice Nord"
      ], 
      "pagination": "iii, 260 p.", 
      "source_records": [
        "marc:marc_university_of_toronto/uoft.marc:1419376120:645"
      ], 
      "title": "Neuromante", 
      "work_titles": [
        "Neuromancer."
      ], 
      "series": [
        "Cosmo -- 80"
      ], 
      "notes": {
        "type": "/type/text", 
        "value": "Translation of: Neuromancer."
      }, 
      "number_of_pages": 260, 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-09-28T17:38:21.398006"
      }, 
      "languages": [
        {
          "key": "/languages/ita"
        }
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-10-15T15:26:45.512262"
      }, 
      "latest_revision": 3, 
      "publish_country": "it ", 
      "key": "/books/OL17407456M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_date": "1986", 
      "publish_places": [
        "Milano"
      ], 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ], 
      "type": {
        "key": "/type/edition"
      }, 
      "by_statement": "William Gibson.", 
      "revision": 3
    }, 
    {
      "number_of_pages": 271, 
      "subject_place": [
        "Japan"
      ], 
      "covers": [
        284574
      ], 
      "lc_classifications": [
        "PS3557.I2264 N48 1984"
      ], 
      "latest_revision": 11, 
      "ocaid": "neuromancer00gibs", 
      "genres": [
        "Fiction."
      ], 
      "source_records": [
        "marc:marc_records_scriblio_net/part22.dat:84028207:784", 
        "marc:CollingswoodLibraryMarcDump10-27-2008/Collingswood.out:7879172:1418", 
        "marc:marc_cca/b10621386.out:20298617:552", 
        "ia:neuromancer00gibs"
      ], 
      "title": "Neuromancer", 
      "languages": [
        {
          "key": "/languages/eng"
        }
      ], 
      "subjects": [
        "Computer hackers -- Fiction", 
        "Business intelligence -- Fiction", 
        "Information superhighway -- Fiction", 
        "Nervous system -- Wounds and injuries -- Fiction", 
        "Conspiracies -- Fiction", 
        "Japan -- Fiction"
      ], 
      "publish_country": "nyu", 
      "by_statement": "William Gibson.", 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 11, 
      "publishers": [
        "Ace Books"
      ], 
      "ia_box_id": [
        "IA111402"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2011-08-12T04:31:24.064755"
      }, 
      "key": "/books/OL1627167M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "New York"
      ], 
      "pagination": "271 p. ;", 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-04-01T03:28:50.625462"
      }, 
      "dewey_decimal_class": [
        "813/.54"
      ], 
      "identifiers": {
        "librarything": [
          "609"
        ], 
        "goodreads": [
          "22328"
        ]
      }, 
      "lccn": [
        "91174394"
      ], 
      "isbn_10": [
        "0441569595"
      ], 
      "publish_date": "1984", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "identifiers": {
        "librarything": [
          "609"
        ], 
        "goodreads": [
          "313982"
        ]
      }, 
      "subject_place": [
        "Japan"
      ], 
      "covers": [
        283491
      ], 
      "lc_classifications": [
        "PS3557.I2264 N48 1994"
      ], 
      "latest_revision": 5, 
      "edition_name": "1st Ace hardcover ed.", 
      "genres": [
        "Fiction."
      ], 
      "source_records": [
        "marc:marc_records_scriblio_net/part24.dat:178109658:845"
      ], 
      "title": "Neuromancer", 
      "languages": [
        {
          "key": "/languages/eng"
        }
      ], 
      "subjects": [
        "Computer hackers -- Fiction", 
        "Business intelligence -- Fiction", 
        "Information superhighway -- Fiction", 
        "Nervous system -- Wounds and injuries -- Fiction", 
        "Conspiracies -- Fiction", 
        "Japan -- Fiction"
      ], 
      "publish_country": "nyu", 
      "by_statement": "William Gibson.", 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 5, 
      "publishers": [
        "Ace Books"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-07-31T01:58:09.386680"
      }, 
      "key": "/books/OL1234381M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "New York"
      ], 
      "pagination": "278 p. ;", 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-04-01T03:28:50.625462"
      }, 
      "dewey_decimal_class": [
        "813/.54"
      ], 
      "number_of_pages": 278, 
      "lccn": [
        "94237181"
      ], 
      "isbn_10": [
        "0441000681"
      ], 
      "publish_date": "1994", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "identifiers": {
        "librarything": [
          "609"
        ], 
        "goodreads": [
          "122395"
        ]
      }, 
      "latest_revision": 4, 
      "source_records": [
        "marc:talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:718603618:565"
      ], 
      "title": "Neuromancien", 
      "work_titles": [
        "Neuromancer."
      ], 
      "languages": [
        {
          "key": "/languages/fre"
        }
      ], 
      "publish_country": "fr ", 
      "by_statement": "traduit de l'ame\u0301ricain par Jean Bonnefoy.", 
      "type": {
        "key": "/type/edition"
      }, 
      "revision": 4, 
      "publishers": [
        "J'ai Lu"
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-08-18T23:31:09.118145"
      }, 
      "key": "/books/OL21795410M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ], 
      "publish_places": [
        "[Paris]"
      ], 
      "pagination": "319p.", 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-11-04T00:30:01.234536"
      }, 
      "notes": {
        "type": "/type/text", 
        "value": "Translation of: Neuromancer."
      }, 
      "number_of_pages": 319, 
      "isbn_10": [
        "2277223255"
      ], 
      "publish_date": "1985", 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ]
    }, 
    {
      "publishers": [
        "Voyager"
      ], 
      "pagination": "317 p.", 
      "identifiers": {
        "librarything": [
          "609"
        ], 
        "goodreads": [
          "953070"
        ]
      }, 
      "revision": 4, 
      "source_records": [
        "marc:marc_university_of_toronto/uoft.marc:3986224271:616"
      ], 
      "title": "Neuromancer", 
      "isbn_10": [
        "0586066454"
      ], 
      "number_of_pages": 317, 
      "created": {
        "type": "/type/datetime", 
        "value": "2008-10-30T08:07:12.492696"
      }, 
      "languages": [
        {
          "key": "/languages/eng"
        }
      ], 
      "last_modified": {
        "type": "/type/datetime", 
        "value": "2010-08-18T17:29:49.077199"
      }, 
      "latest_revision": 4, 
      "edition_name": "Pbk. ed.", 
      "key": "/books/OL20872554M", 
      "authors": [
        {
          "key": "/authors/OL26283A"
        }
      ],
      "publish_date": "2000", 
      "publish_places": [
        "London"
      ], 
      "works": [
        {
          "key": "/works/OL27258W"
        }
      ], 
      "type": {
        "key": "/type/edition"
      }, 
      "by_statement": "William Gibson.", 
      "publish_country": "enk"
    }
  ]
}

Because of some work I’ve been doing helping out at Gluejar I became curious about the coverage of these three FRBR workset APIs. What sort of overlap is there between them? I wrote a little script worksvenn.py that takes one or more ISBNs as input, looks them up in the OpenLibrary, LibraryThing and OCLC APIs, and then outputs the resulting data with a Venn diagram using the Google Chart API.

It’s interesting to see that each service has unique results. You can see these when you run worksvenn.py on the command line:

Workset Results:

oclc: 7542818732,2707115622,7542824139,9029042478,8085601273,0441012035,
0441569579,229000619X,3893111387,2744139157,9607002504,8071930482,
9637632050,8585887907,8485752414,8758804110,8445076620,9118721826,
8203203329,0441569587,8804516445,8422672596,8789586735,0932096417,
3893111379,1570420599,8445072897,5792101205,9755760326,569700124X,
9510172049,0441007465,0736638369,9510193062,8390021439,911300347X,
8445075950,0002252325,0441569595,0441000681,5170198892,3807703098,
0007119585,415010672X,807193318X,3453056655,8974271419,8842910686,
9029050748,3898132595,3453313895,057503470X,1616577843,0307969940,
8385784012,2277223255,0006480411,9029049367,0586066454,1570421560,
8371500432,229030820X,8842907464,0441569560,9119027818,8445070843,
8467426373,9612310203,8790136292,8982738851,3453403851,8445074059

librarything: 0441569595,2707115622,0441000681,9634970982,9118721826,
9029042478,8085601273,3453056655,0006480411,8842906808,0441569579,
229000619X,415010672X,3893111387,0441012035,9639238023,3453074203,
9510193062,9637632050,8585887907,8842910686,0441007465,3898132595,
8203203329,1569564116,8371500432,3453313895,0736638369,057503470X,
8789586735,0932096417,9752103677,8445075950,8778803438,2277223255,
8576570491,8804516445,0613922514,0586066454,1570421560,3893111379,
229030820X,807193318X,8071930482,8842913529,0441569560,9119027818,
8445070843,0007119585,9510172049,2744139157,8324577750,8790136292,
0307969940,0441569587,8842907464,1570420599,8445072897,8445074059,
0002252325

openlibrary: 8758804110,0441569595,8203203329,3453313895,057503470X,
0932096417,9029042478,2277223255,0441000681,0006480411,0441012035,
0586066454,0002252325,8445072897,0441007465,8790136292

Differences:

oclc \ librarything: 7542818732,7542824139,5170198892,569700124X,
8974271419,9607002504,8485752414,9029050748,8758804110,8445076620,
8422672596,9612310203,1616577843,8385784012,9029049367,3453403851,
5792101205,3807703098,9755760326,8467426373,8982738851,8390021439,
911300347X

oclc \ openlibrary: 7542818732,2707115622,9118721826,5170198892,
0007119585,8085601273,8445070843,3453056655,0441569579,229000619X,
415010672X,3893111387,2744139157,8467426373,8974271419,9607002504,
8071930482,9637632050,8585887907,8485752414,8371500432,9029050748,
3898132595,8445076620,7542824139,0441569587,8982738851,8804516445,
8422672596,8789586735,9612310203,1616577843,0307969940,8385784012,
8842907464,9029049367,8842910686,1570421560,3893111379,229030820X,
807193318X,911300347X,0441569560,5792101205,9119027818,3807703098,
9755760326,569700124X,9510172049,8445074059,0736638369,9510193062,
8390021439,1570420599,8445075950,3453403851

librarything \ oclc:  8842906808,9634970982,8842913529,9639238023,
9752103677,1569564116,8778803438,8576570491,8324577750,0613922514,
3453074203

librarything \ openlibrary:  2707115622,9634970982,9118721826,
8085601273,3453056655,8842906808,0441569579,229000619X,415010672X,
3893111387,2744139157,9639238023,9510193062,9637632050,807193318X,
8585887907,8842910686,3898132595,8324577750,3893111379,8804516445,
1570420599,8789586735,9752103677,8778803438,8576570491,0613922514,
1570421560,8371500432,229030820X,3453074203,8071930482,8842913529,
0441569560,9119027818,8445070843,0007119585,9510172049,1569564116,
0736638369,0307969940,0441569587,8842907464,8445075950,8445074059

openlibrary \ oclc:  

openlibrary \ librarything:  8758804110
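
The differences listed above are ordinary set differences between the three worksets. Here is a minimal Python sketch of that step, not the actual worksvenn.py code: the `worksets` sample is a handful of ISBNs copied from the output above, and the function names `differences` and `venn_sizes` are made up for illustration.

```python
# Hypothetical sample: a few of the ISBNs from the output above,
# not the full worksets returned by the three services.
worksets = {
    "oclc": {"8758804110", "0441569595", "2277223255", "7542818732"},
    "librarything": {"0441569595", "2277223255", "8842906808"},
    "openlibrary": {"8758804110", "0441569595", "2277223255"},
}


def differences(worksets):
    """Return {(a, b): ISBNs in a but not in b} for every ordered pair."""
    result = {}
    for a in sorted(worksets):
        for b in sorted(worksets):
            if a != b:
                result[(a, b)] = worksets[a] - worksets[b]
    return result


def venn_sizes(a, b, c):
    """Sizes a three-circle Venn diagram needs: A, B, C, A&B, A&C, B&C, A&B&C."""
    return [len(a), len(b), len(c),
            len(a & b), len(a & c), len(b & c), len(a & b & c)]


if __name__ == "__main__":
    for (a, b), isbns in differences(worksets).items():
        print(f"{a} \\ {b}: {','.join(sorted(isbns))}")
```

Pooling several seed ISBNs would then just be a matter of taking the union of each service's sets before computing the differences.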

This suggests that the workset data in these services actually reinforce each other, and a lot could be gained by sharing. For comparison here are the diagrams for a few more books:


As I mentioned earlier, you can pass worksvenn.py a list of ISBNs and it will pool them all together. At Gluejar we have a list of 53 books that are examples of potential books for ungluing, so I ran these through and came up with this diagram.

Although looking on a piecemeal basis can be interesting, it would be fun to see a Venn diagram given a larger pool of seed ISBNs. Perhaps worksvenn.py will give you some ideas. If it does please let me know!


an ode to node

When I made my first edit to Wikipedia a few years ago I can remember watching the recent changes page to see my contribution pop up. I was shocked to see just how quickly my edit was swept up in the torrent of edits that are going on all the time. I think everyone who googles for topical information is familiar with the experience of having Wikipedia articles routinely appear near the top of their search results. In hindsight it should’ve been obvious, but the level of participation in the curation of content at Wikipedia struck me as significant…and somehow different. It was wonderful to see living evidence of so many people caring to collaboratively document our world.

The Obsession

I work as a software developer in the cultural heritage sector, and often find myself building editing environments for users to collaboratively create and edit content. These systems typically get used here and there; but they in no way compare to the sheer volume of edit activity that Wikipedia sees from around the world, every single day. I guess I’d read about crowdsourcing, but had never been provided with a window into it like this before. My wife encourages her 5th grade students to think critically about Wikipedia as an information source. One way she has done this in the past was by having them author an article for their school, which didn’t have an article previously. I wanted to help her and her students see how they were part of a large community of Wikipedia editors; and to give them a tactile sense of the number of people who are actively engaged in making Wikipedia better.

A few months later Georgi Kobilarov let me know about the many IRC channels where various bits of metadata about recent changes in Wikipedia are announced. Georgi told me about a bot that the BBC run to track changes to Wikipedia, so that relevant article content can be pulled back to the BBC. I guess a light bulb turned on. Could I use these channels to show people how much Wikipedia is actively curated, without requiring them to reload the recent changes page, connect to some cryptic IRC channels, or dig around in some (wonderfully) detailed statistics? More importantly, could it be done in a playful way?

The Apps

Some more time passed and I came across some new tools (more about these below) that made it easy to throw together a Web visualization of the Wikipedia update stream. The tools proved to be so much fun that I ended up making two.

wikistream displays the edits to 38 language wikipedias as a river of flowing text. The content moves by so quickly that I had to add a pause button (the letter p) in order to test things like clicking on the update to see the change that was made. The little icons to the left indicate whether the edit was made by a registered Wikipedia user, an anonymous user, or a bot (there are lots of them). After getting some good feedback on the wikitech-l discussion list I added some knobs to limit updates to specific languages and types of user, or size of the edit. I also added a periodically updating background image based on uploads to the Wikimedia Commons.

The second visualization app is called wikipulse. Dario Taraborelli of the Wikimedia Foundation emailed me with the idea to use the same update data stream I used in wikistream to fuel a higher level view of the edit activity using the gauge widget in Google’s Chart API. To the left is one of these gauges which displays the edits per minute on 36 wikipedia properties. If you visit wikipulse you will also see individual gauges for each language wikipedia. It’s a bit overkill seeing all the gauges on the screen, but it’s also kind of fun to see them update automatically every second relative to each other, based on the live edit activity.
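
An edits-per-minute figure like the one on the gauge can be computed with a sliding window over edit timestamps. This is a minimal Python sketch of that idea, not the wikipulse code; the class name `EditRate` and the 60 second window are assumptions made for illustration.

```python
from collections import deque


class EditRate:
    """Count edits seen during the last `window` seconds (hypothetical helper)."""

    def __init__(self, window=60.0):
        self.window = window
        self.times = deque()

    def record(self, timestamp):
        """Remember the time (in seconds) at which an edit arrived."""
        self.times.append(timestamp)

    def per_minute(self, now):
        """Drop edits that have fallen out of the window and scale to a minute."""
        while self.times and self.times[0] <= now - self.window:
            self.times.popleft()
        return len(self.times) * 60.0 / self.window


if __name__ == "__main__":
    rate = EditRate()
    for t in (0, 10, 50):
        rate.record(t)
    print(rate.per_minute(now=65))
```

One such counter per language channel would be enough to drive a gauge for each wikipedia.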

The Tools


For both of these apps I needed to log into the wikimedia IRC server, listen on ~30 different channels, push all the updates through some code that helped visualize the data in some way, and then get this data out to the browser. I had heard good things about node for high concurrency network programming from several people. I ran across a node library called socket.io that purported to make it easy to stream updates from the server to the client, in a browser independent way, using a variety of transport protocols. Instinctively it felt like the pub/sub model would also be handy for connecting up the IRC updates with the webapp. I had been wanting to play around with the pub/sub features in redis for some time, and since there is a nice redis library for node I decided to give it a try.

Like many web developers I am used to writing JavaScript for the browser. Tools like jQuery and underscore.js successfully raised the bar to the point that I’m able to write JavaScript and still look myself in the mirror in the morning. But I was still a bit skeptical about JavaScript running on the server side. The thing I didn’t count on was how well node’s event driven model, the library support (socket.io, redis, express), and the functional programming style fit the domain of making the Wikipedia update stream available on the Web.

For example here’s the code to connect to the ~30 IRC chatrooms stored in the channels variable, and send all the messages to a function processMessage:

var client = new irc({server: 'irc.wikimedia.org', nick: config.ircNick});

client.connect(function () {
  client.join(channels);
  client.on('privmsg', processMessage);
});

The processMessage function then parses the IRC message into a JavaScript dictionary and publishes it to a ‘wikipedia’ channel in redis:

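function processMessage (msg) {
  // parse the raw IRC message parameters into a dictionary
  var m = parse_msg(msg.params);
  // publish the update to the 'wikipedia' channel in redis
  redis.publish('wikipedia', m);
}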

Then over in my wikistream web application I set up socket.io so that when a browser goes to my webapp it negotiates for the best way to get updates from the server. Once a connection is established the server subscribes to the wikipedia channel and sends any updates it receives out to the browser. When the browser disconnects, the connection to redis is closed.

var io = sio.listen(app);

io.sockets.on('connection', function(socket) {
  var updates = redis.createClient();
  updates.subscribe('wikipedia');
  updates.on("message", function (channel, message) {
    socket.send(message);
  });
  socket.on('disconnect', function() {
    updates.quit();
  });
});
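
The shape of this is plain pub/sub fan-out: one publisher, and one subscription per connected browser that goes away on disconnect. Here is a minimal Python sketch of the pattern, using a tiny in-process `Broker` class that is a made-up stand-in for redis, not its API:

```python
from collections import defaultdict


class Broker:
    """Tiny in-process stand-in for a pub/sub server (not redis itself)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        """Register a callback and return a function that removes it again."""
        self.subscribers[channel].append(callback)

        def unsubscribe():
            self.subscribers[channel].remove(callback)

        return unsubscribe

    def publish(self, channel, message):
        """Hand the message to every current subscriber of the channel."""
        # Copy the list so a callback may unsubscribe while we iterate.
        for callback in list(self.subscribers[channel]):
            callback(message)


if __name__ == "__main__":
    broker = Broker()
    browser_a, browser_b = [], []
    stop_a = broker.subscribe("wikipedia", lambda m: browser_a.append(m))
    stop_b = broker.subscribe("wikipedia", lambda m: browser_b.append(m))
    broker.publish("wikipedia", '{"page": "Optic nerve"}')
    stop_a()  # browser A disconnects
    broker.publish("wikipedia", '{"page": "Neuromancer"}')
    print(len(browser_a), len(browser_b))
```

The IRC side never needs to know how many browsers are listening, which is what makes the two halves of the app easy to keep separate.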

Each update is represented as a JavaScript dictionary, which socket.io and node’s redis client transparently serialize and deserialize. In order to understand the socket.io protocol a bit more, I wrote a little Python script that connects to wikistream.inkdroid.org, negotiates for the xhr-polling transport, and prints the stream of JSON updates to the console. It’s a demonstration of how a socket.io instance like wikistream can be used as an API for creating a firehose-like service, although I guess the example might’ve been a bit cleaner if it had negotiated for a websocket instead. One update looks like this:

{
  'anonymous': False,
  'comment': '/* Anatomy */  changed statement that orbit was the eye to saying that the orbit was the eye socket for accuracy',
  'delta': 7,
  'flag': '',
  'namespace': 'article',
  'newPage': False,
  'page': 'Optic nerve',
  'pageUrl': 'http://en.wikipedia.org/wiki/Optic_nerve',
  'robot': False,
  'unpatrolled': False,
  'url': 'http://en.wikipedia.org/w/index.php?diff=449570600&oldid=447889877',
  'user': 'Moearly',
  'userUrl': 'http://en.wikipedia.org/wiki/User:Moearly',
  'wikipedia': '#en.wikipedia',
  'wikipediaLong': 'English Wikipedia',
  'wikipediaShort': 'en',
  'wikipediaUrl': 'http://en.wikipedia.org'
}
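
Because each update is a flat dictionary, filtering the stream by language, type of user, or size of edit takes only a few lines. Here is a minimal Python sketch using the field names shown above; the function names `user_type` and `wanted` and their parameters are assumptions for illustration, not part of wikistream.

```python
def user_type(update):
    """Classify the editor as 'robot', 'anonymous' or 'registered'."""
    if update["robot"]:
        return "robot"
    if update["anonymous"]:
        return "anonymous"
    return "registered"


def wanted(update, languages=None, user_types=None, min_delta=0):
    """Return True if an update dictionary passes the chosen filters."""
    if languages and update["wikipediaShort"] not in languages:
        return False
    if user_types and user_type(update) not in user_types:
        return False
    # Assume a missing or empty delta counts as zero bytes changed.
    return abs(update.get("delta") or 0) >= min_delta


if __name__ == "__main__":
    update = {
        "anonymous": False,
        "delta": 7,
        "page": "Optic nerve",
        "robot": False,
        "wikipediaShort": "en",
    }
    print(wanted(update, languages={"en"}, user_types={"registered"}))
```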

This felt so easy, it really made me re-evaluate everything I thought I knew about JavaScript. Plus it all became worth it when Ward Cunningham (the creator of the Wiki idea) wrote on the wiki-research list:

I’ve written this app several times using technology from text-to-speech to quartz-composer. I have to tip my hat to Ed for doing a better job than I ever did and doing it in a way that he makes look effortless. Kudos to Ed for sharing both the page and the software that produces it. You made my morning.

Ward is a personal hero of mine, so making his morning pretty much made my professional career.

I guess this is all a long way of saying what many of you probably already know…the tooling around JavaScript (and especially node) has changed so much that it really does offer a radically new programming environment, one that is worth checking out, especially for network programming. The event driven model that is baked into node, and the fact that v8 runs blisteringly fast, make it possible to write apps that do a whole lot in one low memory process. This is handy when deploying an app to an EC2 micro instance or Heroku, which is where wikipulse is running…for free.

Of course it helped that my wife and kids got a kick out of wikistream and wikipulse. I suspect that they think I’m a bit obsessed with Wikipedia, but that’s ok … because I kinda am.


day of digital archives psa

Today is Day of Digital Archives day and I had this semi-thoughtful post written up about BagIt and how it’s a brain dead simple format to use to package up your files so that you’ll know if you still have them 5 minutes, 5 hours, 5 days, 5 years, maybe even 5 decades from now–if the notion of directories and files persists that long.

But I deleted that…you’re welcome…

I was also going to write about how in a fit of web performance art Mark Pilgrim recently deleted his online presence, including various extremely useful opensource tools, and several popular online books, only to see them re-materialize on the Web at new locations.

But I deleted most of that too…you’re welcome again!

Here’s a public service announcement instead. If you happen to use Franco Lazzarino’s Ruby BagIt Library to create bags that contain largish files (> 500MB), you might have accidentally created bad SHA1 manifests. I added a test, and fixed the bug with help from Mark Matienzo and Michael Klein, and sent a pull request. It hasn’t been applied yet, so here’s to hoping it will.

At $mpow we’ve been getting terabytes of data from this social media company that has been bagging their data using this Ruby library. Many of the files are multi-gigabyte and gzip compressed. And many of the bags now have bad SHA1 manifests. The social media company wasn’t sure what the problem was, and told us just to ignore the SHA1 manifests. Which is easy enough to do.

It seems like no matter how simple the spec, it’s easy to create bugs. If you create bags, throw Bag-Software-Agent into your bag-info.txt…you never know who might find it useful.
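
If you want to check whether a bag you received is affected, here is a minimal Python sketch that recomputes the SHA1 of each payload file and compares it with the manifest. It assumes the usual bag layout with a manifest-sha1.txt at the top level, where each line is a checksum followed by a file path. The function names are made up for illustration, and this is not the Ruby library's code or a description of its bug. Files are read in chunks, so multi-gigabyte payloads never need to fit in memory.

```python
import hashlib
import os


def sha1_of(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bad_entries(bag_dir, manifest="manifest-sha1.txt"):
    """Return the payload paths whose SHA1 does not match the manifest."""
    bad = []
    with open(os.path.join(bag_dir, manifest), encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            # Each manifest line is "<checksum> <path relative to the bag>".
            expected, rel_path = line.rstrip("\n").split(None, 1)
            if sha1_of(os.path.join(bag_dir, rel_path)) != expected.lower():
                bad.append(rel_path)
    return bad
```

A missing payload file raises an error here rather than being reported as bad, which is probably what you want to hear about first anyway.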


stepping backwards

Jonathan Rochkind recently wrote a good blog post about using HTML5 Microdata to help citation managers like Mendeley and Zotero discover citation metadata that is available in formats such as RIS. It’s an excellent and detailed complement to Eric Hellman’s piece on the same subject.

I contributed to the unAPI effort 5 years ago, which aimed to fix the same problem: making citation metadata available to browsers. I wrote the unAPI validator which helped implementors confirm they were doing things right, articles were written, and we saw implementations in software such as the opensource integrated library system Evergreen and the popular citation manager Zotero, which at one point looked first for unAPI metadata in pages…perhaps it still does.

As Jonathan points out, there are some issues with unAPI, such as accessibility problems around Microformats in general, which unAPI was partly modeled on. HTML5 Microdata and RDFa weren’t around when we were working on unAPI, so I think Jonathan is right that it definitely makes sense to think about using these technologies nowadays instead of unAPI when making structured metadata available in HTML. I personally think the same thing goes for COinS where OpenURL key value pairs are used to express the metadata. Companies like Google, Microsoft, Yahoo and Facebook actively scrape HTML5 Microdata and RDFa, and there are vocabularies for describing books and articles. And because these technologies are deployed wider than the small niche that libraries occupy, they fit the Web better.

But there is a fair bit of turmoil in the structured-data-on-the-Web space. Today’s F8 product announcements seemed to indicate that Facebook is deepening its use of the OpenGraphProtocol, which is their rebranding of RDFa. We’re seeing the International Press Telecommunications Council standardizing rNews as an RDFa vocabulary for expressing online news metadata. And meanwhile Google, Microsoft and Yahoo are continuing to work on schema.org Microdata vocabularies. The recent Schema.org Workshop seems to anticipate significant changes in that space in the near future, particularly regarding the output of the W3C Web Schema and HTML Data task forces.

At LODLAM-DC we had a good conversation about RDFa, Microdata, Microformats and JSON publishing options for the cultural heritage sector. Perhaps I was just projecting, but it seemed like there was a fair bit of uncertainty about which to use. At the end of the day it seems like making your decisions based on things you want to enable is a good way forward. Are you trying to get your content to show up nicely on Facebook or Google–or both?

…or are you trying to do something else, like advertise some RIS citation metadata that is related to an HTML page so a citation manager can pick it up?

Even before the pixels had dried on the first version of the unAPI spec I was left with the nagging feeling that it had missed the point. I felt like we hadn’t really used the mechanics of the Web that were already there, and had sort of inadvertently succumbed to how standards development would be lampooned later by XKCD:

Specifically, I felt like we could have documented an even simpler pattern, namely using <link> or <a> elements in conjunction with the rel and type attributes. So if you have a search result that is available as RIS, why not add this to your <head> element:


My IRC conversation with Jonathan about his blog post was rolling around in my head when this Kurt Vonnegut quote went by in my Twitter stream:

It seemed oddly appropriate given the uncertainty in the structured-data-on-the-web marketplace, and some missteps with unAPI. If all we want to do is replace unAPI with something easier and more web-friendly, then why not fall back on basic functionality that has been in HTML for years?

If you want to make structured metadata available directly in HTML, sure HTML5 Microdata and RDFa are important technologies to use. But if all you want to do is link to an external metadata file I personally think the scholarly community would be better served by a simpler and less controversial approach.
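
To give a sense of how little a client would need to do, here is a minimal Python sketch of how a citation manager might discover such links. It rests on two assumptions that are not taken from this post: that the links are marked with rel="alternate", and that RIS is advertised with the media type application/x-research-info-systems. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Assumed media type for RIS; adjust to whatever the page actually advertises.
RIS_TYPE = "application/x-research-info-systems"


class AlternateLinks(HTMLParser):
    """Collect href values of <link> and <a> elements with a given type."""

    def __init__(self, media_type):
        super().__init__()
        self.media_type = media_type
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if ("alternate" in rels
                and attrs.get("type") == self.media_type
                and attrs.get("href")):
            self.hrefs.append(attrs["href"])


def find_metadata_links(html, media_type=RIS_TYPE):
    """Return the URLs of metadata files advertised in an HTML page."""
    parser = AlternateLinks(media_type)
    parser.feed(html)
    return parser.hrefs


if __name__ == "__main__":
    page = ('<html><head><link rel="alternate" '
            'type="application/x-research-info-systems" '
            'href="/search?q=example&format=ris"></head></html>')
    print(find_metadata_links(page))
```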