The news about OCLC’s Linked Data service circulated widely on Twitter yesterday. I’ve never been a big OCLC cheerleader, but the news really hit home for me. I’ve been writing in my rambling way about Linked Data here for about 6 years. Of course there are many others who’ve been at it much longer than I have … and in a way I think librarians and archivists feel a kinship with the effort because it is cooked into the DNA of how we think about the Web as an information space.
This new OCLC service struck me as an excellent development for the library Web community for a few reasons, that I thought I would quickly jot down:
- it’s evolutionary: OCLC didn’t let the perfect be the enemy of the good. It’s great to hear links to VIAF, FAST, LCSH, etc are planned. But you have to start somewhere, and there is already significant value in expressing the FRBR workset data they have as Linked Data on the Web for others to use. Also, the domain
experiment.worldcat.org clearly reflects this is an experiment…but they didn’t let anxiety about changing URLs prevent them from publishing what they can now. The future is longer than the past.
- it’s snappy: I don’t know if they’ve written about the technical architecture they are using, but the views are quite responsive. Of course I have no idea what kind of load it is under, but so far so good. Update: Ron Buckley of OCLC let me know the service is built on top of a shared Apache HBase Hadoop cluster.
- schema.org: OCLC has the brains and the market position to create their own vocabulary for bibliographic data. But they worked hard at engaging openly with the Web community to help clarify and adapt the Schema.org vocabulary so that it can be used by our community. There is lots of thrashing going on in this space at the moment, and OCLC is being a great model in trying to work with the Web we have, and iterating to make it better, instead of trying to take a quantum leap forward.
- json-ld: JSON-LD has been cooking for a while, but it’s a brand new W3C standard for representing RDF as idiomatic JSON. RDF has been somewhat plagued in the past by esoteric and/or hard to understand representations. JSON-LD really seems to have hit a sweet-spot between the expressivity of RDF and the usability of the Web. It’s refreshing to see OCLC kicking JSON-LD’s tires.
Rubber Meet Road
So how do you discover these Work URIs? Richard’s post led me to believe I could get them directly from the xID service using an ISBN. But I found it to be a two step process: first get any OCLC Number associated with an ISBN from xID, and then use the OCLC Number to get the Work Identifier from the xID service:
So for example, to discover the Work URI for Tim Berners-Lee’s Weaving the Web you first look up the ISBN:
which should yield:
"author": "Tim Berners-Lee with Mark Fischetti.",
"city": "San Francisco",
"ed": "1st ed.",
"title": "Weaving the Web : the original design and ultimate destiny of the World Wide Web by its inventor",
Then pick one of the OCLC Numbers (oclcnum) at random and use it to do an xID call:
Which should return:
You can then dig out the Work Identifier (owi), trim off the owi prefix, and put it on the end of a URL like:
or, if you want the JSON-LD without doing content negotiation:
This returns a chunk of JSON data that I won’t reproduce here, but do check it out.
Update: After hitting publish on this blog post I’ve corresponded a bit with Stephan Schindehette at OCLC and Alf Eaton about some inconsistencies in my blog post (which I’ve fixed), and uncertainty about what the xID API should be returning. Hopefully xID can be updated to return the OCLC Work Identifier when you lookup by ISBN. I’ll update this blog post if I am notified of a change.
One bit of advice that I was given by Dave Longley on the #json-ld IRC channel, which I will pass along to OCLC, is that it might be better to use CURIE-less properties, e.g.
name instead of
@context but I think it might make sense to reference an external context document and cut down on the size of the JSON-LD document even more.
It’s wonderful to see that the data is being licensed ODC-BY, but maybe assertions to that effect should be there in the data as well? I think schema.org have steered clear of licensing properties, but cc:license seems like a reasonable property to use, assuming it’s used with the right subject URI.
And one last tiny suggestion I have is that it would be nice to see the service mainstreamed into other parts of OCLC’s website. But I understand all too well the divides between R&D and production … and how challenging it can be to integrate them sometimes, even in the simplest of ways.