linking spoken quotes of quotes

An ancient buddha said, “If you do not wish to incur the cause for Unceasing Hell, do not slander the true dharma wheel of the Tathagata. You should carve these words on your skin, flesh, bones and marrow; on your body, mind and environment; on emptiness and on form. They are already carved on trees and rocks, on fields and villages.”

From Gary Snyder’s reading of The Teachings of Zen Master Dogen (about 1:26:00 in).

His delivery is just a delight to listen to. The puzzling strangeness of the text is made whole in the precision, earthiness and humor of his words.

Flickr Commons LAMs

After the last post, Seb got me wondering whether libraries, archives and museums differ in their upload and comment activity on Flickr Commons, as recorded in Aaron’s snapshot of the Flickr Commons metadata.

First I had to get a list of Flickr Commons organizations and classify them as either a library, museum or archive. It wasn’t always easy to pick, but you can see the result here. I lumped galleries in with museums. I also lumped historical societies in with archives. Then I wrote a script that walked around in the Redis database I already had from loading Aaron’s data.
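The lumping and counting described above can be sketched roughly like this. It is not the script I actually ran: the organization names are placeholders, the record fields (org, year, comments) are assumptions, and it reads from a plain list rather than Redis.

```python
from collections import Counter

# Hypothetical hand classification; the organization names are placeholders.
ORG_TYPES = {
    "Example City Library": "library",
    "Example Gallery": "gallery",
    "Example Historical Society": "historical society",
    "Example National Archives": "archive",
}

# Galleries are lumped in with museums, historical societies with archives.
LUMP = {"gallery": "museum", "historical society": "archive"}


def lam_type(org):
    """Return 'library', 'archive' or 'museum' for an organization name."""
    kind = ORG_TYPES[org]
    return LUMP.get(kind, kind)


def tally(records):
    """Count uploads and comments per (type, year).

    Each record is assumed to be a dict with 'org', 'year' and 'comments'.
    """
    uploads = Counter()
    comments = Counter()
    for rec in records:
        key = (lam_type(rec["org"]), rec["year"])
        uploads[key] += 1
        comments[key] += rec["comments"]
    return uploads, comments
```

Keeping the lumping rules in their own small table makes it easy to undo them later if, say, galleries turn out to behave differently from museums.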

In doing this I noticed there were some Flickr Commons organizations that were missing from Aaron’s snapshot:

  • Tasmanian Archive and Heritage Office Commons
  • The Royal Library, Denmark
  • The Finnish Museum of Photography
  • Musée McCord Museum

Update: Aaron quickly fixed this.

I didn’t do any research to see if these organizations had significant activity. Also, since there were close to a million files, I haven’t loaded the British Library activity yet. If there’s interest in adding them into the mix I’ll splurge for the larger EC2 instance.

Anyhow, below are the results. You can find the spreadsheet for these graphs up in Google Docs.

This was all done rather quickly, so if you notice anything odd or that looks amiss please let me know. Initially it seemed a bit strange to me that libraries, archives and museums trended so similarly in each graph, even if the volume was different.

OCLC Works

The news about OCLC’s Linked Data service circulated widely on Twitter yesterday. I’ve never been a big OCLC cheerleader, but the news really hit home for me. I’ve been writing in my rambling way about Linked Data here for about 6 years. Of course there are many others who’ve been at it much longer than I have … and in a way I think librarians and archivists feel a kinship with the effort because it is cooked into the DNA of how we think about the Web as an information space.

Like Button

This new OCLC service struck me as an excellent development for the library Web community for a few reasons that I thought I would quickly jot down:

  • it’s evolutionary: OCLC didn’t let the perfect be the enemy of the good. It’s great to hear links to VIAF, FAST, LCSH, etc. are planned. But you have to start somewhere, and there is already significant value in expressing the FRBR workset data they have as Linked Data on the Web for others to use. Also, the domain experiment.worldcat.org clearly reflects that this is an experiment … but they didn’t let anxiety about changing URLs prevent them from publishing what they can now. The future is longer than the past.
  • it’s snappy: I don’t know if they’ve written about the technical architecture they are using, but the views are quite responsive. Of course I have no idea what kind of load it is under, but so far so good. Update: Ron Buckley of OCLC let me know the service is built on top of a shared Apache HBase Hadoop cluster.
  • schema.org: OCLC has the brains and the market position to create their own vocabulary for bibliographic data. But they worked hard at engaging openly with the Web community to help clarify and adapt the Schema.org vocabulary so that it can be used by our community. There is lots of thrashing going on in this space at the moment, and OCLC is being a great model in trying to work with the Web we have, and iterating to make it better, instead of trying to take a quantum leap forward.
  • json-ld: JSON-LD has been cooking for a while, but it’s a brand new W3C standard for representing RDF as idiomatic JSON. RDF has been somewhat plagued in the past by esoteric and/or hard to understand representations. JSON-LD really seems to have hit a sweet-spot between the expressivity of RDF and the usability of the Web. It’s refreshing to see OCLC kicking JSON-LD’s tires.

Rubber Meet Road

So how do you discover these Work URIs? Richard’s post led me to believe I could get them directly from the xID service using an ISBN. But I found it to be a two-step process: first get any OCLC Number associated with an ISBN from xID, and then use the OCLC Number to get the Work Identifier from the xID service.

So for example, to discover the Work URI for Tim Berners-Lee’s Weaving the Web you first look up the ISBN:

http://xisbn.worldcat.org/webservices/xid/isbn/0062515861?method=getMetadata&format=json&fl=*

which should yield:

{
    "list": [
        {
            "author": "Tim Berners-Lee with Mark Fischetti.",
            "city": "San Francisco",
            "ed": "1st ed.",
            "form": [
                "AA",
                "BA"
            ],
            "isbn": [
                "0062515861"
            ],
            "lang": "eng",
            "lccn": [
                "99027665",
                "00039593"
            ],
            "oclcnum": [
                "300691968",
                "318261941",
                "410824754",
                "41238513",
                "470718156",
                "558595430",
                "628749869",
                "768228949",
                "807901805",
                "43903751",
                "699807622"
            ],
            "publisher": "HarperSanFrancisco.",
            "title": "Weaving the Web : the original design and ultimate destiny of the World Wide Web by its inventor",
            "url": [
                "http://www.worldcat.org/oclc/300691968?referer=xid"
            ],
            "year": "1999"
        }
    ],
    "stat": "ok"
}
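If you want to script this first step, here is a minimal sketch. The function names are mine, not part of any OCLC client library, and it only builds the URL and reads an already-parsed response; fetching the URL is left to whatever HTTP client you like.

```python
from urllib.parse import urlencode

XID_BASE = "http://xisbn.worldcat.org/webservices/xid"


def xid_url(id_type, value):
    """Build an xID getMetadata URL for an 'isbn' or 'oclcnum' lookup."""
    params = {"method": "getMetadata", "format": "json", "fl": "*"}
    # safe="*" keeps the fl=* wildcard from being percent-encoded.
    return f"{XID_BASE}/{id_type}/{value}?{urlencode(params, safe='*')}"


def oclc_numbers(xid_response):
    """Collect every oclcnum in a parsed xID response (empty list on failure)."""
    if xid_response.get("stat") != "ok":
        return []
    numbers = []
    for record in xid_response.get("list", []):
        numbers.extend(record.get("oclcnum", []))
    return numbers


print(xid_url("isbn", "0062515861"))
```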

Then pick one of the OCLC Numbers (oclcnum) at random and use it to do an xID call:

http://xisbn.worldcat.org/webservices/xid/oclcnum/300691968?method=getMetadata&format=json&fl=*

Which should return:

{
    "list": [
        {
            "isbn": [
                "9780062515865",
                "9780062515872"
            ],
            "lccn": [
                "99027665"
            ],
            "oclcnum": [
                "300691968"
            ],
            "owi": [
                "owi27331745"
            ]
        }
    ],
    "stat": "ok"
}

You can then dig out the Work Identifier (owi), trim off the owi prefix, and put it on the end of a URL like:

http://experiment.worldcat.org/entity/work/data/27331745

or, if you want the JSON-LD without doing content negotiation:

http://experiment.worldcat.org/entity/work/data/27331745.jsonld

This returns a chunk of JSON data that I won’t reproduce here, but do check it out.
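The last step, going from an xID response to a Work URL, is easy to script too. Again this is just a sketch under my own naming: work_identifier and work_url are hypothetical helpers, and they operate on parsed JSON rather than calling the service.

```python
WORK_BASE = "http://experiment.worldcat.org/entity/work/data/"


def work_identifier(xid_response):
    """Return the first owi in a parsed xID response, or None if there is none."""
    if xid_response.get("stat") != "ok":
        return None
    for record in xid_response.get("list", []):
        for owi in record.get("owi", []):
            return owi
    return None


def work_url(owi, jsonld=False):
    """Trim the 'owi' prefix and build the Work URL, optionally the .jsonld one."""
    number = owi[len("owi"):] if owi.startswith("owi") else owi
    return WORK_BASE + number + (".jsonld" if jsonld else "")


# The oclcnum lookup response shown above, trimmed to the relevant fields.
response = {
    "list": [{"oclcnum": ["300691968"], "owi": ["owi27331745"]}],
    "stat": "ok",
}
print(work_url(work_identifier(response), jsonld=True))
```

Returning None when there is no owi matters here, because the ISBN lookup response has no owi field at all, which is exactly why the second call is needed.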

Update: After hitting publish on this blog post I’ve corresponded a bit with Stephan Schindehette at OCLC and Alf Eaton about some inconsistencies in my blog post (which I’ve fixed), and uncertainty about what the xID API should be returning. Hopefully xID can be updated to return the OCLC Work Identifier when you look up by ISBN. I’ll update this blog post if I am notified of a change.

Peanut Gallery

One bit of advice that I was given by Dave Longley on the #json-ld IRC channel, which I will pass along to OCLC, is that it might be better to use CURIE-less properties, e.g. name instead of schema:name, to make it easier to use (and read) the JSON from JavaScript. To do this you would need a more expressive @context, but I think it might make sense to reference an external context document and cut down on the size of the JSON-LD document even more.
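To make the CURIE-less idea concrete, here is a rough sketch that rewrites schema:-prefixed keys into bare terms and generates the richer @context that goes with them. The sample document is made up for illustration, not what OCLC actually returns, and curie_less is my own hypothetical helper.

```python
def _drop_curies(node, terms, prefix):
    """Recursively rename 'schema:foo' keys to 'foo', recording each term."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key.startswith(prefix):
                term = key[len(prefix):]
                terms[term] = key  # e.g. "name" -> "schema:name"
                key = term
            out[key] = _drop_curies(value, terms, prefix)
        return out
    if isinstance(node, list):
        return [_drop_curies(item, terms, prefix) for item in node]
    return node


def curie_less(doc, prefix="schema:", namespace="http://schema.org/"):
    """Return a copy of doc with bare property names and a matching @context."""
    # Keep the prefix itself so values such as "@type": "schema:Book" still expand.
    terms = {prefix.rstrip(":"): namespace}
    body = {k: v for k, v in doc.items() if k != "@context"}
    body = _drop_curies(body, terms, prefix)
    return {"@context": terms, **body}


# Illustrative input only; the identifiers below are placeholders.
sample = {
    "@context": {"schema": "http://schema.org/"},
    "@id": "http://example.org/work/1",
    "@type": "schema:Book",
    "schema:name": "Weaving the Web",
    "schema:creator": {"@id": "http://example.org/person/1", "schema:name": "Tim Berners-Lee"},
}
compact = curie_less(sample)
```

With the compact form, JavaScript can say doc.name rather than doc["schema:name"]. The generated @context is the part that could be moved out into an external context document and referenced by URL.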

It’s wonderful to see that the data is being licensed ODC-BY, but maybe assertions to that effect should be there in the data as well? I think schema.org have steered clear of licensing properties, but cc:license seems like a reasonable property to use, assuming it’s used with the right subject URI.
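Something like the following is what I have in mind, sketched as a Python dict that serializes to JSON-LD. Two assumptions are baked in: that the data document (rather than the Work itself) is the right subject for the license, and that http://opendatacommons.org/licenses/by/1.0/ is the license URI OCLC would want to point at.

```python
import json

CC = "http://creativecommons.org/ns#"
ODC_BY = "http://opendatacommons.org/licenses/by/1.0/"  # assumed license URI


def license_statement(document_url, license_url=ODC_BY):
    """Build a small JSON-LD node saying document_url is under license_url."""
    return {
        "@context": {"cc": CC},
        "@id": document_url,  # assumption: the data document is the subject
        "cc:license": {"@id": license_url},
    }


statement = license_statement(
    "http://experiment.worldcat.org/entity/work/data/27331745.jsonld"
)
print(json.dumps(statement, indent=2))
```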

And one last tiny suggestion I have is that it would be nice to see the service mainstreamed into other parts of OCLC’s website. But I understand all too well the divides between R&D and production … and how challenging it can be to integrate them sometimes, even in the simplest of ways.