work identifiers and the web

Michael Smethurst’s In Search of Cultural Identifiers post over at the BBC Radio Labs got me thinking about web identifiers for works, about LibraryThing and OCLC as linked library data providers, and finally about the International Standard Text Code. Admittedly it’s kind of a hodge-podge of topics, and I’m going to be taking some liberties with what ‘linked data’ and ‘works’ mean, so bear with me.

Both OCLC Worldcat and LibraryThing mint URIs for bibliographic works, like these for Wide Sargasso Sea:

So the library community really does have web identifiers for works–or more precisely web identifiers for human readable records about works. What’s missing (IMHO) is the ability to use that identifier to get back something meaningful for a machine. Tools like Zotero need to scrape the screen to pull out the data points of interest to citation management. Sure, if you want you can implement COinS or unAPI to allow the metadata to be extracted, but could there be a more web-friendly way of doing this?

Consider how blog syndication works on the web. You visit a blog (like this one) and your browser is able to magically figure out the location of an RSS or Atom feed for the blog, and give you an option to subscribe to it.

Well, it’s not really magic; it’s just a bit of markup in the HTML:
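Feed autodiscovery usually hangs off a `<link rel="alternate">` element in the page's `<head>`. Here's a rough sketch of how a client might find it, using only Python's standard library. The sample page, its title and the feed URL are all made-up placeholders, not this blog's actual markup.

```python
from html.parser import HTMLParser

# Hypothetical page: the title and feed URL are placeholders for illustration.
SAMPLE_PAGE = """
<html>
  <head>
    <title>An example blog</title>
    <link rel="alternate" type="application/atom+xml"
          title="Atom feed" href="http://example.org/feed/atom/">
    <link rel="stylesheet" type="text/css" href="/style.css">
  </head>
  <body><p>Hello</p></body>
</html>
"""


class AlternateLinkFinder(HTMLParser):
    """Collect (media type, href) pairs from <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, e.g. "alternate nofollow"
        rel_values = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_values and attributes.get("href"):
            self.alternates.append((attributes.get("type"), attributes["href"]))


def find_alternates(html):
    """Return every alternate representation advertised in the HTML."""
    finder = AlternateLinkFinder()
    finder.feed(html)
    return finder.alternates


if __name__ == "__main__":
    for media_type, href in find_alternates(SAMPLE_PAGE):
        print(media_type, href)
```

The stylesheet link is skipped because its `rel` isn't `alternate`; only the feed link comes back.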

Simple, right?

Now back to work identifiers. Consider that both Worldcat and LibraryThing have Web 2.0 APIs for retrieving machine-readable data for a work:

http://www.librarything.com/services/rest/1.0/?method=librarything.ck.getwork&id={work_id}&apikey={your_key}

or:

http://www.worldcat.org/webservices/catalog/content/{oclc_number}?wskey={key}

What if the web pages for these resources at OCLC and LibraryThing linked directly to these machine readable versions? For example if the page for Wide Sargasso Sea at LibraryThing contained this in its <head> element:
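To make that concrete, here's a minimal sketch of how such a `<link>` element could be generated from the LibraryThing URL template quoted earlier. The work id and API key passed in are hypothetical placeholders, and the `application/xml` media type is my assumption about what the API returns, so treat this as an illustration rather than what either site actually publishes.

```python
from html import escape
from urllib.parse import quote

# URL template as quoted in the post; the two placeholders are filled in below.
LIBRARYTHING_TEMPLATE = (
    "http://www.librarything.com/services/rest/1.0/"
    "?method=librarything.ck.getwork&id={work_id}&apikey={your_key}"
)


def alternate_link(work_id, api_key, media_type="application/xml"):
    """Build a <link rel="alternate"> element pointing at the API URL.

    media_type is an assumption; swap in whatever the service really serves.
    """
    href = LIBRARYTHING_TEMPLATE.format(
        work_id=quote(str(work_id), safe=""),
        your_key=quote(str(api_key), safe=""),
    )
    # Escape for use inside an HTML attribute (ampersands become &amp;).
    return '<link rel="alternate" type="{}" href="{}">'.format(
        escape(media_type, quote=True), escape(href, quote=True)
    )


if __name__ == "__main__":
    # "12345" and "DEMO_KEY" are placeholders, not real identifiers.
    print(alternate_link("12345", "DEMO_KEY"))
```

The same pattern would apply to the Worldcat URL template; only the template string changes.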

This would allow browsers, plugin tools like Zotero and web crawlers to follow the natural grain of the web and discover the machine-readable representation. Admittedly this is something that COinS and unAPI are designed to do. But the COinS and unAPI protocols are really optimized for making citation data and non-web identifiers available and routable via a resolver of some kind. Maybe I’m just overreaching a bit, but this approach of using the <link> element seems to embrace the notion that there are resources within the Worldcat and LibraryThing websites, and that there can be alternate representations of those resources that can be discovered in a hypertext-driven way.

Of course there is the issue of the API key. In the example above I used the demo key in LibraryThing’s docs. More important in the context of web identifiers for works is the need to distinguish between the identifier for the record, and the identifier for the concept of the work, which is most elegantly solved (IMHO) by following a pattern from the Cool URIs for the Semantic Web doc. But I think it’s important that people realize that it’s not necessary to jump headlong into RDF to start leveraging some of the principles behind the Architecture of the World Wide Web. Henry Thompson has a nice web-centric discussion of this issue in his What’s a URI and Why Does it Matter?

While writing this blog post I noticed a thread over on Autocat that Bowker has been named the US Registrar for the International Standard Text Code. The gist is that the ISTC will be a “global identification system for textual works”, and that registrars (like Bowker) will mint identifiers for works, such as:

ISTC 0A9 2002 12B4A105 7

Where the structure of the identifier is roughly:

ISTC {registration agency} {year element} {work element} {check digit}

It’s interesting that the meat of the ISTC is the work element, which is:

… assigned automatically by the central ISTC registration system after a metadata record has been submitted for registration and the system has verified that the record is unique;

The metadata record in question is actually a chunk of ONIX, which presumably Bowker will send to the ISTC central registrar, and get back a work id.

This work that the ISTC is taking on is really important–and one would imagine quite costly. One thing I would suggest to them is that they may want to make the ISTC codes have a URI equivalent like:

http://istc-international/0A9/2002/12B4A1057

They also should encourage Bowker and other registrars to publish their work identifiers on the web:

http://bowker.com/istc/0A9/2002/12B4A1057
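The mapping from the printed ISTC to these URIs is mechanical enough to sketch in a few lines. This is just my reading of the examples above: the work element and check digit are joined into the last path segment, and the check digit is not validated. The function names are hypothetical.

```python
def parse_istc(text):
    """Split a printed ISTC into its four elements.

    Accepts an optional leading "ISTC" label. The check digit is kept
    as-is and NOT verified.
    """
    parts = text.split()
    if parts and parts[0].upper() == "ISTC":
        parts = parts[1:]
    if len(parts) != 4:
        raise ValueError("expected 4 ISTC elements, got %d" % len(parts))
    agency, year, work, check = parts
    return {"agency": agency, "year": year, "work": work, "check": check}


def istc_to_uri(text, base="http://istc-international"):
    """Build a URI of the form {base}/{agency}/{year}/{work}{check}."""
    istc = parse_istc(text)
    return "{}/{}/{}/{}{}".format(
        base.rstrip("/"), istc["agency"], istc["year"], istc["work"], istc["check"]
    )


if __name__ == "__main__":
    code = "ISTC 0A9 2002 12B4A105 7"
    print(istc_to_uri(code))
    print(istc_to_uri(code, base="http://bowker.com/istc"))
```

Passing a different `base` gives the registrar-hosted variant, so the central registry and each registrar could publish the same identifier under their own hostnames.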

It seems to me that we might (in the long term) be better served by a system that embraces the distributed nature of the web. A web in which organizations like Bowker, ISTC, OCLC, LibraryThing, Library of Congress and national libraries publish their work identifiers using URIs, and return meaningful metadata for them. Rather than waiting for someone else to solve our problems top-down, why don’t we start solving them ourselves, bottom-up?

Anyhow I feel like I’m kind of being messy in suggesting this linked-data-lite idea. Is it heresy? My alibi/excuse is that I’ve been sitting in the same room as dchud for extended periods of time.