
work identifiers and the web

Michael Smethurst’s In Search of Cultural Identifiers post over at the BBC Radio Labs got me thinking about web identifiers for works, about LibraryThing and OCLC as linked library data providers, and finally about the International Standard Text Code. Admittedly it’s kind of a hodge-podge of topics, and I’m going to be taking some liberties with what ‘linked data’ and ‘works’ mean, so bear with me.

Both OCLC Worldcat and LibraryThing mint URIs for bibliographic works, like these for Wide Sargasso Sea:

So the library community really does have web identifiers for works–or more precisely web identifiers for human readable records about works. What’s missing (IMHO) is the ability to use that identifier to get back something meaningful for a machine. Tools like Zotero need to scrape the screen to pull out the data points of interest to citation management. Sure, if you want you can implement COinS or unAPI to allow the metadata to be extracted, but could there be a more web-friendly way of doing this?

Consider how blog syndication works on the web. You visit a blog (like this one) and your browser is able to magically figure out the location of an RSS or Atom feed for the blog, and give you an option to subscribe to it.

Well, it’s not really magic; it’s just a bit of markup in the HTML:

<link rel="alternate" 
         type="application/rss+xml" 
         title="inkdroid RSS Feed" 
         href="http://inkdroid.org/journal/feed/" />

Simple, right?

Now back to work identifiers. Consider that both Worldcat and LibraryThing have Web 2.0 APIs for retrieving machine readable data for a work:


http://www.librarything.com/services/rest/1.0/?method=librarything.ck.getwork&id={work_id}&apikey={your_key}

or:


http://www.worldcat.org/webservices/catalog/content/{oclc_number}?wskey={key}

What if the web pages for these resources at OCLC and LibraryThing linked directly to these machine readable versions? For example if the page for Wide Sargasso Sea at LibraryThing contained this in its <head> element:

<link rel="alternate" 
         type="application/xml" 
         title="XML for Wide Sargasso Sea" 
         href="http://www.librarything.com/services/rest/1.0/?method=librarything.ck.getwork&id=27239&apikey=d231aa37c9b4f5d304a60a3d0ad1dad4" />

This would allow browsers, plugin tools like Zotero, and web crawlers to follow the natural grain of the web and discover the machine readable representation. Admittedly this is something that COinS and unAPI are designed to do. But the COinS and unAPI protocols are really optimized for making citation data and non-web identifiers available and routable via a resolver of some kind. Maybe I’m just overreaching a bit, but this approach of using the <link> element seems to embrace the notion that there are resources within the Worldcat and LibraryThing websites, and that there can be alternate representations of those resources that can be discovered in a hypertext-driven way.
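To make that a little more concrete, here is a minimal Python sketch of what a client might do, using only the standard library's html.parser. The sample page, the AlternateLinkFinder class, the find_alternates function and the {your_key} placeholder are all made up for illustration; this is not markup that either site is known to publish.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collects <link rel="alternate"> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_tokens:
            self.alternates.append(attributes)


def find_alternates(html, media_type=None):
    """Return attribute dicts for alternate links, optionally filtered by type."""
    finder = AlternateLinkFinder()
    finder.feed(html)
    finder.close()
    if media_type is None:
        return finder.alternates
    return [a for a in finder.alternates if a.get("type") == media_type]


# An assumed work page carrying the kind of link element proposed above.
SAMPLE_PAGE = """
<html><head>
<title>Wide Sargasso Sea</title>
<link rel="alternate"
      type="application/xml"
      title="XML for Wide Sargasso Sea"
      href="http://www.librarything.com/services/rest/1.0/?method=librarything.ck.getwork&amp;id=27239&amp;apikey={your_key}" />
</head><body></body></html>
"""

links = find_alternates(SAMPLE_PAGE, "application/xml")
for link in links:
    print(link["title"], "->", link["href"])
```

The same few lines would pick up an RSS or Atom feed if you asked for a different media type, which is sort of the point: the client only needs to know about link elements, not about any one site's API.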

Of course there is the issue of the API key. In the example above I used the demo key in LibraryThing’s docs. More important in the context of web identifiers for works is the need to distinguish between the identifier for the record, and the identifier for the concept of the work, which is most elegantly solved (IMHO) by following a pattern from the Cool URIs for the Semantic Web doc. But I think it’s important that people realize that it’s not necessary to jump headlong into RDF to start leveraging some of the principles behind the Architecture of the World Wide Web. Henry Thompson has a nice web-centric discussion of this issue in his What’s a URI and Why Does it Matter?
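For what it's worth, the pattern I have in mind from the Cool URIs doc is the 303 redirect: the URI for the work itself answers with 303 See Other and points at a document about the work. Here is a rough Python sketch of just the decision logic. The /work/ and /doc/ paths, the resolve function and the exact-match handling of the Accept header are all simplifying assumptions of mine, not anything OCLC or LibraryThing actually does.

```python
# Hypothetical URI layout: /work/{id} names the work itself (the concept),
# while /doc/{id}.html and /doc/{id}.xml are documents describing it.
REPRESENTATIONS = {
    "text/html": "html",
    "application/xml": "xml",
}


def resolve(path, accept="text/html"):
    """Return (status, headers) for a request; a sketch, not a real server."""
    parts = path.strip("/").split("/")
    if len(parts) != 2:
        return 404, {}
    kind, identifier = parts
    if kind == "work":
        # The work is not a document, so answer 303 See Other and point
        # at a document about it, chosen naively from the Accept value.
        extension = REPRESENTATIONS.get(accept, "html")
        return 303, {"Location": f"/doc/{identifier}.{extension}",
                     "Vary": "Accept"}
    if kind == "doc":
        name, _, extension = identifier.rpartition(".")
        for media_type, known_extension in REPRESENTATIONS.items():
            if name and known_extension == extension:
                return 200, {"Content-Type": media_type}
    return 404, {}


print(resolve("/work/27239"))
print(resolve("/work/27239", accept="application/xml"))
print(resolve("/doc/27239.xml"))
```

A real server would parse the Accept header properly (q-values and so on), but the shape is the same: one identifier for the thing, separate identifiers for the records about it.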

While writing this blog post I noticed a thread over on Autocat that Bowker has been named the US Registrar for the International Standard Text Code. The gist is that the ISTC will be a “global identification system for textual works”, and that registrars (like Bowker) will mint identifiers for works, such as:

ISTC 0A9 2002 12B4A105 7

Where the structure of the identifier is roughly:

ISTC {registration agency} {year element} {work element} {check digit}

It’s interesting that the meat of the ISTC is the work element that is:

… assigned automatically by the central ISTC registration system after a metadata record has been submitted for registration and the system has verified that the record is unique;

The metadata record in question is actually a chunk of ONIX, which presumably Bowker will send to the ISTC central registrar, and get back a work id.

This work that the ISTC is taking on is really important–and one would imagine quite costly. One thing I would suggest to them is that they may want to make the ISTC codes have a URI equivalent like:

http://istc-international.org/0A9/2002/12B4A1057

They also should encourage Bowker and other registrars to publish their work identifiers on the web:

http://bowker.com/istc/0A9/2002/12B4A1057
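Going from the printed form to a URI like that is not much work. Here is a small Python sketch; the element lengths in the pattern are inferred from the single example above rather than taken from the standard, the names (ISTC, parse_istc, to_uri) are mine, and the check digit is carried along without being verified.

```python
import re
from typing import NamedTuple

# Assumed shape, inferred from the example "ISTC 0A9 2002 12B4A105 7":
# agency (3 chars), year (4 digits), work element (8 chars), check (1 char).
ISTC_PATTERN = re.compile(
    r"^(?:ISTC\s*)?([0-9A-Z]{3})\s*(\d{4})\s*([0-9A-Z]{8})\s*([0-9A-Z])$"
)


class ISTC(NamedTuple):
    agency: str
    year: str
    work: str
    check: str

    def path(self):
        # Mirrors the URI examples above: work element and check digit are joined.
        return f"{self.agency}/{self.year}/{self.work}{self.check}"


def parse_istc(text):
    """Split a printed ISTC into its elements (no check digit validation)."""
    match = ISTC_PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not recognizable as an ISTC: {text!r}")
    return ISTC(*match.groups())


def to_uri(istc, base):
    """Build a URI for the ISTC under a registrar's base URI."""
    return f"{base.rstrip('/')}/{istc.path()}"


istc = parse_istc("ISTC 0A9 2002 12B4A105 7")
print(to_uri(istc, "http://bowker.com/istc"))
```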

It seems to me that we might (in the long term) be better served by a system that embraces the distributed nature of the web. A web in which organizations like Bowker, ISTC, OCLC, LibraryThing, Library of Congress and national libraries publish their work identifiers using URIs, and return meaningful metadata for them. Rather than waiting for someone else to solve our problems top-down, why don’t we start solving them ourselves, bottom-up?

Anyhow I feel like I’m kind of being messy in suggesting this linked-data-lite idea. Is it heresy? My alibi/excuse is that I’ve been sitting in the same room as dchud for extended periods of time.

9 Comments

  1. lorcan wrote:

    I am interested that you do not include the Library of Congress or national libraries in your list of organizations ….

    Wednesday, January 21, 2009 at 3:43 pm | Permalink
  2. ed wrote:

    @lorcan you are absolutely right. It wasn’t intentional. I work at LC so it’s a bit of a blind spot for me. The lccn.loc.gov service actually does implement the linked-data-lite pattern outlined in this post. For example if you look at http://lccn.loc.gov/2002405946 you’ll see a variety of link elements pointing at other xml documents.

    Of course lccn.loc.gov only provides access to traditional bibliographic records, aka manifestations…so they aren’t the Works that you all have toiled so hard to create in your FRBRizations.

    Wednesday, January 21, 2009 at 7:12 pm | Permalink
  3. Andy Powell wrote:

    Re: “One thing I would suggest to them is that they may want to make the ISTC codes have a URI equivalent “. I would go further… why create a new ISTC as a string of characters and then create a URI equivalent? Why not simply mint new ISTCs directly as ‘http’ URIs?

    Thursday, January 22, 2009 at 5:50 am | Permalink
  4. ed wrote:

    @andypowe11 absolute agreement. It seems like that particular leap of faith is hard for some people; and for them perhaps it’s better to say they can have their cake and eat it too? /me shrugs

    Thursday, January 22, 2009 at 8:34 am | Permalink
  5. yann nicolas wrote:

    Just a doubt about what http://www.worldcat.org/oclc/24630204 actually refers to ?
    Does it refer to the Work or one privileged manifestation among the manifestations of that work ?

    Thursday, January 22, 2009 at 1:14 pm | Permalink
  6. ed wrote:

    @yann yes, a good amount of skepticism here is definitely warranted. I think calling the resource at that URI a Work (FRBR) may be wrong. I was told the same thing by Jonathan Rochkind in #code4lib. The links out to various editions are what made me think perhaps it was a Work, or perhaps an Expression. Perhaps http://www.worldcat.org/oclc/24630204/editions works better? Maybe the story with LibraryThing is clearer?

    I guess my point is that the library community could be attempting to treat web applications, and the resources within them, as instances of the vocabularies we aspire to use. Let’s use the web as a medium for using our descriptive languages like FRBR, RDA, etc. When we create web applications (OPACs, etc.), what are the resources in them that we are making available?

    Perhaps I’m just saying what some people have been saying for years in forums like ngc4lib. But I feel like the REST architectural style and notions of linked data can help the library community grapple with what it means to distribute its cataloging data on the web.

    Friday, January 23, 2009 at 6:43 am | Permalink
  7. dchud wrote:

    I really agree with this that you said, Ed: “treat web applications, and the resources within them, as instances of the vocabularies we aspire to use.”

    Just some quick nits, not to distract from your main point, though: COinS isn’t a protocol (OpenURL is), and unAPI isn’t optimized for anything in particular.

    In any case, the only main gap I see with using the link header for more is how much leverage it provides to be specific about relationships between distinct URIs. @type is limited to MIME, and @rel/@rev are limited to the dozen or so specified in HTML, unless you get into defining profiles. And I can’t say I’ve much experience with profiles, and I don’t know if many other people do.

    With unAPI we just left the format[@name] open to address this issue, to allow more human-readable specificity next to the @type value. It’d be a stretch to say there has been enough unAPI usage to draw any useful conclusions about how the @name values tend to be used, though.

    So that’s one of the biggest open questions for me, but I’d like to help work on an answer.

    Sunday, January 25, 2009 at 3:05 pm | Permalink
  8. dchud wrote:

    Hmm, just a quick extra thought about that last comment… in the interest of “using what’s already there”, we could come up with a convention for using the link[@rel="contents"] att/value pair to point to a human-readable HTML contents page which doubles as a “resource map”… with its own link[@rel="alternate", @type="application/xml+ore"] or whatever behind it.

    I wonder if anybody is already doing this?

    Sunday, January 25, 2009 at 3:11 pm | Permalink
  9. hvdsomp wrote:

    With my team here in Los Alamos, I have done something that comes very close to what Ed describes in an ORE experiment. If I remember correctly, I presented the experiment at the ORE Open Meeting at Johns Hopkins University, March 2008. A QuickTime recording of the experiment is at http://public.lanl.gov/herbertv/images/cite_no_manager.mov . I am afraid this movie (90 MB) has no voice over, and the experiment is a tad more elaborate than what Ed describes. Actually what Ed describes is, in this experiment, an enabler for a Web-based scholarly authoring tool that automatically inserts references to cited articles as follows:

    1. The author links some text of his new article to (the HTTP URI of) the splash page of a to-be-cited article
    2. The to-be-cited article splash page has the HTML LINK mechanism suggested by Ed that points at an ORE Resource Map for the article. One of the aggregated resources in the Resource Map is a bibliographic record, appropriately typed so that it can be recognized as being a biblio record.
    3. Once the author is confident with the text, the Save button is clicked and a background process goes out to all links in the author’s text in search of biblio info, i.e. in search of HTML LINKs that point at ReMs that contain biblio info. That’s the crawling biblio info part in Ed’s post.
    4. If biblio info is found, it is inserted in the reference list and tada …

    Basically, this experiment is about using the Web as the database for collecting article citations on a per need basis, instead of using a desktop tool with a local database.

    Anyhow, the experiment was done in the days that an ORE Resource Map was still expressed as an Atom Feed. In ORE 1.0, a Resource Map is expressed as an entry, and the HTML LINK element that one would use now to point at such an entry is:

    <link href="http://example.net/hw.atom"
             type="application/atom+xml;type=entry" 
             rel="resourcemap" />

    (see http://www.openarchives.org/ore/1.0/discovery#HTMLLinkElement)

    That is, of course, if one were to use ORE to achieve this all, as I did in the experiment.

    Monday, January 26, 2009 at 3:43 pm | Permalink
