bibliographic records on the web

There are a couple of interesting threads (disclaimer: I inadvertently started one) going on over on the Open Library technical discussion list about making Linked Data views available for authors. Since the topic was largely how to model people, part of the discussion spilled over to foaf-dev (also my fault).

When making library Linked Data available, my preference has been to follow the lead of Martin Malmsten, Anders Söderbäck and the Royal Library of Sweden by modeling authors as People using the FOAF vocabulary:

<http://libris.kb.se/resource/auth/317488>
    libris:key "Berners-Lee, Tim" ;
    a foaf:Person ;
    rdfs:isDefinedBy <http://libris.kb.se/data/auth/317488> ;
    skos:exactMatch <http://viaf.org/viaf/23002995> ;
    foaf:name "Berners-Lee, Tim", "Lee, Tim Berners-", "Tim Berners- Lee", "Tim Berners-Lee" .

It seems sensible enough, right? But there is some desire in the library community to model an author as a Bibliographic Resource and then relate this resource to a Person resource. While I can understand wanting this level of indirection to assert a bit more control, and possibly to use some emerging vocabularies for RDA, I think (for now) using something like FOAF for modeling authors as people is a good place to start.

It will engage folks from the FOAF community who understand RDF and Linked Data, and get them involved in the Open Library Project. It will make library data fit in with other Linked Data out on the web. Plus, it just kind of fits my brain better to think of authors as people…isn’t that what libraries were trying to do all along with their authority data? I’m not saying that FOAF will have everything the library world needs (it won’t), but it’s an open world and we can add stuff that we need, collaborate, and make it a better place.

Anyway, that’s not really what I wanted to talk about here. Over the course of this discussion Erik Hetzner raised what I thought was an important question:

Are you saying that there is a usable distinction between:

1. a bibliographic record, and

2. the data contained in that bibliographic record?

From above, my first notion would be to model things as, in
pseudo-Turtle::

<Victor Hugo> a frbr:Person .
<Victor Hugo> rdfs:isDefinedBy <bib record> .
<bib record> dc:modified "…"^^xsd:date .

But it seems to me that you are adding a further distinction::

<Victor Hugo> a frbr:Person .
<Victor Hugo> rdfs:isDefinedBy <bib record> .
<bib record> rdfs:isDefinedBy <bib record data> .
<bib record data> dc:modified "…"^^xsd:date .

Is this a usable or useful distinction? Are there times when we want to distinguish between the abstract bibliographic record and the representation of a bibliographic record? In linked data-speak, is a bibliographic record a non-information resource? My thinking has been that a bibliographic record is an information resource, and that one does not need to distinguish between (1) and (2) above.

I think it’s an important question because I don’t think it has really been discussed much before, and because it has a direct impact on what sort of URL you can use to identify a Bibliographic Record, and on what sort of HTTP response a client gets when that URL is resolved. This is the httpRange-14 issue, which is covered in Cool URIs for the Semantic Web. If a Bibliographic Record is an Information Resource then it’s OK to identify the record with any old URL, and for the server to say 200 OK like normal. If it’s not an Information Resource then the URL should either have a hash fragment in it, or the server should respond 303 See Other, and redirect to another location.
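To make that rule concrete, here is a minimal Python sketch of how a client might read a URL and the status code it got back. The function name `classify` and the label strings are my own inventions for illustration, and the `example.org` URLs in the usage below are placeholders; this is a rough reading of the httpRange-14 guidance, not a complete implementation of it.

```python
from urllib.parse import urldefrag


def classify(uri, status):
    """Rough httpRange-14 reading of a URI plus the status of a GET on it.

    The returned labels are hypothetical names chosen for this sketch.
    """
    # urldefrag splits "http://host/path#frag" into the URL and its fragment.
    _, fragment = urldefrag(uri)
    if fragment:
        # Hash URIs can identify anything; the fragment never reaches the server.
        return "any resource (hash URI)"
    if 200 <= status < 300:
        # A plain 2xx response means the URL identifies a document on the web.
        return "information resource"
    if status == 303:
        # 303 See Other: the URL may identify a person, a concept, and so on.
        return "any resource (303 redirect)"
    # Errors and other redirects tell us nothing either way.
    return "unknown"


print(classify("http://lccn.loc.gov/99027665", 200))
print(classify("http://example.org/record#it", 200))
print(classify("http://example.org/record", 303))
```

Under this reading, a record URL that simply answers 200 OK is being treated as a document, which is the position I argue for below.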

In my view, if a Bibliographic Record is on the web with a URL, it is useful to think of it as an Information Resource…or (as Richard Cyganiak dubs it) a Web Document. I don’t think it’s worthwhile philosophizing about this; it’s better to think about it pragmatically. I think it’s useful to consider

http://lccn.loc.gov/99027665

as being an identifier for a bibliographic record that happens to be in HTML. Likewise

are all identifiers for Bibliographic Records in MODS, Dublin Core and MARCXML respectively. It might be useful to link them together as they are with <link> elements in the HTML, or in some RDF serialization. It also could be useful to treat one as canonical, and content negotiate from one of the URLs (e.g. curl --header "Accept: application/marc+xml" http://lccn.loc.gov/99027665). But I think it simplifies deployment of library Linked Data to think of bibliographic records as things that can be put on the web as documents, without worrying too much about httpRange-14. A nice side effect of this is that it would grandfather in all the OPAC record views out there. Maybe it’ll be useful to distinguish between an abstract notion of a bibliographic record, and the actual document that is the bibliographic record, but I’m not seeing it right now…and I think it would introduce a lot of unnecessary complexity in this fragile formative period for library Linked Data.
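For anyone who would rather script the content negotiation idea than use curl, the same request can be sketched with Python's standard library. The helper name `negotiated_request` is hypothetical, the media type is the one from the curl example above, and the sketch only builds the request without sending it, so it assumes nothing about how the server actually responds.

```python
from urllib.request import Request


def negotiated_request(url, media_type):
    """Build (but do not send) a GET request asking for one representation."""
    # The Accept header is how a client states which format it prefers.
    return Request(url, headers={"Accept": media_type})


req = negotiated_request("http://lccn.loc.gov/99027665", "application/marc+xml")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the request; whether the canonical URL returns MARCXML, HTML, or an error for that header is up to the server.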