info-uris and opening up library data

I had a few moments to read the info-uri spec during a short flight from DC to Chicago this past weekend. info-uri aka RFC 4452 is a spec that allows you to create URIs for identifiers in public namespaces.

So what does this mean in practice and why would you want to use one?

If you have a database of stuff you make available on the web, and you have ids for the stuff (say a primary_key on a Stuff table) you essentially have an identifier in a public namespace. Go register the namespace!

So, the LoC assigns identifiers called Library of Congress Control Numbers (LCCN) to each of its metadata records. Here’s the personal-name authority record (expressed as MADS) that allows works by Tim Berners-Lee to be grouped together:

<?xml version='1.0' encoding='UTF-8'?>
<madsCollection
xmlns:xlink='http://www.w3.org/1999/xlink'
xmlns='http://www.loc.gov/mads'
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:schemaLocation='http://www.loc.gov/mads http://www.loc.gov/standards/mads/mads.xsd'>

<mads version='beta'>
<authority>
<name type='personal' authority='naf'>
<namePart>Berners-Lee, Tim</namePart>
</name>
<titleInfo authority='naf'>
<title/>
</titleInfo>
</authority>
<variant type='other'>
<name type='personal'>
<namePart>Lee, Tim Berners-</namePart>
</name>
<titleInfo>
<title/>
</titleInfo>
</variant>
<variant type='other'>
<name type='personal'>
<namePart>Berners-Lee, Timothy J</namePart>
</name>
<titleInfo>
<title/>
</titleInfo>
</variant>
<note type='source'>The WWW virtual library Web site,
Feb. 15, 1999 about the virtual library (Tim Berners-Lee; creator
of the Web)</note>
<note type='source'>OCLC, Feb. 12, 1999 (hdg.: Berners-Lee,
Tim; usage: Tim Berners-Lee)</note>
<note type='source'>Gaines, A. Tim Berners-Lee and the
development of the World Wide Web, 2001: CIP galley
(Timothy J. Berners-Lee; b. London, England, June 8, 1955)</note>
<recordInfo>
<identifier>no 99010609 </identifier>
<recordContentSource authority='marcorg'>NBL</recordContentSource>
<recordCreationDate encoding='marc'>990216</recordCreationDate>
<recordChangeDate encoding='iso8601'>20010716094452.0</recordChangeDate>
<recordIdentifier>1851704</recordIdentifier>
<languageOfCataloging>
<languageTerm authority='iso639-2b' type='code'>eng</languageTerm>
</languageOfCataloging>
</recordInfo>
</mads>
</madsCollection>

In the mads/recordInfo/identifier element you can find the LCCN:

no 99010609

Which can be represented as an info-uri:

info:lccn/no99010609
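
If you want to do that step in code, here is a rough Python sketch that pulls the identifier out of a MADS record and turns it into an info-uri. The function names and the trimmed sample record are mine, and the normalization steps (drop whitespace, drop anything after a slash, zero-pad a hyphenated serial to six digits) are my reading of the rules published for the lccn namespace, so check them against the registry entry before relying on this.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the record above, keeping only what this sketch needs.
MADS_SAMPLE = """<madsCollection xmlns='http://www.loc.gov/mads'>
  <mads version='beta'>
    <recordInfo>
      <identifier>no 99010609 </identifier>
    </recordInfo>
  </mads>
</madsCollection>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def find_lccn(mads_xml):
    """Return the raw text of the first recordInfo/identifier element, or None."""
    root = ET.fromstring(mads_xml)
    for element in root.iter():
        if local_name(element.tag) != 'recordInfo':
            continue
        for child in element:
            if local_name(child.tag) == 'identifier':
                return child.text
    return None


def normalize_lccn(raw):
    """Normalize an LCCN (assumed rules, see the note above the code)."""
    value = re.sub(r'\s+', '', raw)      # remove all blanks
    value = value.split('/', 1)[0]       # drop a slash and everything after it
    if '-' in value:
        left, serial = value.split('-', 1)
        value = left + serial.zfill(6)   # pad the serial to six digits
    return value


def lccn_info_uri(raw):
    """Build an info-uri in the lccn namespace from a raw LCCN string."""
    return 'info:lccn/' + normalize_lccn(raw)


print(lccn_info_uri(find_lccn(MADS_SAMPLE)))
```

Matching on the local element name instead of a fixed namespace is deliberate: the sketch keeps working whichever namespace URI the record happens to declare.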

Now why would you *ever* want to express an LCCN as an info-uri? The LoC has spent a lot of time and effort establishing these personal name and subject authorities. You might want to use a URI like info:lccn/no99010609 to identify Tim Berners-Lee as an individual in your data, so that other people will know who you are talking about and be able to interoperate with you. For example, you can now unambiguously say that Tim Berners-Lee created Weaving the Web:

<info:lccn/99027665> <http://purl.org/dc/elements/1.1/creator>
<info:lccn/no99010609> .
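
A statement like this is easy to generate. Here is a small Python sketch that formats three URIs as one N-Triples line, with the book as the subject and the person as the object, which is how dc:creator reads. The ntriple helper is a name I made up for illustration; for real work you would probably reach for an RDF library.

```python
DC_CREATOR = 'http://purl.org/dc/elements/1.1/creator'


def ntriple(subject, predicate, obj):
    """Format three URIs as a single N-Triples statement."""
    for uri in (subject, predicate, obj):
        # Keep the sketch honest: refuse characters that would break the line.
        if any(ch in uri for ch in '<> \t\n'):
            raise ValueError('not a plain URI: %r' % uri)
    return '<{}> <{}> <{}> .'.format(subject, predicate, obj)


# "Weaving the Web" (the book's LCCN) has creator Tim Berners-Lee.
print(ntriple('info:lccn/99027665', DC_CREATOR, 'info:lccn/no99010609'))
```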

That was for you ksclarke :-) Pretty nifty eh? Now what’s really cool is that while info-uris aren’t necessarily resolvable (by design), OCLC does have the Linked Authority File, which allows you to look up these records. So tbl’s record can be found here:

http://errol.oclc.org/laf/no99-10609.html

I imagine that this is part of the joint OCLC/LoC/Die Deutsche Bibliothek project to build a Virtual International Authority File…but I’m not totally sure. At any rate there’s currently no way to drop an lccn info-uri in there and have it resolve to the XML, but that looks like an easy thing to add.

It feels like there is a real opportunity for libraries and archives to offer up their data to the larger web community. How can we make it easy for non-library folks to find and repurpose this data we’ve so assiduously collected over the years?

tbl is encouraging people to give themselves a URI…I wonder if he knew that he (and millions of others) already have one!

Addendum:

If you are interested, section 6 of the RFC details the subtle rationale behind why the authors chose to create a new URI scheme rather than:

  1. using an existing URI scheme
  2. creating a new URN namespace

In essence they didn’t want to use an existing URI scheme because they all assume that you should be able to dereference the URI. An example of dereferencing in action can be found when clicking on a link like http://www.yahoo.com, where the magic of DNS allows you to find Yahoo’s web server and talk to it on port 80 in a predictable way. info-uris are designed to be agnostic as to whether or not the identifier can be dereferenced through a resolver of some kind.
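
You can see the difference just by taking the two kinds of URI apart. This Python sketch uses the standard library's urlsplit; the has_network_location helper is my own name for illustration. An http URI carries a host name that a client can look up in DNS, while an info-uri has only a namespace and an identifier, so there is nothing built in to connect to.

```python
from urllib.parse import urlsplit


def has_network_location(uri):
    """True if the URI names a host that a client could try to contact."""
    return bool(urlsplit(uri).netloc)


for uri in ('http://www.yahoo.com', 'info:lccn/no99010609'):
    parts = urlsplit(uri)
    print(uri)
    print('  scheme:', parts.scheme)
    print('  host:  ', parts.hostname)   # None when there is no authority part
    print('  path:  ', parts.path)
```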

Using URNs was thrown out since URNs are intended to persistently identify information resources and info-uris are designed to identify persistent namespaces not the resources themselves. Also the process of establishing a URN namespace isn’t for the faint of heart, which is evidenced by the short list of them. info-uris by contrast have a registrar who will expedite the process of registering a namespace, and have set up a framework for publishing validation/normalization rules. The current registrar is run by OCLCRLGBORG^w OCLC on behalf of NISO. So basically you don’t have to write an RFC to register your namespace.

info-uris and opening up library data by Ed Summers, unless otherwise expressly stated, is licensed under a Creative Commons Attribution 4.0 International License.

3 thoughts on “info-uris and opening up library data”

  1. I don’t think it is correct to say that “info-uris are designed to identify persistent namespaces not the resources themselves”. RFC 4452 refers to the use of info URIs to describe “information assets”, and says

    When referencing an information asset by means of its “info” URI, the asset SHALL be considered a “resource” as defined in RFC 3986

    And your examples above refer to info URIs for people, i.e. resources/things other than “persistent namespaces”.

    The examples of info URIs from the LCCN namespace do raise an interesting question. According to http://www.loc.gov/marc/lccn-namespace.html and http://info-uri.info/registry/OAIHandler?verb=GetRecord&metadataPrefix=reg&identifier=info:lccn/

    An LCCN is an identifier assigned by the Library of Congress for a
    metadata record (e.g., bibliographic record, authority record)

    which seems quite unambiguous that an LCCN (and an info URI in the LCCN namespace?) is an identifier for LoC’s metadata record. If that is the case, then I think using that same identifier for the subject of the metadata record (the person etc) contradicts that statement by LoC and introduces ambiguity about what asset/resource is identified. The person who created the LoC authority record describing the Notorious B.I.G. is a different person from the one who created the Notorious B.I.G. (Probably.)

    But I’m really of the school that says anything the info URI scheme provides could be achieved more easily and cheaply – still without writing an RFC to register my namespace ;-) – using the http URI scheme e.g. as suggested here

    http://lists.w3.org/Archives/Public/www-rdf-interest/2003Oct/0000

    Cheers
    PeteJ
