info-uris and opening up library data
I had a few moments to read the info-uri spec during a short flight from DC to Chicago this past weekend. info-uri, aka RFC 4452, is a spec that lets you create URIs for identifiers in public namespaces.
So what does this mean in practice and why would you want to use one?
If you have a database of stuff you make available on the web, and you have ids for the stuff (say a primary_key on a Stuff table), you essentially have an identifier in a public namespace. Go register the namespace!
Take the Library of Congress (LoC), which assigns identifiers called Library of Congress Control Numbers (LCCNs) to each of its metadata records. Here’s the personal-name authority record (expressed as MADS) that allows works by Tim Berners-Lee to be grouped together:
<?xml version='1.0' encoding='UTF-8'?>
<madsCollection xmlns:xlink='http://www.w3.org/1999/xlink' xmlns='http://www.loc.gov/mods/v3' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://www.loc.gov/mads http://www.loc.gov/standards/mads/mads.xsd'>
  <mads version='beta'>
    <authority>
      <name type='personal' authority='naf'>
        <namePart>Berners-Lee, Tim</namePart>
      </name>
      <titleInfo authority='naf'>
        <title/>
      </titleInfo>
    </authority>
    <variant type='other'>
      <name type='personal'>
        <namePart>Lee, Tim Berners-</namePart>
      </name>
      <titleInfo>
        <title/>
      </titleInfo>
    </variant>
    <variant type='other'>
      <name type='personal'>
        <namePart>Berners-Lee, Timothy J</namePart>
      </name>
      <titleInfo>
        <title/>
      </titleInfo>
    </variant>
    <note type='source'>The WWW virtual library Web site, Feb. 15, 1999 about the virtual library (Tim Berners-Lee; creator of the Web)</note>
    <note type='source'>OCLC, Feb. 12, 1999 (hdg.: Berners-Lee, Tim; usage: Tim Berners-Lee)</note>
    <note type='source'>Gaines, A. Tim Berners-Lee and the development of the World Wide Web, 2001: CIP galley (Timothy J. Berners-Lee; b. London, England, June 8, 1955)</note>
    <recordInfo>
      <identifier>no 99010609 </identifier>
      <recordContentSource authority='marcorg'>NBL</recordContentSource>
      <recordCreationDate encoding='marc'>990216</recordCreationDate>
      <recordChangeDate encoding='iso8601'>20010716094452.0</recordChangeDate>
      <recordIdentifier>1851704</recordIdentifier>
      <languageOfCataloging>
        <languageTerm authority='iso639-2b' type='code'>eng</languageTerm>
      </languageOfCataloging>
    </recordInfo>
  </mads>
</madsCollection>
In the mads/recordInfo/identifier element you can find the LCCN: no 99010609
Which, once the blank is removed, can be represented as an info-uri: info:lccn/no99010609
Now why would you ever want to express an LCCN as an info-uri? The LoC has spent a lot of time and effort establishing these personal name and subject authorities. You might want to use a URI like info:lccn/no99010609 to identify Tim Berners-Lee as an individual in your data so that other people will know who you are talking about and be able to interoperate with you. For example, you can now unambiguously say that Weaving the Web was created by Tim Berners-Lee (the work is the subject of the statement and the creator is the object):
<info:lccn/99027665> <http://purl.org/dc/elements/1.1/creator> <info:lccn/no99010609>
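Going from the identifier as it sits in the record to an info-uri takes a small normalization step. Here is a minimal Python sketch of that path, assuming the normalization rules published for the info:lccn namespace (drop blanks, drop anything from a forward slash onward, zero-pad the part after a hyphen to six digits). The function names and the abbreviated RECORD string are made up for illustration, and the statement it prints puts the work first and the creator last.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the record above (hypothetical trimming for the example);
# the default namespace URI is the one the record itself declares.
RECORD = """<madsCollection xmlns='http://www.loc.gov/mods/v3'>
  <mads version='beta'>
    <recordInfo>
      <identifier>no 99010609 </identifier>
    </recordInfo>
  </mads>
</madsCollection>"""

NS = {"m": "http://www.loc.gov/mods/v3"}
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"


def normalize_lccn(raw: str) -> str:
    """Normalize an LCCN, e.g. "n78-89035" becomes "n78089035"."""
    lccn = "".join(raw.split())       # remove all blanks
    lccn = lccn.split("/", 1)[0]      # drop a "/..." suffix if present
    if "-" in lccn:                   # zero-pad the serial after a hyphen
        prefix, serial = lccn.split("-", 1)
        lccn = prefix + serial.zfill(6)
    return lccn


def lccn_info_uri(raw: str) -> str:
    """Build an info-uri in the lccn namespace from a raw LCCN."""
    return "info:lccn/" + normalize_lccn(raw)


def creator_statement(work_lccn: str, creator_lccn: str) -> str:
    """Format a subject/predicate/object statement: work, dc:creator, person."""
    return f"<{lccn_info_uri(work_lccn)}> <{DC_CREATOR}> <{lccn_info_uri(creator_lccn)}>"


root = ET.fromstring(RECORD)
identifier = root.find(".//m:recordInfo/m:identifier", NS).text

print(lccn_info_uri(identifier))  # info:lccn/no99010609
print(creator_statement("99027665", identifier))
```

Keeping the normalization in one place means everybody who mints a URI from the same record ends up with the same string, which is the whole point of sharing identifiers.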
That was for you, ksclarke :-) Pretty nifty, eh? Now what’s really cool is that while info-uris aren’t necessarily resolvable (by design), OCLC does have the Linked Authority File, which allows you to look up these records. So tbl’s record can be found here:
I imagine that this is part of the joint OCLC/LoC/Die Deutsche Bibliothek project to build a Virtual International Authority File, but I’m not totally sure. At any rate, there’s currently no way to drop an LCCN info-uri in there and have it resolve to the XML, but that looks like an easy thing to add.
It feels like there is a real opportunity for libraries and archives to offer up their data to the larger web community. How can we make it easy for non-library folks to find and repurpose this data we’ve so assiduously collected over the years?
tbl is encouraging people to give themselves a URI…I wonder if he knew that he (and millions of others) already have one!
- Nikola Tesla: info:lccn/n78087404
- Madonna: info:lccn/n84156128
- Notorious B.I.G.: info:lccn/no96031850
- Henriette Avram: info:lccn/n50029954
If you are interested, section 6 of the RFC details the subtle rationale behind why the authors chose to create a new URI scheme rather than:
- using an existing URI scheme
- creating a new URN namespace
In essence they didn’t want to use an existing URI scheme because they all assume that you should be able to dereference the URI. An example of dereferencing in action can be found when clicking on a link like http://www.yahoo.com where the magic of DNS allows you to find yahoo’s web server and talk to it on port 80 in a predictable way. info-uris are designed to be agnostic as to whether or not the identifier can be dereferenced through a resolver of some kind.
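To make the contrast concrete, here is a small Python sketch using the standard library's urlsplit. An http URI carries a host that a client can look up and connect to, while an info-uri carries only a namespace and an identifier. The helper names and the DEFAULT_PORTS table are invented for this example, and the reading of "info:namespace/identifier" is a simplification of the syntax in the RFC (it ignores fragments and percent-encoding).

```python
from urllib.parse import urlsplit

# Hypothetical lookup table for this sketch: ports a client falls back to
# when the URI does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def dereference_target(uri):
    """Return the (host, port) a client would contact, or None if the URI names no host."""
    parts = urlsplit(uri)
    if parts.hostname is None:
        return None
    return parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme)


def split_info_uri(uri):
    """Split 'info:<namespace>/<identifier>' into (namespace, identifier)."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "info" or "/" not in rest:
        raise ValueError(f"not an info-uri: {uri!r}")
    namespace, _, identifier = rest.partition("/")
    return namespace, identifier


print(dereference_target("http://www.yahoo.com"))  # ('www.yahoo.com', 80)
print(dereference_target("info:lccn/n84156128"))   # None
print(split_info_uri("info:lccn/n84156128"))       # ('lccn', 'n84156128')
```

Nothing in the info-uri itself tells a client where to go, so any lookup has to come from a separate resolver that knows the namespace.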
Using URNs was thrown out since URNs are intended to persistently identify information resources, whereas info-uris are designed to identify persistent namespaces, not the resources themselves. Also, the process of establishing a URN namespace isn’t for the faint of heart, as evidenced by the short list of them. info-uris, by contrast, have a registrar who will expedite the process of registering a namespace and has set up a framework for publishing validation/normalization rules. The current registrar is run by OCLCRLGBORG^w OCLC on behalf of NISO. So basically you don’t have to write an RFC to register your namespace.