While lcsh.info was up and running, harvesters actively crawled it. At its core all lcsh.info did was mint a URI for every Library of Congress Subject Heading. This is similar in spirit to Brewster Kahle’s more ambitious OpenLibrary project to mint a URI for every book, or in his words:
One web page for every book
Aside: It’s also similar in spirit to RESTful web development, and to the linked data, semantic web effort generally.
Minting a URI for every Library of Congress Subject Heading meant that there were lots of densely interlinked pages. Some researchers at Stanford did a data visualization of LCSH two years ago, which illustrates just how deeply linked LCSH is:
I wanted lcsh.info to get crawled, so I intentionally put some high-level, well-connected concepts (Humanities, Science, etc.) on the home page to provide a doorway for web crawlers to walk through into the site and begin discovering all the broader, narrower, related links between concepts–without having to perform a search.
So lcsh.info is down now, but it turns out you can still see its shadow living on in quite a usable form in web search engines. For example type this into any of the big three search engines:
And you’ll see:
It’s interesting that (unlike Google and Yahoo) Microsoft’s relevancy ranking actually puts the heading for “Mathematics” at the top. Also note that simple things like giving the page a good title, and descriptive text make the heading show up in usable form in each search engine.
It’s not too surprising that trying the same for authorities.loc.gov doesn’t work out so well. Umm, yeah http://authorities.loc.gov/robots.txt…
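To make the robots.txt point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. I'm assuming a blanket `Disallow: /` purely for illustration; the `ROBOTS_TXT` body and the `can_crawl` helper are hypothetical and are not the actual contents of the Library of Congress file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body (an assumption for illustration only,
# not the real file served by authorities.loc.gov).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(can_crawl(ROBOTS_TXT, "http://authorities.loc.gov/"))
```

A well-behaved crawler that sees rules like these never gets past the front door, so nothing from the site ends up in a search index.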
On the one hand, I’m just being nostalgic looking at the content that once was there &sigh;. But on the other hand there seems to be a powerful message here: putting data out onto the open web and making it crawlable means your content is viewable through lots of different lenses. Maybe you don’t have to get search exactly right on your website; let other people do it for you.
Two other things come to mind: LOCKSS and Brewster’s even more ambitious project. I’ve been sort of hoping that somehow or another the Internet Archive and the Open Library would find their way into being publicly funded projects. What if? I can daydream, right?
I just saw Diane Vizine-Goetz demo OCLC’s Terminology Services at the CENDI/SKOS meeting and was excited to see various things out on the public web. For example, the LCSH concept “World Wide Web” is over here:
At the moment it’s not the most friendly human-readable display, but that’s just an XSLT stylesheet away (assuming TS follows the patterns of other OCLC Services). I’m not quite sure what the default namespace urn:uuid:D30A7E67-31BF-40A3-9956-9668674FCD84 is. But the response looks like it indicates what resources are related to a given conceptual resource.
And LCSH is just one of the vocabularies available through the pilot service. If you examine the XML you’ll see references to FAST, TGM and MeSH, plus SRU services for each.
The service seems to respond to a request for text/html:
curl --header "Accept: text/html" http://tspilot.oclc.org/lcsh/sh2008114004
But not application/rdf+xml:
curl --header "Accept: application/rdf+xml" http://tspilot.oclc.org/lcsh/sh2008114004
It seems like it would be a pretty easy fix, and pretty important for being able to follow your nose on the semantic web.
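The same check can be scripted. This is a minimal sketch using only Python's standard library; the helper names (`build_request`, `honours_accept`, `negotiate`) are my own, and I'm assuming a server that honours the Accept header will echo the requested media type back in its Content-Type. The sketch doesn't call `negotiate` itself, since that needs the pilot service to be reachable.

```python
import urllib.request


def build_request(url: str, media_type: str) -> urllib.request.Request:
    """Build a GET request that asks for one specific media type."""
    return urllib.request.Request(url, headers={"Accept": media_type})


def honours_accept(content_type_header: str, media_type: str) -> bool:
    """Compare a Content-Type header (ignoring parameters) to the requested type."""
    served = content_type_header.split(";", 1)[0].strip().lower()
    return served == media_type.lower()


def negotiate(url: str, media_type: str, timeout: float = 10.0) -> bool:
    """Fetch `url` and report whether the server returned the requested type."""
    request = build_request(url, media_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
    return honours_accept(content_type, media_type)


if __name__ == "__main__":
    # Offline demonstration of the comparison step only.
    print(honours_accept("text/html; charset=UTF-8", "application/rdf+xml"))
```

Calling `negotiate("http://tspilot.oclc.org/lcsh/sh2008114004", "application/rdf+xml")` would be the scripted equivalent of the second curl command above.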
Via Ivan Herman I learned that the Semantic Web Use Cases use concepts from lcsh.info. For example, look at the RDFa in this case study for the Digital Music Archive for the Norwegian National Broadcaster. You can also look at the Document metadata in a linked data browser like OpenLink. Click on the “Document” and then on the various subject “concepts” and you’ll see the linked data browser go out and fetch the triples from lcsh.info for “Semantic Web” and “Broadcasting”.
One of the downsides to linked data browsers (for me) is that they hide a bit of what’s going on. Of course this is by design. For a more RDF-centric view of the data, take a look at this output of rapper.
ed@curry:~$ rapper -o turtle http://www.w3.org/2001/sw/sweo/public/UseCases/NRK/
rapper: Serializing with serializer turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xhv: <http://www.w3.org/1999/xhtml/vocab#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
dc:rights "\u00A9 Copyright 2007, ESIS, NRK."@en-us;
dc:subject <http://lcsh.info/sh85017004#concept>, <http://lcsh.info/sh2002000569#concept>;
dc:title "Case Study: A Digital Music Archive (DMA) for the Norwegian National Broadcaster (NRK) using Semantic Web techniques"@en-us;
foaf:name "Dr. Robert H.P. Engels"@nl
foaf:name "Jon Roar T\u00F8nnesen"@no
xhv:stylesheet <http://www.w3.org/2001/sw/sweo/public/UseCases/style/ucstyle.css> .
You can see Ivan’s using the dc, foaf, skos and bibo vocabularies, and the links out to LCSH concepts. Fun stuff.
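If you just want to see which LCSH concepts a document points at, here is a rough sketch that pulls the lcsh.info identifiers out of Turtle text like the rapper output above. It's a plain regular expression, not a real Turtle parser, and the `lcsh_concepts` helper is a name I made up; it assumes the URIs follow the `http://lcsh.info/<id>#concept` pattern seen in the dc:subject line.

```python
import re

# Matches URIs shaped like <http://lcsh.info/sh85017004#concept>.
# The pattern is an assumption based on the two URIs in the output above.
LCSH_CONCEPT = re.compile(r"<http://lcsh\.info/(sh\d+)#concept>")


def lcsh_concepts(turtle: str) -> list[str]:
    """Return the LCSH identifiers referenced in a chunk of Turtle text."""
    return LCSH_CONCEPT.findall(turtle)


if __name__ == "__main__":
    line = (
        "dc:subject <http://lcsh.info/sh85017004#concept>, "
        "<http://lcsh.info/sh2002000569#concept>;"
    )
    print(lcsh_concepts(line))
```

For anything beyond a quick look, a proper RDF library is the better tool, since Turtle allows prefixed names and other forms a regex won't catch.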