lcsh, thesauri and skos

Simon Spero has an interesting post on why LCSH cannot be considered a thesaurus. At $work I’ve been working on mapping LCSH/MARC to SKOS, so Simon’s efforts in both collecting and analyzing LCSH authority data have been extremely valuable. In particular Simon and Leonard Willpower’s involvement with SKOS alerted me relatively early on to some of the problems that lie in store when thinking of LCSH in terms of a thesaurus.

The problem stems from very specific (standardized) notions of what thesauri are. Z39-19-2005 defines broader relationships in thesauri as being transitive. So if a has the broader term b, and b has the broader term c, then you can infer a has the broader term c.

Now consider the broader relationships (BT for those of you w/ the red books handy, or care to browse authorities.loc.gov from the comfort of your chair) from the heading “Non-alcoholic cocktails”:

If broader relationships are to be considered transitive one is obliged to treat Alcoholic beverages as a broader term for Non-alcoholic cocktails. But clearly it’s nonsense to consider a non-alcoholic cocktail a specialization of an alcoholic beverage. As Simon pointed out the problem was recognized by Mary Dykstra soon after LCSH adopted terminology from the thesauri world (BT, NT, RT) in 1986. Her article, LC Subject Headings Disguised as a Thesaurus describes the many difficulties of treating LCSH as a thesaurus. In the example above from LCSH the broader (BT) relationship is used for both hierarchical (IS-A) relationships, as well as part/whole (HAS-A) relationships. According to thesauri folks this is a no-no.

LCSH aside, the semantics of broader/narrower have been an issue for SKOS for a fair amount of time. Guus Schreiber proposed a resolution, which was just accepted at yesterday’s SWD telecon. SKOS is trying to straddle several different worlds, enabling the representation of a range of knowledge organization systems from thesauri and taxonomies to subject heading lists, folksonomy and other controlled vocabularies. To remain flexible in this way, while still appealing to the thesaurus world a compromise was reached where the skos:broader and skos:narrower semantic relations were declared to be sub-properties of two new properties: skos:broaderTransitive and skos:narrowerTransitive (respectively). Since transitivity is not inherited, SKOS can still be used by people who want to represent loose broader relationships (LCSH, and others). At the same time SKOS will allow vocabulary owners to infer transitive broader/narrower relationships across concepts. Incidentally the SKOS Reference was just approved yesterday as a W3C Working Draft, which is its first step along the way to hopefully becoming a Recommendation.

My pottering about with LCSH and SKOS has also illustrated the value in making links between concepts explicit. Modeling LCSH as a graph data structure (SKOS), where each concept has a unique identifier has been a simple and yet powerful step in working with the data. For example to generate the image above, I simply wrote a script that transformed the subgraph related to “Non-alcoholic cocktails” to a graphviz dot file:


digraph G {
  rankdir = "BT"
  "Non-alcoholic cocktails" -> "Cocktails";
  "Alcoholic beverages" -> "Beverages";
  "Non-alcoholic beverages" -> "Beverages";
  "Cocktails" -> "Alcoholic beverages";
  "Non-alcoholic cocktails" -> "Non-alcoholic beverages";
  "Non-alcoholic beer" -> "Non-alcoholic beverages";
}

And then ran that through the graphviz dot utility:


% dot -T png non-alcoholic-cocktails.dot > non-alcoholic-cocktails.png

to generate the PNG file you see. It’s my hope that making a richly linked graph like LCSH/SKOS available will enable not only enhanced use of the vocabulary, but also aid in creative, collaborative refactoring of the graph. I know that these issues are not new to LC, however tools that enable refactoring along the lines of what Margherita Sini proposed for the cocktail problem above will only be possible in a world where the graph can easily be manipulated and, downstream applications (library catalogs, etc) can easily adapt to the changing concept scheme.