8 thoughts on “lcsh, thesauri and skos”

  1. I believe that transitivity is not really essential here: the fundamental issue is class inclusion vs. prototypes. For example, is “stone lion” a hyponym of “lion”? If we say no, we are hard put to it to understand what the relationship is; if we say yes, we have to abandon such obvious facts about lions as that they are made out of meat (hat tip to Terry Bisson here) and that they have parents that are also lions. Similar issues arise over “teddy bear” and “bear”, “ostrich” and “bird”, and “T-girl” and “girl” :-).

    If on the other hand we treat “lion” as a prototype category, then it’s easy to see that stone lions are lions that simply lack some of the prototypical lion properties while preserving others such as the mane, the tail, the jaws, and the pugnacious expression.

    The OO version of this problem has people deriving a ColoredPoint or 3DPoint class directly from a (2D, colorless) Point class, because it’s easy to add just one instance variable, though it should be obvious that a 3D point is not a 2D point. Instead, both should be derived from AbstractPoint, a class that is uncommitted to issues of dimensionality and color. Similarly we could have “abstract lion” as the hypernym of both “meat lion” and “stone lion”. But then how much do we factor out? It’s impossible to say a priori. By having a lion prototype object, we can clone it several times to create real or stone lions as appropriate by overriding prototype properties.

    And yet. Class inclusion is so handy when it does work, so powerful and expressive, it’s hard to think of abandoning it altogether.
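    The prototype approach described above can be sketched in a few lines. This is only an illustration, not anything from the original post: the `Proto` class, its `get` and `clone` methods, and the property names (`material`, `has_mane`, `parents_are_lions`) are all made up for the example.

```python
class Proto:
    """A minimal prototype object (hypothetical, for illustration only).

    Properties are looked up on the object itself first, then on the
    prototype it was cloned from, and so on up the chain.
    """

    def __init__(self, parent=None, **props):
        self.parent = parent
        self.props = dict(props)

    def get(self, name):
        obj = self
        while obj is not None:
            if name in obj.props:
                return obj.props[name]
            obj = obj.parent
        raise KeyError(name)

    def clone(self, **overrides):
        # A clone stores only what differs from its prototype.
        return Proto(parent=self, **overrides)


# The prototypical lion: made of meat, has a mane, has lion parents.
lion = Proto(material="meat", has_mane=True, has_tail=True,
             parents_are_lions=True)

# A real lion overrides nothing; a stone lion overrides two properties
# and keeps the rest (mane, tail) from the prototype.
real_lion = lion.clone()
stone_lion = lion.clone(material="stone", parents_are_lions=False)
```

    Nothing has to be factored out into an "abstract lion" in advance: each clone overrides whatever turns out to differ and inherits the rest.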

  2. Spot-on, John.

    Cocktails aside, “alcoholic beverages” and “non-alcoholic beverages” are clearly disjoint sets; can this be exploited somehow?

    The fact that “non-alcoholic cocktails” has a path (indirectly) to both might be used to derive a score for edges in the graph, hinting at how strict the parental relationships are. We can derive from your graph that “cocktails share many properties of alcoholic drinks, but not all of them”, and this could be presented as an annotation on the edge between cocktails and alcoholic beverages.

    A simple score model might be to count the number of edges in a node’s subtree which link to a peer of the current node, and apply this value to the edge connecting the current node to its parent. The distance of the subnode from the current node should probably be factored into the score. So, cocktails-to-alcoholic beverages might get a weight of 0.5, since it has a child that refers to Non-alcoholic beverages.

    I’m not a graph expert by any means, and this weighting approach may be naive, but at least it is something that can be derived programmatically, and can be presented visually (e.g. the thickness of the edges could depend on their scores), and that might help others in studying relationships in the graph, especially when viewing subgraphs like the one in your post. (For example, “Beer” isn’t on the graph, but its influence could be implied by edge annotation.)
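    One possible reading of the score model above, as a rough sketch. Several things here are assumptions rather than anything stated in the thread: the sample `broader` data is a guess at the relevant part of the graph, “peer” is taken to mean a sibling of the parent end of the edge (as the cocktails example suggests), and distance is factored in by halving the contribution at each level (`decay=0.5`). The function names are made up for the example.

```python
# broader[c] = concepts directly broader than c.
# Hypothetical sample data, guessed from the discussion above.
broader = {
    "Alcoholic beverages": {"Beverages"},
    "Non-alcoholic beverages": {"Beverages"},
    "Cocktails": {"Alcoholic beverages"},
    "Non-alcoholic cocktails": {"Cocktails", "Non-alcoholic beverages"},
}


def narrower(concept):
    """Concepts that list `concept` as directly broader."""
    return {c for c, parents in broader.items() if concept in parents}


def siblings(concept):
    """Other concepts sharing at least one broader concept with `concept`."""
    sibs = set()
    for parent in broader.get(concept, ()):
        sibs |= narrower(parent)
    sibs.discard(concept)
    return sibs


def edge_score(node, parent, decay=0.5):
    """Score the edge node -> parent.

    Walks the subtree below `node` level by level. Each broader link from
    a descendant to a sibling of `parent` adds decay**depth, where depth
    is the descendant's distance from `node`.
    """
    peers = siblings(parent)
    score = 0.0
    seen = {node}
    frontier = narrower(node)
    depth = 1
    while frontier:
        next_frontier = set()
        for desc in frontier:
            if desc in seen:
                continue
            seen.add(desc)
            cross_links = len(broader.get(desc, set()) & peers)
            score += cross_links * decay ** depth
            next_frontier |= narrower(desc)
        frontier = next_frontier
        depth += 1
    return score


cocktail_edge = edge_score("Cocktails", "Alcoholic beverages")
top_edge = edge_score("Alcoholic beverages", "Beverages")
```

    With this sample data the cocktails-to-alcoholic-beverages edge scores 0.5, because “Non-alcoholic cocktails” sits one level below “Cocktails” and links across to “Non-alcoholic beverages”. Other decay functions or other definitions of “peer” would give different numbers.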

  3. You stumbled upon the classical diamond problem of object-oriented knowledge modelling. Just forget about transitivity and secondary differences between IS-A and HAS-A. In a general thesaurus there is only a broader/narrower relationship; everything else depends on your specific use case and can be discussed.

  4. Thanks for the helpful comments John, Graham and Jakob. I agree, I kind of muddied the waters focusing on transitivity in this post. The distinction between class inclusion vs prototypes is what I was after, and I appreciate the clarification.

    The good news is there is nothing preventing SKOS from being extended in a way to capture these two specializations of skos:broader… the bad news is that, well, you have to extend SKOS, and multiple communities might do it totally differently. This is the double-edged sword of trying to serve multiple communities.

    Graham, if memory serves, Alistair Miles’ thesis contains some details about the weighting of links between concepts along similar lines to what you suggested. I’m not a graph expert either, so these suggestions are most welcome.

  5. And, damn.

    Dykstra says “We librarians have lived with LCSH as a liability for a long time. The matter now, however, must no longer be lived with, for it has become a professional disgrace.”

    OUCH. She said that in 1988. Read her essay. Pretty much everything she complains about is still with us 20 years later. 20 YEARS. That’s an awful long time to still be living with what Dykstra was not afraid to call a professional disgrace. Ouch ouch ouch.
