straw

By now I imagine you’ve heard the announcement that OCLC has started to make WorldCat bibliographic data available as openly licensed Linked Data. The availability of microdata and RDFa metadata in WorldCat pages coupled with the ODC-BY license and the availability of sitemaps for crawlers is a huge win for the library community. Similar announcements about Dewey Decimal Classification and the Virtual International Authority File are further evidence that there is a big paradigm shift going on at OCLC.

A few weeks ago Richard Wallis (formerly of Talis, and now at OCLC) asked me to take a look at the strawman library microdata vocabulary that OCLC put together for the WorldCat release: http://purl.org/library. Richard stressed that the library vocabulary was a prototype to focus and gather interest from the cultural heritage sector outside of OCLC, and the metadata community in general. Combined with the prototype microdata at WorldCat I think it represents an excellent first step. At this point I should reiterate that these remarks about schema.org are mine and not those of my employer.

The vocabulary is currently expressed in OWL, and visiting that URL will redirect you to an application that lets you read the OWL file as documentation. Rather than write up a few paragraphs and send my comments to Richard in email, I figured I would jot them down here, in case anyone else has feedback.

Examining the classes that the library vocabulary defines tells the majority of the story. They are broken down into:

ArchiveMaterial
Carrier
Computer File
Game
Image
Interactive Multimedia
Kit
Musical Score
Newspaper
Periodical
Thesis
Toy
Video
VideoGame
Visual Material
Web Site

These classes should seem familiar to catalogers who have worked in MARC, since there is a lot of similarity with the types of data that are encoded into the 008 field. However, some are missing, such as maps, dictionaries and encyclopedias. It’s kind of amusing that Book isn’t mentioned. I’m not sure what the rationale was for selecting these classes; perhaps some sort of ranking based on use in WorldCat? Examining the OWL shows that OCLC has made an effort to express mappings between the library vocabulary and schema.org:

library	schema.org
http://purl.org/library/ArchiveMaterial	http://schema.org/CreativeWork/ArchiveMaterial
http://purl.org/library/ComputerFile	http://schema.org/CreativeWork/ComputerFile
http://purl.org/library/Game	http://schema.org/CreativeWork/Game
http://purl.org/library/Image	http://schema.org/CreativeWork/Image
http://purl.org/library/InteractiveMultimedia	http://schema.org/CreativeWork/InteractiveMultimedia
http://purl.org/library/Kit	http://schema.org/CreativeWork/Kit
http://purl.org/library/MusicalScore	http://schema.org/CreativeWork/MusicalScore
http://purl.org/library/Newspaper	http://schema.org/CreativeWork/Newspaper
http://purl.org/library/Periodical	http://schema.org/CreativeWork/Periodical
http://purl.org/library/Thesis	http://schema.org/CreativeWork/Book/Thesis
http://purl.org/library/Toy	http://schema.org/CreativeWork/Toy
http://purl.org/library/Video	http://schema.org/CreativeWork/Video
http://purl.org/library/VideoGame	http://schema.org/CreativeWork/VideoGame
http://purl.org/library/VisualMaterial	http://schema.org/CreativeWork/VisualMaterial
http://purl.org/library/WebSite	http://schema.org/CreativeWork/WebSite

However, these schema.org URLs do not resolve, and are not actually present as more specific types of schema.org’s CreativeWork. Perhaps the presence of these mappings in the library vocabulary is evidence of a desire to create these classes at schema.org. But then there are cases like library:Image, which seems to bear a lot of resemblance to schema.org’s ImageObject.

Examining the OWL also yields a set of library:Carrier instances.

BlurayDisk
CassetteTape
CD
DVD
FilmReel
LP
Microform
VHSTape
Volume
WWW

Again, there are more carriers than this in the MARC world. Why these were selected is a bit of a mystery. What library:WWW has to do with library:WebSite (if anything) isn’t clear either.

So even in this prototype library vocabulary there is a lot to examine and unpack. I imagine some phone calls or face-to-face meetings would be required to get at what went into the production of these classes and carriers.

Be that as it may, I think it could prove more useful to look at the WorldCat microdata and see what library vocabulary was used. For example here is the microdata extracted from the WorldCat page for Tim Berners-Lee’s Weaving the Web expressed as JSON:

{
  "type": "http://schema.org/Book", 
  "properties": {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
      "http://schema.org/Book"
    ], 
    "http://purl.org/library/placeOfPublication": [
      {
        "type": "http://schema.org/Place", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://schema.org/Place"
          ], 
          "http://schema.org/name": [
            "San Francisco :"
          ]
        }
      }
    ], 
    "http://schema.org/bookEdition": [
      "1st ed."
    ], 
    "http://schema.org/publisher": [
      {
        "type": "http://schema.org/Organization", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://schema.org/Organization"
          ], 
          "http://schema.org/name": [
            "HarperSanFrancisco"
          ]
        }
      }
    ], 
    "http://schema.org/genre": [
      "History"
    ], 
    "http://schema.org/name": [
      "Weaving the Web : the original design and ultimate destiny of the World Wide Web by its inventor"
    ], 
    "http://schema.org/numberOfPages": [
      "226"
    ], 
    "http://purl.org/library/holdingsCount": [
      "2096"
    ], 
    "http://schema.org/about": [
      {
        "type": "http://www.w3.org/2004/02/skos/core#Concept", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://www.w3.org/2004/02/skos/core#Concept"
          ], 
          "http://schema.org/name": [
            "Erfindung."
          ]
        }
      }, 
      {
        "type": "http://www.w3.org/2004/02/skos/core#Concept", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://www.w3.org/2004/02/skos/core#Concept"
          ], 
          "http://schema.org/name": [
            "WWW."
          ]
        }
      }, 
      {
        "type": "http://www.w3.org/2004/02/skos/core#Concept", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://www.w3.org/2004/02/skos/core#Concept"
          ], 
          "http://schema.org/name": [
            "prospective informatique."
          ]
        }
      }, 
      {
        "type": "http://www.w3.org/2004/02/skos/core#Concept", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://www.w3.org/2004/02/skos/core#Concept"
          ], 
          "http://www.w3.org/2004/02/skos/core#inScheme": [
            "http://dewey.info/scheme/e21/"
          ]
        }, 
        "id": "http://dewey.info/class/025/e21/"
      }, 
      {
        "type": "http://www.w3.org/2004/02/skos/core#Concept", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://www.w3.org/2004/02/skos/core#Concept"
          ], 
          "http://www.loc.gov/mads/rdf/v1#isIdentifiedByAuthority": [
            "http://id.loc.gov/authorities/subjects/sh95000541"
          ], 
          "http://schema.org/name": [
            "World wide web."
          ]
        }
      }, 
      {
        "type": "http://www.w3.org/2004/02/skos/core#Concept", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://www.w3.org/2004/02/skos/core#Concept"
          ], 
          "http://www.loc.gov/mads/rdf/v1#isIdentifiedByAuthority": [
            "http://id.loc.gov/authorities/subjects/sh95000541"
          ], 
          "http://schema.org/name": [
            "World Wide Web."
          ]
        }
      }, 
      {
        "type": "http://www.w3.org/2004/02/skos/core#Concept", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://www.w3.org/2004/02/skos/core#Concept"
          ], 
          "http://www.loc.gov/mads/rdf/v1#isIdentifiedByAuthority": [
            "http://id.loc.gov/authorities/subjects/sh95000541"
          ], 
          "http://schema.org/name": [
            "World Wide Web--History."
          ]
        }
      }, 
      {
        "type": "http://schema.org/Person", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://schema.org/Person"
          ], 
          "http://www.loc.gov/mads/rdf/v1#isIdentifiedByAuthority": [
            "http://id.loc.gov/authorities/names/no99010609"
          ], 
          "http://schema.org/name": [
            "Berners-Lee, Tim."
          ]
        }, 
        "id": "http://viaf.org/viaf/85312226"
      }, 
      {
        "type": "http://www.w3.org/2004/02/skos/core#Concept", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://www.w3.org/2004/02/skos/core#Concept"
          ], 
          "http://schema.org/name": [
            "Web--Histoire."
          ]
        }
      }, 
      {
        "type": "http://www.w3.org/2004/02/skos/core#Concept", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://www.w3.org/2004/02/skos/core#Concept"
          ], 
          "http://schema.org/name": [
            "World Wide Web"
          ]
        }, 
        "id": "http://id.worldcat.org/fast/1181326"
      }, 
      {
        "type": "http://www.w3.org/2004/02/skos/core#Concept", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://www.w3.org/2004/02/skos/core#Concept"
          ], 
          "http://schema.org/name": [
            "historique informatique."
          ]
        }
      }, 
      {
        "type": "http://www.w3.org/2004/02/skos/core#Concept", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://www.w3.org/2004/02/skos/core#Concept"
          ], 
          "http://schema.org/name": [
            "Web--Histoire."
          ]
        }
      }, 
      {
        "type": "http://schema.org/Person", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://schema.org/Person"
          ], 
          "http://schema.org/name": [
            "Berners-Lee, Tim"
          ]
        }
      }
    ], 
    "http://schema.org/description": [
      "Enquire within upon everything -- Tangles, links, and webs -- info.cern.ch -- Protocols: simple rules for global systems -- Going global -- Browsing -- Changes -- Consortium -- Competition and consensus -- Web of people -- Privacy -- Mind to mind -- Machines and the Web -- Weaving the Web."
    ], 
    "http://purl.org/library/oclcnum": [
      "41238513"
    ], 
    "http://schema.org/copyrightYear": [
      "1999"
    ], 
    "http://schema.org/contributor": [
      {
        "type": "http://schema.org/Person", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://schema.org/Person"
          ], 
          "http://www.loc.gov/mads/rdf/v1#isIdentifiedByAuthority": [
            "http://id.loc.gov/authorities/names/n97003262"
          ], 
          "http://schema.org/name": [
            "Fischetti, Mark."
          ]
        }, 
        "id": "http://viaf.org/viaf/874883"
      }
    ], 
    "http://schema.org/isbn": [
      "9780062515872", 
      "006251587X", 
      "0062515861", 
      "9780062515865"
    ], 
    "http://schema.org/inLanguage": [
      "en"
    ], 
    "http://schema.org/reviews": [
      {
        "type": "http://schema.org/Review", 
        "properties": {
          "http://schema.org/reviewBody": [
            "Tim Berners-Lee, the inventor of the World Wide Web, has been hailed by Time magazine as one of the 100 greatest minds of this century. His creation has already changed the way people do business, entertain themselves, exchange ideas, and socialize with one another.\" \"Berners-Lee offers insights to help readers understand the true nature of the Web, enabling them to use it to their fullest advantage. He shares his views on such critical issues as censorship, privacy, the increasing power of software companies in the online world, and the need to find the ideal balance between the commercial and social forces on the Web. His criticism of the Web's current state makes clear that there is still much work to be done. Finally, Berners-Lee presents his own plan for the Web's future, one that calls for the active support and participation of programmers, computer manufacturers, and social organizations to make it happen.\"--Jacket."
          ], 
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://schema.org/Review"
          ], 
          "http://schema.org/itemReviewed": [
            "http://www.worldcat.org/oclc/41238513"
          ]
        }
      }
    ], 
    "http://schema.org/author": [
      {
        "type": "http://schema.org/Person", 
        "properties": {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            "http://schema.org/Person"
          ], 
          "http://www.loc.gov/mads/rdf/v1#isIdentifiedByAuthority": [
            "http://id.loc.gov/authorities/names/no99010609"
          ], 
          "http://schema.org/name": [
            "Berners-Lee, Tim."
          ]
        }, 
        "id": "http://viaf.org/viaf/85312226"
      }
    ]
  }, 
  "id": "http://www.worldcat.org/oclc/41238513"
}

Yes, that’s a lot of data. But interestingly only three library vocabulary elements were used:

placeOfPublication
holdingsCount
oclcnum
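A quick way to check a count like this is to walk the extracted microdata and collect every property URI that starts with the library namespace. The following is only a minimal sketch: the function name properties_in_namespace is hypothetical, and the sample is a heavily trimmed excerpt of the record above rather than the full JSON.

```python
import json

# Namespace of the strawman library vocabulary discussed in this post.
LIBRARY_NS = "http://purl.org/library/"


def properties_in_namespace(item, namespace=LIBRARY_NS):
    """Recursively collect the local names of properties in `namespace`.

    `item` is a microdata item in the JSON shape shown above: a dict with
    a "properties" dict mapping property URIs to lists of values, where a
    value is either a string or a nested item.
    """
    found = set()
    for prop, values in item.get("properties", {}).items():
        if prop.startswith(namespace):
            found.add(prop[len(namespace):])
        for value in values:
            if isinstance(value, dict):  # nested item, e.g. a Place or Person
                found |= properties_in_namespace(value, namespace)
    return found


# Trimmed excerpt of the Weaving the Web record (not the full data).
sample = json.loads("""
{
  "type": "http://schema.org/Book",
  "id": "http://www.worldcat.org/oclc/41238513",
  "properties": {
    "http://purl.org/library/placeOfPublication": [
      {
        "type": "http://schema.org/Place",
        "properties": {"http://schema.org/name": ["San Francisco :"]}
      }
    ],
    "http://schema.org/bookEdition": ["1st ed."],
    "http://purl.org/library/holdingsCount": ["2096"],
    "http://purl.org/library/oclcnum": ["41238513"]
  }
}
""")

print(sorted(properties_in_namespace(sample)))
# ['holdingsCount', 'oclcnum', 'placeOfPublication']
```

Passing a different namespace (for example "http://schema.org/") gives the complementary view of how much of the record is already covered by schema.org terms.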

One could argue that rather than creating library:placeOfPublication they could use schema:publisher with a nested Organization item having a schema:location. Similarly, library:oclcnum could’ve been expressed using itemid with a value of info:oclc/41238513, using the info-uri namespace that OCLC maintains the registry for. This leaves library:holdingsCount, which does seem to be missing from schema.org but also raises the question: whose holdings?
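To make that alternative concrete, here is a rough sketch of what the extracted item might look like if it were modeled that way. This is my own illustration, not anything OCLC has published: the exact shape, and the choice to hang the Place off the publisher via schema:location, are assumptions, and the values are simply reused from the record above.

```python
import json

# Hypothetical remodeling of the record: no library:placeOfPublication
# and no library:oclcnum; only library:holdingsCount remains.
alternative = {
    "type": "http://schema.org/Book",
    # The OCLC number is carried by the item id as an info URI
    # instead of a library:oclcnum property.
    "id": "info:oclc/41238513",
    "properties": {
        "http://schema.org/publisher": [
            {
                "type": "http://schema.org/Organization",
                "properties": {
                    "http://schema.org/name": ["HarperSanFrancisco"],
                    # The place of publication becomes the publisher's location.
                    "http://schema.org/location": [
                        {
                            "type": "http://schema.org/Place",
                            "properties": {
                                "http://schema.org/name": ["San Francisco :"]
                            },
                        }
                    ],
                },
            }
        ],
        # Still no obvious schema.org equivalent for this one.
        "http://purl.org/library/holdingsCount": ["2096"],
    },
}

print(json.dumps(alternative, indent=2))
```

One trade-off of this shape is that the place is now a statement about the publisher rather than about the book, which may or may not be what a cataloger intends.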

As Tom Gruber famously said:

Every ontology is a treaty – a social agreement – among people with some common motive in sharing.

So the question for me is what is the library vocabulary trying to do, and for whom? Is it trying to make it easy to share MARC data as microdata on the Web? Is it trying to communicate something to search engines so that they can have enhanced displays? Who are the people that want to share and consume this data? I think having rough consensus about the answers to these questions is really important before diving into modeling exercises…even prototypes. And when the modeling begins I think it’s really important to follow the lead of the WorldCat developers in using the bits of schema.org vocabulary they could, and beginning to mint vocabulary terms for things that are missing. I don’t think it’s going to be fruitful to start from the position of modeling the bibliographic universe completely. I’d rather see real implementations (both publishers and consumers) drive the discovery of what is missing or awkward in schema.org, and how it can be fixed. Ideally, schema.org implementors like GoodReads would be at the table, along with members of the academic community like Jason Ronallo, Jonathan Rochkind and Ed Chamberlain (among others) who care about these issues. In addition my employer is actively engaged in an effort to rethink bibliographic data on the Web. It seems imperative that these efforts at schema.org and Zepheira’s work be combined somehow, especially since OCLC and Zepheira are hardly strangers.

I was of course flattered to be asked my opinion about the library vocabulary. I hope that my remarks haven’t accidentally set this strawman vocabulary on fire, because I think the work that OCLC has begun in this area is incredibly important. My experience watching the designers of SKOS has made me mindful of minimizing ontological commitments when designing a vocabulary, and wary of trying to exhaustively model a domain. In some ways I guess I’m a bit of a schema.org skeptic given its encyclopedic coverage. schema.org should take a page from the HTML 5 book and stay hyper-focused on letting implementations drive standardization. A bit of Seymour Lubetzky’s attention to simplification and user friendliness would be welcome as well.