research ideas for library linked data

The past few weeks have seen some pretty big news for Library Linked Data. On April 7th the Hungarian National Library announced that its entire library catalog, digital library holdings, and name/subject authority data are now available as Linked Data. Then, just over a week later on April 16th, the German National Library announced that it was making its name and subject authority files available as Linked Data.

This adds to the pioneering work of the Royal Library of Sweden, which announced almost two years ago that it was making all of its catalog and authority data available. Add to this that OCLC is publishing the Virtual International Authority File as Linked Data, and that the Library of Congress makes its subject authority data available as Linked Data, and things are starting to get interesting.

About 16 months ago at the Dublin Core Conference in Berlin, Alistair Miles predicted that we’d see several implementations of Linked Data at major libraries within the year. I must admit, while I was sympathetic to the cause, I was also pretty skeptical that this would come to pass. But here we are, just a bit more than a year later, and two national libraries and a major library data distributor have decided to publish some of their data assets as Linked Data.

Hey Al, crow never tasted so good…

So now it’s starting to feel like there’s enough extant library Linked Data to start looking at patterns of usage, to see if there are any emerging best practices we could work towards. In particular I think it would be interesting to take a look at:

What vocabularies are being used, and is there emerging consensus about which to use?
What licenses (if any) are associated with the data?
How much linking and interlinking is going on?
What sorts of mechanisms does the publisher offer for getting the data: sitemap, feeds, SPARQL, bulk download?
What is the quality of the data: granularity, link integrity, vocabulary usage?
What approaches to identifiers for “real world things” have publishers taken: hash, slash, 303, PURLs, reuse of traditional identifiers, etc.?
What are the relative sizes of the pools of library linked data?
How are updates being managed?
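
A couple of these questions could be approached with very little code. As a rough sketch only (not anything the publishers themselves provide), the Python below classifies an identifier as hash or slash from its syntax, and counts how many links in a set of triples point outside the publisher's own host. The function names, the `example.org` host, and the sample triples are all made up for illustration. Telling a 303 redirect apart from a plain slash URI would require an actual HTTP request, which this sketch does not attempt.

```python
from collections import Counter
from urllib.parse import urlparse


def identifier_style(uri):
    """Classify a URI as 'hash' or 'slash' from its syntax alone."""
    return "hash" if urlparse(uri).fragment else "slash"


def outbound_link_hosts(triples, home_host):
    """Count object URIs whose host differs from the publisher's host.

    `triples` is an iterable of (subject, predicate, object) strings.
    Objects that are not URIs (plain literals) have no host and are skipped.
    """
    hosts = Counter()
    for _subject, _predicate, obj in triples:
        host = urlparse(obj).netloc
        if host and host != home_host:
            hosts[host] += 1
    return hosts


# Hypothetical sample data: three triples from an imaginary publisher.
triples = [
    ("http://example.org/auth/123#concept",
     "http://www.w3.org/2004/02/skos/core#exactMatch",
     "http://example.net/subjects/abc"),
    ("http://example.org/auth/123#concept",
     "http://www.w3.org/2004/02/skos/core#broader",
     "http://example.org/auth/100#concept"),
    ("http://example.org/auth/456#concept",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://example.com/id/789"),
]

styles = Counter(identifier_style(s) for s, _p, _o in triples)
links = outbound_link_hosts(triples, "example.org")
```

Run over a full dump from each publisher, tallies like `styles` and `links` would give a first, crude comparison of identifier conventions and interlinking across the different pools of data.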

Tomorrow I’m meeting with some folks at the Metadata Research Center at the School of Information and Library Science at the University of North Carolina to talk about their HIVE project. Barbara Tillett and Libby Dechman of LC are also here to talk about the use of LCSH, VIAF and RDA. I’m hoping to convince some of the folks at the MRC that answering some of these questions about the use of Linked Data in libraries could be valuable to the library research community. The rumored W3C Incubator Group for Cultural Heritage Institutions and the Semantic Web couldn’t come at a better time.