stepping backwards

Jonathan Rochkind recently wrote a good blog post about using HTML5 Microdata to help citation managers like Mendeley and Zotero discover citation metadata that is available in formats such as RIS. It’s an excellent and detailed complement to Eric Hellman’s piece on the same subject.

I contributed to the unAPI effort five years ago, which aimed to fix the same problem: making citation metadata available to browsers. I wrote the unAPI validator, which helped implementors confirm they were doing things right. Articles were written, and we saw implementations in software such as the open source integrated library system Evergreen and the popular citation manager Zotero, which at one point looked first for unAPI metadata in pages…perhaps it still does.

As Jonathan points out, there are some issues with unAPI, such as the accessibility problems around Microformats in general, which unAPI was partly modeled on. HTML5 Microdata and RDFa weren’t around when we were working on unAPI, so I think Jonathan is right that it makes sense to think about using these technologies nowadays instead of unAPI when making structured metadata available in HTML. I personally think the same goes for COinS, where OpenURL key value pairs are used to express the metadata. Companies like Google, Microsoft, Yahoo and Facebook actively scrape HTML5 Microdata and RDFa, and there are vocabularies for describing books and articles. And because these technologies are deployed more widely than the small niche that libraries occupy, they fit the Web better.

But there is a fair bit of turmoil in the structured-data-on-the-Web space. Today’s F8 product announcements seemed to indicate that Facebook is deepening its use of the OpenGraphProtocol, which is their rebranding of RDFa. We’re seeing the International Press Telecommunications Council standardizing rNews as an RDFa vocabulary for expressing online news metadata. And meanwhile Google, Microsoft and Yahoo are continuing to work on schema.org Microdata vocabularies. The recent Schema.org Workshop seems to anticipate significant changes in that space in the near future, particularly regarding the output of the W3C Web Schema and HTML Data task forces.

At LODLAM-DC we had a good conversation about RDFa, Microdata, Microformats and JSON publishing options for the cultural heritage sector. Perhaps I was just projecting, but it seemed like there was a fair bit of uncertainty about which to use. At the end of the day it seems like making your decisions based on things you want to enable is a good way forward. Are you trying to get your content to show up nicely on Facebook or Google–or both?

…or are you trying to do something else, like advertise some RIS citation metadata that is related to an HTML page so a citation manager can pick it up?

Even before the pixels had dried on the first version of the unAPI spec I was left with the nagging feeling that it had missed the point. I felt like we hadn’t really used the mechanics of the Web that were already there, and had sort of inadvertently succumbed to how standards development would be lampooned later by XKCD:

Specifically, I felt like we could have documented an even simpler pattern, namely using a <link> or <a> element in conjunction with the rel and type attributes. So if you have a search result that is available as RIS, why not add this to your <head> element:

<link rel="alternate" type="application/x-research-info-systems" href="/search?q=cartoons&format=ris" />
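
To give a sense of how little a client would need to do to pick this up, here is a rough sketch in Python using only the standard library. It is not how Zotero or Mendeley actually work; the class name AlternateLinkFinder, the function find_ris_links and the example.org URL are all made up for illustration. It simply scans a page for <link> or <a> elements whose rel includes "alternate" and whose type is the RIS media type, and resolves each href against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Media type used in the <link> example above.
RIS_TYPE = "application/x-research-info-systems"


class AlternateLinkFinder(HTMLParser):
    """Hypothetical helper: collects alternate links of one media type."""

    def __init__(self, base_url, media_type=RIS_TYPE):
        super().__init__()
        self.base_url = base_url
        self.media_type = media_type
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Only <link> and <a> are considered, as in the pattern described above.
        if tag not in ("link", "a"):
            return
        attr = dict(attrs)
        # rel is a space-separated list of tokens, compared case-insensitively.
        rels = (attr.get("rel") or "").lower().split()
        media_type = (attr.get("type") or "").strip().lower()
        href = attr.get("href")
        if "alternate" in rels and media_type == self.media_type and href:
            # Resolve relative hrefs against the URL of the page itself.
            self.found.append(urljoin(self.base_url, href))


def find_ris_links(html, base_url):
    """Return absolute URLs of RIS alternates advertised in an HTML page."""
    finder = AlternateLinkFinder(base_url)
    finder.feed(html)
    return finder.found


if __name__ == "__main__":
    page = (
        "<html><head>"
        '<link rel="alternate" type="application/x-research-info-systems" '
        'href="/search?q=cartoons&format=ris" />'
        "</head><body></body></html>"
    )
    for url in find_ris_links(page, "http://example.org/search?q=cartoons"):
        print(url)
```

A client could then fetch each URL it finds and hand the response to whatever RIS parser it already has. Nothing here is specific to RIS: swapping the media type would find other alternate formats in the same way.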

My IRC conversation with Jonathan about his blog post was rolling around in my head when this Kurt Vonnegut quote went by in my Twitter stream:

It seemed oddly appropriate given the uncertainty in the structured-data-on-the-web marketplace, and some missteps with unAPI. If all we want to do is replace unAPI with something easier and more web-friendly, then why not fall back on basic functionality that has been in HTML for years?

If you want to make structured metadata available directly in HTML, sure, HTML5 Microdata and RDFa are important technologies to use. But if all you want to do is link to an external metadata file, I personally think the scholarly community would be better served by a simpler and less controversial approach.

stepping backwards by Ed Summers, unless otherwise expressly stated, is licensed under a Creative Commons Attribution 4.0 International License.

7 thoughts on “stepping backwards”

  1. I think it’s really important to give people a reason to create data, and an immediate ROI for doing so…

    eg. javascript & PHP libraries which augment your site, validators, services which consume your data and do something neat with it which you can try out right away.

    Unless the company gets something in return, why waste the time and bandwidth on this stuff the academics waffle on about but doesn’t make you money? (we need quick easy-to-understand answers to that question)

  2. You wrote: “At the end of the day it seems like making your decisions based on things you want to enable is a good way forward. Are you trying to get your content to show up nicely on Facebook or Google–or both? …or are you trying to do something else, like advertise some RIS citation metadata that is related to an HTML page so a citation manager can pick it up?” That’s the point. Most discussions about which technologies to use lack a clear goal. In the end most standards are just defined by concrete applications: we want to make data accessible to Mendeley and Zotero, Google, Facebook etc. – so we must give it in the forms they want, no matter how ill-designed these forms are.

  3. To expand Jakob’s comment, I always recommend trying to imagine being a freelance or internal webdev: why would I go to my client and suggest they pay me to add this feature? I can make that case for SEO, for Google/Facebook/Twitter sharing, etc. For complex XML metadata cathedrals it’s a lot harder to make the case that we should spend an enormous amount of time implementing a cumbersome spec in the hope that it will prove useful in the future.
