Andy Powell has a post over on the eFoundations blog about some metadata guidelines he and Pete Johnston are working on for the UK Resource Discovery Taskforce. I got to rambling in a text area on his blog, but I guess I wrote too much, or included too many external URLs, so I couldn’t post it in the end. So I thought I’d just post it here, and let trackback do the rest.

So uh, please s/you/Andy/g in your head as you are reading this …

A bit of healthy skepticism from a 15-year vantage point is definitely warranted, bearing in mind that oftentimes it’s hard to move things forward without taking a few risks. I imagine constrained fiscal resources could also be a catalyst for improving access to the data flows that cultural heritage institutions participate in, or want to participate in. I wonder if it would be useful to factor in the money that organizations can save by working together better?

As I’ve heard you argue persuasively in the past, the success of the WWW as a platform for the delivery of information is hard to argue with. One of the things that the WWW did right (from the beginning) was focus the technology on people actually doing stuff…in their browsers. It seems really important to make sure that, whatever this metadata is, users of the Web will see it (somehow) and will be able to use it. Ian Davis’ points in Is the Semantic Web Destined to be a Shadow are still very relevant today, I think.

My friend Dan Krech calls this an “alignment problem”. So I was really pleased to see this in the vision document:

Agreed core standards for metadata for the physical objects and digital objects in aggregations ensuring the aggregations are open to all major search engines

Aligning with the web is a good goal to have. Relatively recent service offerings from Google and Facebook (Rich Snippets and the Open Graph protocol, for example) indicate their increased awareness of the utility of metadata to their users. And publishers are recognizing how important that metadata is for getting their stuff in front of more eyes. It’s a kind of virtuous cycle, I hope.

This must feel like it has been a long time coming for you and Pete. Google’s approach encourages a few different mechanisms: RDFa, Microdata and Microformats. Similarly, Google Scholar parses a handful of metadata vocabularies present in the HTML head element. The web is a big place to align with, I guess.
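Just to make the head-element bit concrete, here’s a minimal sketch of how an aggregator might pull one of those vocabularies out of a page, using the Highwire Press style citation_* meta tags that Google Scholar documents. The sample record below is made up, and a real crawler would of course need to handle a lot more than this:

    from html.parser import HTMLParser

    class CitationMetaParser(HTMLParser):
        """Collect <meta name="citation_*" content="..."> tags from an HTML head."""

        def __init__(self):
            super().__init__()
            self.fields = {}

        def handle_starttag(self, tag, attrs):
            if tag != "meta":
                return
            attrs = dict(attrs)
            name = attrs.get("name", "")
            if name.startswith("citation_"):
                # A tag can repeat (e.g. one citation_author per author).
                self.fields.setdefault(name, []).append(attrs.get("content", ""))

    # Hypothetical page fragment, purely for illustration.
    page = """
    <head>
      <meta name="citation_title" content="A Made-Up Paper Title">
      <meta name="citation_author" content="Doe, Jane">
      <meta name="citation_author" content="Roe, Richard">
      <meta name="citation_publication_date" content="2011/01/15">
    </head>
    """

    parser = CitationMetaParser()
    parser.feed(page)
    print(parser.fields)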

I imagine there will be hurdles to get over, but I wonder if your task-force could tap into this virtuous cycle. For example, it would be great if cultural heritage data could be aggregated using the same techniques that the big search companies use: RDFa, Microformats and Microdata for the descriptions, and sitemaps and Atom feeds for discovering updates. This would assume a couple of things: that publishers could allow (and support) crawling, and that it would be possible to build aggregator services to do the crawling. Releasing the aggregated content in an open way would be an important step too. This seems very similar to what I’ve heard Europeana is doing…which may be something else to align with.
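As a rough sketch of the crawling side, here is what walking a publisher’s sitemap might look like; the sitemaps.org namespace is standard, but the endpoint URL here is a placeholder:

    import urllib.request
    import xml.etree.ElementTree as ET

    # Standard namespace from the sitemaps.org protocol.
    NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

    def sitemap_entries(sitemap_url):
        """Yield (loc, lastmod) pairs from a sitemap.xml document."""
        with urllib.request.urlopen(sitemap_url) as response:
            tree = ET.parse(response)
        for url in tree.iter(NS + "url"):
            loc = url.findtext(NS + "loc")
            lastmod = url.findtext(NS + "lastmod")  # optional, may be None
            yield loc, lastmod

    # example.org is hypothetical; an aggregator would point this at each
    # participating publisher's sitemap, re-fetching pages whose lastmod changed.
    for loc, lastmod in sitemap_entries("http://example.org/sitemap.xml"):
        print(loc, lastmod)

An Atom feed would play much the same role for publishers pushing out more frequent updates.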

I like the idea of your recommendations providing a sliding scale, letting people get their feet wet by providing some basic information and then work their way up to the harder stuff. Staying focused on what sorts of services each step up the scale enables seems key. Part of the vision document mentions that the services are intended for staff. There is definitely a need for administrators to manage these systems (I often wonder what sort of white-listing functionality Google employs with its Rich Snippets service to avoid spam). But keeping the ultimate users of this information in mind is really important.

Finally, I’m a bit curious about the use of ‘aggregations’ in the RLUK vision. Is that some OAI-ORE terminology percolating through?