resource maps and site maps
Andy reminds me that a relatively simple idea (I think it was David’s at RepoCamp) for the OAI-ORE Challenge would be to create a tool that transforms OAI-ORE resource maps expressed as Atom into Google Sitemaps. This would allow “repositories” that expose their “objects” as resource maps to be crawled easily by Google and others.
It would also be useful to demonstrate what value-add OAI-ORE resource maps give you: to answer the question of why not just generate the site map and be done with it. I think there definitely are advantages, such as being able to identify compound objects or aggregations of web resources, and then make assertions about them (a.k.a. attach metadata to them).
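Here is a rough sketch of what such a tool might look like in Python. It assumes the resource map is a single Atom entry in which each aggregated resource appears as an atom:link whose rel is http://www.openarchives.org/ore/terms/aggregates; other ORE Atom drafts lay this out differently, so treat the parsing as an assumption. The function name, the sample entry, and the example.org URLs are all made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: aggregated resources are marked with this link relation.
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

# Hypothetical resource map, for illustration only.
SAMPLE_ENTRY = """
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.org/object/1</id>
  <title>An example compound object</title>
  <link rel="alternate" href="http://example.org/object/1/splash.html"/>
  <link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http://example.org/object/1/paper.pdf"/>
  <link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http://example.org/object/1/data.csv"/>
</entry>
""".strip()


def resource_map_to_sitemap(atom_xml: str) -> str:
    """Return a sitemap <urlset> listing every aggregated resource."""
    root = ET.fromstring(atom_xml)
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for link in root.iter(f"{{{ATOM_NS}}}link"):
        href = link.get("href")
        # Only links carrying the aggregates relation become <url> entries.
        if link.get("rel") == ORE_AGGREGATES and href:
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = href
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(resource_map_to_sitemap(SAMPLE_ENTRY))
```

Note what gets thrown away in the process: the sitemap keeps only a flat list of URLs, while the resource map also says that those URLs belong together as one aggregation.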
Tags: atom, google, harvesting, http, oai-ore
August 1st, 2008 at 9:42 am
Google, Yahoo, and Microsoft already support Atom as a sitemap format. Does OAI-ORE bring anything else to the table (in the way of more specific sitemapping instructions) that would warrant another serialization?
August 1st, 2008 at 6:16 pm
Google Sitemaps support things like changefreq and priority that don’t have an analog in the Atom world. But the main problem with repositories is discovery, so perhaps simply making OAI-ORE resource maps available as Atom will be enough, eh? Thanks for the comment.
August 4th, 2008 at 2:30 pm
We considered sitemaps as a serialization format early on, but the restriction on sitemap location limits their usefulness: a sitemap can only list URLs at or below its own directory. For example, /a/b/sitemap.xml could aggregate /a/b/foo.html, but not /a/c/bar.html or /x/y/z.html. For some repos this might not be a problem, but it would prevent things like arXiv.org aggregating resources in citebase.org (see http://www.sitemaps.org/protocol.php#location).
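The location rule described above can be sketched as a small check. This is a simplified reading of the protocol, assuming a URL is in scope when it shares the sitemap's scheme and host and its path starts with the sitemap's directory; the function name and the example.org URLs are hypothetical, and any exceptions the protocol allows are ignored.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url: str, resource_url: str) -> bool:
    """Simplified check: is resource_url at or below the sitemap's directory?"""
    s = urlsplit(sitemap_url)
    r = urlsplit(resource_url)
    # A different scheme or host is always out of scope in this sketch.
    if (s.scheme, s.netloc) != (r.scheme, r.netloc):
        return False
    # Directory holding the sitemap, with a trailing slash.
    base = s.path.rsplit("/", 1)[0] + "/"
    return r.path.startswith(base)


if __name__ == "__main__":
    sitemap = "http://example.org/a/b/sitemap.xml"
    for url in (
        "http://example.org/a/b/foo.html",
        "http://example.org/a/c/bar.html",
        "http://example.org/x/y/z.html",
    ):
        print(url, in_sitemap_scope(sitemap, url))
```

A resource map has no such constraint, which is why it can describe an aggregation whose parts live on different hosts.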