In my previous blog post I tried to demonstrate the virtues of making dataset descriptions available as RDFa. Just this morning I learned from Mark Birbeck that the folks down under did this last October!

For example, this page describing a dataset for public Internet locations has this RDF metadata inside it:

<> cc:attributionName ""@en-au ;
     cc:attributionURL <> ;
     dc:coverage.geospatial "Australia"@en-au ;
     dc:coverage.temporal "Not specified"@en-au ;
     dc:creator "Centrelink"@en-au ;
     dc:date.modified "2009-08-31"^^xsd:date ;
     dc:date.published "2009-08-31"^^xsd:date ;
     dc:description """<p xml:lang="en-au" xmlns="">Location of Centrelink Offices</p>
"""^^rdf:XMLLiteral ;
     dc:identifier "80"@en-au ;
     dc:keywords "<a href=\"\" rel=\"tag\" xml:lang=\"en-au\" xmlns=\"\">Social Security</a>"^^rdf:XMLLiteral ;
     dc:license "<a href=\"\" rel=\"licence\" xml:lang=\"en-au\" xmlns=\"\"><img alt=\"Creative Commons License\" class=\"licence\" src=\"\"/>Creative Commons - Attribution 2.5 Australia (CC-BY)</a>"^^rdf:XMLLiteral ;
     dc:source "<a href=\"\" rel=\"dc:source\" xml:lang=\"en-au\" xmlns=\"\"/>"^^rdf:XMLLiteral ;
     dc:subject "<a href=\"\" rel=\"category tag\" title=\"View all posts in Community\" xml:lang=\"en-au\" xmlns=\"\">Community</a>,  <a href=\"\" rel=\"category tag\" title=\"View all posts in Employment\" xml:lang=\"en-au\" xmlns=\"\">Employment</a>,  <a href=\"\" rel=\"category tag\" title=\"View all posts in Government\" xml:lang=\"en-au\" xmlns=\"\">Government</a>"^^rdf:XMLLiteral ;
     dc:title "Location of Centrelink Offices"@en-au ;
     dc:type <> ;
     agls:jurisdiction "[Commonwealth of] Australia (AU)"@en-au .

<> dc:format "CSV"@en-au . 

Now this data isn’t without problems: notice the XML literals as objects in the assertions involving subject, keywords, license and source? A chunk of markup in the object position hides the link it contains, so the license or the source can’t be followed as a resource in the graph. But it’s a beta after all, and lots of us are learning this as we go, so Australia deserves a ton of credit. One really nice thing they are doing is making assertions about the format and URL location of the dataset itself. It would be even better if the dataset description were linked up with the dataset files using OAI-ORE or some other vocabulary.
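
To make the XML literal problem concrete, here is a minimal sketch (not anything the site or my crawler actually does) of recovering the link targets buried in one of those literals, so they could be asserted as resources instead. The function name and the example.org URLs are made up for illustration, and it only uses Python's standard library:

```python
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collect (href, text) pairs from <a> elements in a markup fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href attribute are ignored.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_from_xml_literal(fragment):
    """Hypothetical helper: return the (href, label) pairs found in a literal."""
    parser = AnchorExtractor()
    parser.feed(fragment)
    parser.close()
    return parser.links
```

Fed the value of a dc:subject literal like the one above, this would hand back one (URL, label) pair per category link, which is roughly the shape the triples could have had in the first place.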

In about five minutes I adapted the simplistic crawler to crawl the data. There aren’t as many datasets, so the crawler only pulled down 1,725 triples (not counting the XHTML triples), though perhaps I missed some in my simplistic crawl.
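
For what "minus the xhtml triples" could mean in practice, here is a small sketch. It assumes the triples have already been extracted as (subject, predicate, object) tuples of strings, and that the ones to discard are those whose predicate lives in the XHTML vocabulary namespace that RDFa parsers use for things like rel="stylesheet". The function name is hypothetical and this is not the crawler's actual code:

```python
# Namespace RDFa 1.0 uses for reserved XHTML link types (stylesheet, next, ...).
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


def drop_xhtml_triples(triples):
    """Keep only triples whose predicate is outside the XHTML vocabulary."""
    return [t for t in triples if not t[1].startswith(XHTML_VOCAB)]
```

Counting what is left with len() gives the number of triples that actually describe datasets rather than page furniture.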

Seeing both efforts to make dataset descriptions available makes me wonder if it could be useful for the W3C eGov Working Group to provide some lightweight guidance on how to make dataset descriptions available: what sorts of vocabularies to use, the kinds of assertions that are important, etc. It’s hard not to daydream of trying to provide an aggregated view of both pools of data, which is kept in sync using the web, and which perhaps could pull down aggregated datasets and archive them, etc. Perhaps a little spot-checking tool that took a look at your HTML and let you know if it can work as a dataset description would be useful too?
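
A spot checker like that could start out very small. The sketch below is only a guess at what it might look like: the checklist of expected terms is invented for illustration (agreeing on the real list is exactly the guidance that is missing), and it compares the property and rel values as written in the page, without resolving prefixes to full URIs the way a real RDFa parser would:

```python
from html.parser import HTMLParser

# Hypothetical checklist, not taken from any published guidance.
EXPECTED_TERMS = ("dc:title", "dc:description", "dc:creator", "dc:license")


class RDFaTermCollector(HTMLParser):
    """Collect the terms used in property/rel/rev attributes of a page."""

    def __init__(self):
        super().__init__()
        self.terms = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # These attributes may hold several whitespace-separated terms.
            if name in ("property", "rel", "rev") and value:
                self.terms.update(value.split())


def missing_terms(html, expected=EXPECTED_TERMS):
    """Return the expected terms that never appear in the page's markup."""
    collector = RDFaTermCollector()
    collector.feed(html)
    collector.close()
    return [term for term in expected if term not in collector.terms]
```

An empty list back from missing_terms() would mean the page at least mentions everything on the checklist. Whether the resulting triples are any good is a separate question.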