<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>inkdroid &#187; publishing</title>
	<atom:link href="http://inkdroid.org/journal/category/publishing/feed/" rel="self" type="application/rss+xml" />
	<link>http://inkdroid.org/journal</link>
	<description>$pithy_personal_mission_statement</description>
	<lastBuildDate>Wed, 28 Jul 2010 13:48:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>federal register embraces the web and opensource</title>
		<link>http://inkdroid.org/journal/2010/07/27/federal-register-embraces-the-web-and-opensource/</link>
		<comments>http://inkdroid.org/journal/2010/07/27/federal-register-embraces-the-web-and-opensource/#comments</comments>
		<pubDate>Tue, 27 Jul 2010 15:05:52 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[government]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[egov]]></category>
		<category><![CDATA[github]]></category>
		<category><![CDATA[gpo]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[nara]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://inkdroid.org/journal/?p=2174</guid>
		<description><![CDATA[Tom Lee of the Sunlight Foundation blogged yesterday about the new Federal Register website. The facelift was also announced a few days earlier by the Archivist of the United States, David Ferriero. If you aren&#8217;t familiar with it already, the Federal Register is basically the daily newspaper of the United States Federal Government, which details [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://federalregister.gov"><img style="border: none; margin-right: 10px; float: left;" src="http://inkdroid.org/images/federal-register.png"/></a><a href="http://sunlightfoundation.com/people/tlee/">Tom Lee</a> of the <a href="http://sunlightfoundation.com/">Sunlight Foundation</a> <a href="http://sunlightlabs.com/blog/2010/meet-the-new-federal-register/">blogged yesterday</a> about the <a href="http://www.federalregister.gov/">new Federal Register website</a>.  The facelift was also <a href="http://blogs.archives.gov/aotus/?p=1317">announced</a> a few days earlier by the Archivist of the United States, <a href="http://en.wikipedia.org/wiki/David_Ferriero">David Ferriero</a>. If you aren&#8217;t familiar with it already, the Federal Register is basically <em>the</em> daily newspaper of the United States Federal Government, which details all the rules and regulations of the federal agencies. It is compiled by the <a href="http://www.archives.gov/federal-register/">Office of the Federal Register</a> located in the <a href="http://www.archives.gov/">National Archives</a>, and printed by the <a href="http://www.gpo.gov/">Government Printing Office</a>. As the <a href="http://www.youtube.com/watch?v=ADhP0KSmjkQ">video</a> describing the new site points out, the Federal Register began publication in 1936 in the depths of the <a href="http://en.wikipedia.org/wiki/Great_Depression_in_the_United_States">Great Depression</a> as a way to communicate <em>in one place</em> all that the agencies were <a href="http://en.wikipedia.org/wiki/New_Deal">doing</a> to try to jump start the economy. So it seems like a fitting time to be rethinking the role of the Federal Register.</p>
<p>I&#8217;m no usability expert, but just a few minutes browsing the <a href="http://www.federalregister.gov/">new site</a> and comparing it to the <a href="http://www.gpoaccess.gov/fr/">old one</a> make it clear what a leap forward this is. Hopefully the <a href="http://www.federalregister.gov/policy/legal_status">legal status</a> of the new site will be ironed out shortly. </p>
<p>Most of all it&#8217;s great to see that the Federal Register is now a single web application. The service it provides to the American public is important enough to deserve its own dedicated web presence. As the developers point out in <a href="http://www.youtube.com/watch?v=13fLdUyrd7A">their video</a> describing the effort, they wanted to make the Federal Register a &#8220;first class citizen of the web&#8221;&#8230;and I think they are certainly helping do that. This might seem obvious, but often there is a temptation to jam publications from the print world (like the Federal Register) into dumbed down <a href="http://www.gpo.gov/fdsys/">monolithic repositories</a> that treat all &#8220;objects&#8221; the same. Proponents of this approach tend to characterize one off websites like Federal Register 2.0 as &#8220;yet another silo&#8221;. But I think it&#8217;s important to remember that <a href="http://www.youtube.com/watch?v=OM6XIICm_qo#t=0m37s">the web was really created to break down the silo walls</a>, and that every well designed web site is actually the antithesis of a silo. In fact, monolithic repository systems that treat all publications as static documents to be uniformly managed are more like silos than these &#8216;one off&#8217; dedicated web applications.</p>
<p>As a software developer working in the federal government there were a few things about the Federal Register 2.0 that I found really exciting:</p>
<ul>
<li>Fruitful collaboration between federal employees and citizen activist/geeks initiated by a <a href="http://sunlightlabs.com/contests/appsforamerica2/">software development contest</a>.</li>
<li>Extensive use of opensource technologies like <a href="http://www.ruby-lang.org/en/">Ruby</a>, <a href="http://rubyonrails.org/">Ruby on Rails</a>, <a href="http://www.mysql.com/">MySQL</a>, <a href="http://www.sphinxsearch.com/">Sphinx</a>, <a href="http://nginx.org/">nginx</a>, <a href="http://varnish-cache.org/">Varnish</a>, <a href="http://www.modrails.com/">Passenger</a>, <a href="http://httpd.apache.org/">Apache2</a>, <a href="http://www.ubuntu.com/">Ubuntu Linux</a>, <a href="http://wiki.opscode.com/display/chef/Home">Chef</a>. Opensource technologies encourage collaboration by allowing citizen activists/technologists to participate without having to drop a princely sum.</li>
<li>Release of the <a href="http://github.com/criticaljuncture/fr2/">source code for the website itself</a>, using decentralized revision control (<a href="http://git-scm.com/">git</a>) so that people can easily contribute changes, and see how the site was put together.</li>
<li>Extensive use of syndicated feeds to communicate how content is being added to the site, <a href="http://www.federalregister.gov/events/search?conditions[term]=&#038;conditions[location]=20901&#038;conditions[within]=25&#038;commit=Go">ical</a> feeds to keep on top of events going on in your area, and <a href="http://www.federalregister.gov/articles/xml/201/018/147.xml">detailed XML for each entry</a>.</li>
<li>The <a href="http://federalregister.gov/robots.txt">robots.txt file for the site</a> makes the content fully crawlable by web indexers, except for search related portions of the website. Excluding dynamic search results is often important for performance reasons, but much of the article content can be discovered via links, see below about permalinks. They also have made a <a href="http://sitemaps.org">sitemap</a> available for crawlers to efficiently discover URLs for the content.</li>
<li>Deployment of the web application to the cloud using Amazon&#8217;s <a href="http://aws.amazon.com/ec2/">EC2</a> and <a href="http://aws.amazon.com/s3/">S3</a> services. Cloud computing allows computing resources to scale to meet demand. In effect this means that government IT shops don&#8217;t have to make big up front investments in infrastructure to make new services available. I guess the jury is still out, but I think this will eventually prove to greatly lower the barrier to innovation in the egov sector. It also lets the more progressive developers in government leap frog ancient technologies and bureaucracies to get things done in a timely manner.</li>
<li>And last, but certainly not least &#8230; <strong>now every entry in the Federal Register has a URL!</strong> Permalinks for the Federal Register are incredibly important for citability reasons. I predict that we&#8217;ll quickly see more and more people referencing specific parts of the Federal Register in social media sites like Facebook, Twitter and out on the open web in blogs, and in collaborative applications like Wikipedia.</li>
</ul>
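<p>To make the robots.txt point a bit more concrete, here is a minimal sketch in Python of how a crawler might check whether a URL is fair game before fetching it. The rules below are hypothetical stand-ins for "crawl the articles, skip the search results" and are not a copy of the real file; the crawler name and both URLs are made up too. It uses the standard library&#8217;s <code>urllib.robotparser</code> and makes no network requests.</p>
<pre>
```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; this is NOT a copy of the
# real robots.txt served by federalregister.gov.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /articles/search
Disallow: /events/search
"""


def build_parser(robots_txt):
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(HYPOTHETICAL_ROBOTS_TXT)

# "ExampleBot" is a made-up crawler name; both URLs are hypothetical.
article_ok = parser.can_fetch(
    "ExampleBot", "http://www.federalregister.gov/articles/2010/07/27/example"
)
search_ok = parser.can_fetch(
    "ExampleBot", "http://www.federalregister.gov/events/search?commit=Go"
)

print("article crawlable:", article_ok)
print("search crawlable:", search_ok)
```
</pre>
<p>Under these assumed rules the permalinked article is allowed and the dynamic search page is not, which is the split described in the list above.</p>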
<p>I would like to see more bulk access to XML data made available, for re-purposing on other websites&#8211;although I guess it might be possible to walk from the syndicated feeds to the detailed XML. Also, the search functionality is so rich it would be useful to have an <a href="http://en.wikipedia.org/wiki/OpenSearch">OpenSearch</a> description that documents it, and perhaps provides some hooks for getting back JSON and/or XML representations. The site could even follow the lead of the <a href="http://www.london-gazette.co.uk/">London Gazette</a> and try to make some of the structured metadata available in the HTML using RDFa. It also looks like content is only available for 2008 on, so it might be interesting to see how easy it would be to make more of the historic content available. </p>
<p>But the great thing about what these folks have done is now I can fork the project on github, see how easy it is to add the changes, and let the developers know about my updates to see if they are worth merging back into the production website. This is an incredible leap forward for egov efforts&#8211;so hats off to everyone who helped make this happen.</p>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2010/07/27/federal-register-embraces-the-web-and-opensource/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>a middle way for linked data at the bbc</title>
		<link>http://inkdroid.org/journal/2010/03/02/a-middle-way-for-linked-data-at-the-bbc/</link>
		<comments>http://inkdroid.org/journal/2010/03/02/a-middle-way-for-linked-data-at-the-bbc/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 20:13:00 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[publishing]]></category>
		<category><![CDATA[semweb]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[bbc]]></category>
		<category><![CDATA[dbpedia]]></category>
		<category><![CDATA[linked data]]></category>
		<category><![CDATA[radio]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[television]]></category>
		<category><![CDATA[uberblic]]></category>

		<guid isPermaLink="false">http://inkdroid.org/journal/?p=1701</guid>
		<description><![CDATA[I got the chance to attend the 2nd London Linked Data Meetup that was co-located with dev8d last week, which turned out to be a whole lot of fun. I figured if I waited long enough other people would save me from having to write a good summary/discussion of the event&#8230;and they have: thanks Pete [...]]]></description>
			<content:encoded><![CDATA[<p>I got the chance to attend the <a href="http://www.meetup.com/Web-Of-Data/calendar/12317420/">2nd London Linked Data Meetup</a> that was co-located with  <a href="http://dev8d.com">dev8d</a> last week, which turned out to be a whole lot of fun. I figured if I waited long enough other people would save me from having to write a good summary/discussion of the event&#8230;and they have: thanks <a href="http://efoundations.typepad.com/efoundations/2010/02/the-2nd-linked-data-london-meetup-trying-to-bridge-a-gap.html">Pete Johnston</a>, <a href="http://bens.me.uk/2010/london-linked-data-meetup">Ben Summers</a>, <a href="http://blogs.cetis.ac.uk/sheilamacneill/2010/02/26/2nd-linked-data-meetup-london/">Sheila Macneill</a>, <a href="http://www.currybet.net/cbet_blog/2010/03/linked_data_human_readable_uris.php">Martin Belam</a> and <a href="http://www.frankieroberto.com/weblog/1621">Frankie Roberto</a>. </p>
<p><img src="http://inkdroid.org/images/bbc.png" style="margin-right: 10px; margin-bottom: 5px; float: left;"/> The main thing that I took away is how much good work the <a href="http://bbc.com">BBC</a> is doing in this space. Given the recent news of <a href="http://www.nytimes.com/2010/03/03/business/media/03bbc.html">cuts</a> at the BBC, it seems like a good time to say publicly how important some of the work they are doing is to the web technology sector. As part of the Meetup <a href="http://derivadow.com/">Tom Scott</a> gave a  <a href="http://www.slideshare.net/derivadow/apis-and-apis-a-wildlife-ontology">presentation</a> on how the BBC are using Linked Data to integrate distinct web properties in the BBC enterprise, like their <a href="http://www.bbc.co.uk/programmes">Programmes</a> and the <a href="http://www.bbc.co.uk/wildlifefinder/">Wildlife Finder</a> web sites. </p>
<p>The basic idea is that they categorize (dare I say catalog?) <a href="http://www.bbc.co.uk/programmes">television and radio content</a> using wikipedia/dbpedia as a <a href="http://en.wikipedia.org/wiki/Controlled_vocabulary">controlled vocabulary</a>. Just doing this relatively simple thing means that they can create another site like the <a href="http://www.bbc.co.uk/wildlifefinder/">Wildlife Finder</a> that provides a topical guide to the natural world (and also happens to use wikipedia/dbpedia as a controlled vocabulary), that then links to their audio and video content. Since the two sites share a common topic vocabulary, they are able to automatically create links from the topic guides to all the radio and television content that is on a particular topic. </p>
<p>For a practical example consider this page for the <a href="http://www.bbc.co.uk/nature/species/Pinus_longaeva">Great Basin Bristlecone Pine</a>:</p>
<p><a href="http://www.bbc.co.uk/nature/species/Pinus_longaeva"><img src="http://inkdroid.org/images/bristlecone.png" style="border: none; width: 80%;" /></a></p>
<p>If you scroll down on the page you&#8217;ll see a link to a <a href="http://www.bbc.co.uk/programmes/p005fs5p">video clip</a> from David Attenborough&#8217;s documentary <a href="http://www.bbc.co.uk/programmes/b00lbpcy">Life</a> on the Programmes portion of the website. Now take a step back and consider how these are two separate applications in the BBC enterprise that are able to build a rich network of links between each other. It&#8217;s the shared controlled vocabulary (in this case dbpedia derived from wikipedia) which allows them to do this.</p>
<p>If you take a peek in the HTML you&#8217;ll see the resource has an alternate RDF version:</p>
<pre>
&lt;link rel="alternate" type="application/rdf+xml" href="<a href="http://www.bbc.co.uk/nature/species/Pinus_longaeva.rdf">/nature/species/Pinus_longaeva.rdf</a>" /&gt;
</pre>
<p>The Resource Description Framework (RDF) is really just the best data model we have for describing stuff that&#8217;s on the Web, and the type of links between resources that are on (and off) the Web. Personally, I prefer to look at RDF as <a href="http://www.w3.org/TeamSubmission/turtle/">Turtle</a> which is pretty easily done with <a href="http://www.dajobe.org/">Dave Beckett</a>&#8217;s handy <a href="http://librdf.org/raptor/rapper.html">rapper</a> utility (`aptitude install raptor-utils` if you are following from home).</p>
<pre>
rapper -o turtle http://www.bbc.co.uk/nature/species/Pinus_longaeva
</pre>
<p>The key bits of the RDF are the description of the Great Basin bristlecone pine:</p>
<pre>
&lt;http://www.bbc.co.uk/nature/species/Pinus_longaeva&gt;
    rdfs:seeAlso &lt;http://www.bbc.co.uk/nature/species&gt; ;
    foaf:primaryTopic &lt;http://www.bbc.co.uk/nature/species/Pinus_longaeva#species&gt; .

&lt;http://www.bbc.co.uk/nature/species/Pinus_longaeva#species&gt;
    dc:description "Great Basin bristlecone pines are restricted to the mountain ranges of California, Nevada and Utah and have a remarkable ability to survive in this extremely harsh and challenging environment. They grow extremely slowly, and are some of the oldest living organisms in the world. With some aged at almost 5,000 years these amazing trees can reveal information about Earth's climate variations. Amazingly, the leaves, or needles, can remain green for over 45 years." ;
    wo:class &lt;http://www.bbc.co.uk/nature/class/Pinopsida#class&gt; ;
    wo:family &lt;http://www.bbc.co.uk/nature/family/Pinaceae#family&gt; ;
    wo:genus &lt;http://www.bbc.co.uk/nature/genus/Pinus#genus&gt; ;
    wo:growsIn &lt;http://www.bbc.co.uk/nature/habitats/Mountain#habitat&gt;, &lt;http://www.bbc.co.uk/nature/habitats/Temperate_coniferous_forest#habitat&gt; ;
    wo:kingdom &lt;http://www.bbc.co.uk/nature/kingdom/Plant#kingdom&gt; ;
    wo:name &lt;http://www.bbc.co.uk/nature/species/Pinus_longaeva#name&gt; ;
    wo:order &lt;http://www.bbc.co.uk/nature/order/Pinales#order&gt; ;
    wo:phylum &lt;http://www.bbc.co.uk/nature/phylum/Pinophyta#phylum&gt; ;
    a wo:Species ;
    rdfs:label "Great Basin bristlecone pine" ;
    <span style="color: red">owl:sameAs &lt;http://dbpedia.org/resource/Pinus_longaeva&gt; ;</span>
    foaf:depiction &lt;http://open.live.bbc.co.uk/dynamic_images/naturelibrary_640_credits/downloads.bbc.co.uk/earth/naturelibrary/assets/p/pi/pinus_longaeva/pinus_longaeva_1.jpg&gt; .
</pre>
<p>And then the description of the clip that is related to the topic of Great Basin bristlecone pine:</p>
<pre>
&lt;http://www.bbc.co.uk/programmes/p005fs5p#programme&gt;
    dc:title "Ancient bristlecones" ;
    po:subject &lt;http://www.bbc.co.uk/nature/species/Pinus_longaeva#species&gt; ;
    a po:Clip .
</pre>
<p>And we can follow our nose and fetch a description of the  <a href="http://www.bbc.co.uk/programmes/p005fs5p">Ancient bristlecones clip</a>:</p>
<pre>
rapper -o turtle http://www.bbc.co.uk/programmes/p005fs5p
</pre>
<p>Which tells us lots of stuff, like that it&#8217;s a documentary part of the science and nature genre, gives us a synopsis, and even links the clip to the episode and series it is a part of:</p>
<pre>
&lt;http://www.bbc.co.uk/programmes/p005fs5p#programme&gt;
    dc:title "Ancient bristlecones" ;
    po:format &lt;http://www.bbc.co.uk/programmes/formats/documentaries#format&gt; ;
    po:genre &lt;http://www.bbc.co.uk/programmes/genres/factual/scienceandnature#genre&gt;, &lt;http://www.bbc.co.uk/programmes/genres/factual/scienceandnature/natureandenvironment#genre&gt; ;
    po:long_synopsis """Bristlecone pines live at the limit of life, above 3,000m in the mountains of  western America. Almost continuous freezing temperatures and savage winds make life so tough, that these bristlecones only grow for six weeks of the year.

Everything is about conserving energy.They hardly ever shed their needles which can last more than 30 years. After centuries of being blasted by storms a full grown tree still survives with only a strip of bark a few inches wide.

These trees live life at such a slow pace they can reach a great age. Some are over 5,000 years old. It has been said of the bristlecones that to live here is to take a very long time to die.""" ;
    po:medium_synopsis "Living above 3,000 metres, North America's bristlecones cope with freezing temperatures and battering winds by only growing for six weeks of the year. But seeing as they may live for more than 5,000 years, that's still a fair bit of growing in a single lifetime. Slowly but surely does it..." ;
    po:short_synopsis "The world's oldest trees have survived 5,000 years of harsh conditions." ;
    po:version &lt;http://www.bbc.co.uk/programmes/p005fs5r#programme&gt; ;
    a po:Clip .

&lt;http://www.bbc.co.uk/programmes/b00lbpcy#programme&gt;
    po:clip &lt;http://www.bbc.co.uk/programmes/p005fs5p#programme&gt; ;
    a po:Series .

&lt;http://www.bbc.co.uk/programmes/b00p90d6#programme&gt;
    po:clip &lt;http://www.bbc.co.uk/programmes/p005fs5p#programme&gt; ;
    a po:Episode .
</pre>
<p>Conspicuously missing from this description is something like:</p>
<pre>
&lt;http://www.bbc.co.uk/programmes/p005fs5p#programme&gt;
    dcterms:subject &lt;http://dbpedia.org/resource/Pinus_longaeva&gt; .
</pre>
<p>But presumably it&#8217;s hiding underneath the covers in the Programmes database, and that&#8217;s what lets them link stuff up?</p>
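<p>Whatever the Programmes database does internally, the join that the published RDF already allows can be sketched in a few lines of Python. This is only an illustration, not the BBC&#8217;s implementation: the triples are copied from the Turtle above and held as plain tuples (no RDF library involved), and the helper names <code>subjects_of</code> and <code>clips_about</code> are my own.</p>
<pre>
```python
# A handful of triples copied from the Turtle output above, stored as
# (subject, predicate, object) tuples. Prefixed names are kept as strings.
SPECIES = "http://www.bbc.co.uk/nature/species/Pinus_longaeva#species"
CLIP = "http://www.bbc.co.uk/programmes/p005fs5p#programme"
DBPEDIA = "http://dbpedia.org/resource/Pinus_longaeva"

TRIPLES = [
    (SPECIES, "owl:sameAs", DBPEDIA),
    (SPECIES, "rdfs:label", "Great Basin bristlecone pine"),
    (CLIP, "po:subject", SPECIES),
    (CLIP, "dc:title", "Ancient bristlecones"),
]


def subjects_of(triples, predicate, obj):
    """Return every subject that has the given predicate and object."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def clips_about(triples, dbpedia_uri):
    """Follow owl:sameAs to the local species, then po:subject to clips."""
    clips = []
    for species in subjects_of(triples, "owl:sameAs", dbpedia_uri):
        clips.extend(subjects_of(triples, "po:subject", species))
    return clips


related = clips_about(TRIPLES, DBPEDIA)
print(related)
```
</pre>
<p>The dbpedia URI is the shared key here: the Wildlife Finder side asserts <code>owl:sameAs</code>, the Programmes side asserts <code>po:subject</code>, and the two hops together connect a topic to its clips.</p>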
<p><img src="http://uberblic.com/images/logo.png" style="float: right; margin-left: 10px" /> Also very interesting was <a href="http://blog.georgikobilarov.com/">Georgi Kobilarov</a>&#8217;s description of <a href="http://uberblic.org/2010/01/uberblic-release/">Uberblic</a>.  Since Georgi helped create <a href="http://dbpedia.org">dbpedia</a> and is now consulting with the BBC, it seems like uberblic is positioning itself to provide a platform for the BBC to have its own local cache of the world of Linked Data. Having a local curated view of the world of linked data is something <a href="http://onebiglibrary.net">Dan Chudnov</a> identified as a real need at the <a href="http://wiki.code4lib.org/index.php/LinkedData">first Linked Data workshop at code4lib 2009</a> for <a href="http://onebiglibrary.net/story/code4lib-2009-talk-on-caching-and-proxying-linked-data">caching and proxying linked data</a>&#8230;so it is really cool to see solutions starting to appear in this space&#8230;and for them to be adopted by institutions like the BBC. </p>
<p>Georgi demo&#8217;d how an edit on wikipedia would be immediately reflected in the structured data available from uberblic. It was a real time update, and extremely impressive. It <a href="http://twitter.com/gkob/status/9734074624">looks like</a> part of the uberblic strategy is to crawl BBC&#8217;s web site and other pockets of Linked Data to enable the sort of linking across web properties that Tom described. I&#8217;d also surmise given the realtime nature of this that Georgi is bypassing dbpedia dumps and using the Wikipedia <a href="http://en.wikipedia.org/w/index.php?title=Special:RecentChanges&#038;feed=atom">changes atom feed</a> in conjunction with <a href="http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/extraction/extractors/">extractors</a> that were built as part of the dbpedia project. But I&#8217;d love to know more of the mechanics of the update. It also would be interesting to know if uberblic has a notion of versions. </p>
<p>The really powerful message that the BBC is helping promote is this idea that good websites are APIs. Tom mentioned <a href="http://blog.whatfettle.com/">Paul Downey&#8217;s</a> notion that <a href="http://blog.whatfettle.com/2007/01/11/good-web-apis-are-just-web-sites/">Web APIs Are Just Web Sites</a>. It&#8217;s a subtle but extremely important point that I learned primarily working closely with <a href="http://eikeon.com">Dan Krech</a> for a year or so. It&#8217;s an unfortunate side effect of lots of market-driven talk about web2.0, web3.0 and Linked Data in general that this simple REST message gets lost.  We took it seriously in the design of the <a href="http://chroniclingamerica.loc.gov/about/api/">&#8220;API&#8221;</a> at the Library of Congress&#8217; <a href="http://chroniclingamerica.loc.gov/">Chronicling America</a>. It&#8217;s also something I tried to talk about later in the week at dev8d when I had to quickly put a presentation together:</p>
<p><iframe src="http://docs.google.com/present/embed?id=dv89m3d_374cpqzfnc9" frameborder="0" width="410" height="342"></iframe></p>
<p>The slides probably won&#8217;t make much sense on their own, but the basic message was that we often hear about Linked Data in terms of pushing all your data to some triple store so you can start querying it with <a href="http://code.google.com/p/linked-data-api/wiki/Specification">SPARQL</a> and doing inferencing, and suddenly you&#8217;re going to be sitting pretty, totally jacked up on the Semantic Web. </p>
<p>If you are like me, you&#8217;ve already got databases where things are modeled, and you&#8217;ve created little web apps that have extracted information from the databases and put them on the web as HTML docs for people around the world to read (cue some mid 1990s grunge music). Expecting people to chuck away the applications and technology stacks they have simply to say they do Linked Data is wishful thinking. What&#8217;s missing is a simple migration strategy that would allow web publishers to easily recognize the value in publishing the contents of their database as Linked Data, and how it complements the HTML (and XML, JSON) publishing they are currently doing. My advice to folks at dev8d boiled down to:</p>
<ul>
<li>Keep modelling your stuff how you like</li>
<li>Identify your stuff with Cool URIs in your webapps</li>
<li>Link your stuff together in HTML</li>
<li>Link to machine friendly formats (RSS, Atom, JSON, etc)</li>
<li>Use RDF to make your database available on the web using vocabularies other people understand.</li>
<li>Start thinking about technologies like SPARQL that will let you query pools and aggregated views of your data.</li>
<li>Consider joining the <a href="http://lists.w3.org/Archives/Public/public-lod/">public-lod discussion list and joining the conversation.</a></li>
</ul>
<p>I got some nice comments afterwards from <a href="http://www.ninebynine.org/">Graham Klyne</a>,  <a href="http://users.ecs.soton.ac.uk/hg/">Hugh Glaser</a>, <a href="http://blogs.ukoln.ac.uk/adrianstevenson/">Adrian Stevenson</a> and <a href="http://vphill.com/">Mark Phillips</a> so I felt pretty happy&#8230;granted most of the hard line Linked Data folks had already left a couple days earlier. </p>
<p>So some really exciting stuff is going on at the BBC. They are using Linked Data in a practical way that benefits their enterprise in real ways. I&#8217;m crossing my fingers and hoping that the value of what is going on here is recognized, and the various cuts that are going on won&#8217;t affect any of the fine work they are doing on improving the Web. </p>
<p>For more information check out the <a href="http://www.w3.org/2001/sw/sweo/public/UseCases/BBC/">Semantic Web Case Study</a> the folks at the BBC wrote summarizing their approach for the W3C.</p>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2010/03/02/a-middle-way-for-linked-data-at-the-bbc/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>data.australia.gov.au and rdfa</title>
		<link>http://inkdroid.org/journal/2010/01/29/data-australia-gov-au-and-rdfa/</link>
		<comments>http://inkdroid.org/journal/2010/01/29/data-australia-gov-au-and-rdfa/#comments</comments>
		<pubDate>Fri, 29 Jan 2010 14:36:57 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[publishing]]></category>
		<category><![CDATA[semweb]]></category>
		<category><![CDATA[australia]]></category>
		<category><![CDATA[egov]]></category>
		<category><![CDATA[rdfa]]></category>
		<category><![CDATA[w3c]]></category>

		<guid isPermaLink="false">http://inkdroid.org/journal/?p=1656</guid>
		<description><![CDATA[In my previous blog post I was trying to demonstrate the virtues of data.gov.uk making the descriptions of their datasets available as RDFa. Just this morning I learned from Mark Birbeck that the folks down under at data.australia.gov.au did this last October! For example this page describing a dataset for public Internet locations has this [...]]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://inkdroid.org/journal/2010/01/26/data-gov-uk-and-rdfa/">previous blog post</a> I was trying to demonstrate the virtues of data.gov.uk making the descriptions of their datasets available as <a href="http://www.w3.org/TR/xhtml-rdfa-primer/">RDFa</a>. Just this morning I <a href="http://groups.google.com/group/uk-government-data-developers/msg/16599323836791e5">learned</a> from <a href="http://webbackplane.com/mark-birbeck">Mark Birbeck</a> that the folks down under at <a href="http://data.australia.gov.au">data.australia.gov.au</a> did this last October!</p>
<p>For example <a href="http://data.australia.gov.au/414">this</a> page describing a dataset for public Internet locations has this RDF metadata inside it:</p>
<pre>
&lt;http://data.australia.gov.au/80&gt; cc:attributionName "http://www.centrelink.gov.au/"@en-au ;
     cc:attributionURL &lt;http://www.centrelink.gov.au/&gt; ;
     dc:coverage.geospatial "Australia"@en-au ;
     dc:coverage.temporal "Not specified"@en-au ;
     dc:creator "Centrelink"@en-au ;
     dc:date.modified "2009-08-31"^^xsd:date ;
     dc:date.published "2009-08-31"^^xsd:date ;
     dc:description """&lt;p xml:lang="en-au" xmlns="http://www.w3.org/1999/xhtml"&gt;Location of Centrelink Offices&lt;/p&gt;
"""^^rdf:XMLLiteral ;
     dc:identifier "80"@en-au ;
     dc:keywords "&lt;a href=\"http://data.australia.gov.au/tag/social-security\" rel=\"tag\" xml:lang=\"en-au\" xmlns=\"http://www.w3.org/1999/xhtml\"&gt;Social Security&lt;/a&gt;"^^rdf:XMLLiteral ;
     dc:license "&lt;a href=\"http://creativecommons.org/licenses/by/2.5/au/\" rel=\"licence\" xml:lang=\"en-au\" xmlns=\"http://www.w3.org/1999/xhtml\"&gt;&lt;img alt=\"Creative Commons License\" class=\"licence\" src=\"http://i.creativecommons.org/l/by/2.5/au/88x31.png\"/&gt;Creative Commons - Attribution 2.5 Australia (CC-BY)&lt;/a&gt;"^^rdf:XMLLiteral ;
     dc:source "&lt;a href=\"http://www.centrelink.gov.au/\" rel=\"dc:source\" xml:lang=\"en-au\" xmlns=\"http://www.w3.org/1999/xhtml\"/&gt;"^^rdf:XMLLiteral ;
     dc:subject "&lt;a href=\"http://data.australia.gov.au/catalogue/community\" rel=\"category tag\" title=\"View all posts in Community\" xml:lang=\"en-au\" xmlns=\"http://www.w3.org/1999/xhtml\"&gt;Community&lt;/a&gt;,  &lt;a href=\"http://data.australia.gov.au/catalogue/employment\" rel=\"category tag\" title=\"View all posts in Employment\" xml:lang=\"en-au\" xmlns=\"http://www.w3.org/1999/xhtml\"&gt;Employment&lt;/a&gt;,  &lt;a href=\"http://data.australia.gov.au/catalogue/government\" rel=\"category tag\" title=\"View all posts in Government\" xml:lang=\"en-au\" xmlns=\"http://www.w3.org/1999/xhtml\"&gt;Government&lt;/a&gt;"^^rdf:XMLLiteral ;
     dc:title "Location of Centrelink Offices"@en-au ;
     dc:type &lt;http://purl.org/dc/dcmitype/Text&gt; ;
     agls:jurisdiction "[Commonwealth of] Australia (AU)"@en-au .

&lt;http://www1.australia.gov.au/datasets/Federal/Centrelink/Location%20of%20Centrelink%20offices%2031_08_09/centrelink_offices_31_08_2009.CSV&gt; dc:format "CSV"@en-au .
</pre>
<p>Now this data isn&#8217;t without problems: notice the XML literals as objects in the assertions involving subject, keyword, license and source? But it&#8217;s a Beta after all, and lots of us are learning this as we go, so Australia deserves a ton of credit. One really nice thing they are doing is making assertions about the format and URL location of the dataset itself. It would be even better if the dataset description was linked up with the dataset files using <a href="http://www.openarchives.org/ore/1.0/vocabulary">oai-ore</a> or some other vocabulary.</p>
<p>In about 5 minutes I adapted the simplistic data.gov.uk crawler to crawl the data.australia.gov.au data.  There aren&#8217;t as many datasets, so the <a href="http://inkdroid.org/bzr/data-australia-gov-au/crawl.py">crawler</a> only pulled down <a href="http://inkdroid.org/bzr/data-australia-gov-au/data.rdf">1725 triples</a> (minus the xhtml triples)&#8230;but perhaps I missed some in my simplistic crawl.</p>
<p>Seeing both the data.gov.uk and data.australia.gov.au efforts to make dataset descriptions available makes me wonder if it could be useful for the <a href="http://www.w3.org/2007/eGov/">W3C eGov Working Group</a> to provide some lightweight guidance on how to make dataset descriptions available: what sorts of vocabularies to use, the kinds of assertions that are important, etc. It&#8217;s hard not to daydream of trying to provide an aggregated view of both pools of data, which is kept in sync using the web, and which perhaps could pull down aggregated datasets and archive them, etc. Perhaps a little spot checking tool that took a look at your HTML and let you know if it can work as a dataset description would be useful too? </p>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2010/01/29/data-australia-gov-au-and-rdfa/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>data.gov.uk and rdfa</title>
		<link>http://inkdroid.org/journal/2010/01/26/data-gov-uk-and-rdfa/</link>
		<comments>http://inkdroid.org/journal/2010/01/26/data-gov-uk-and-rdfa/#comments</comments>
		<pubDate>Tue, 26 Jan 2010 22:41:19 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[publishing]]></category>
		<category><![CDATA[semweb]]></category>
		<category><![CDATA[egov]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[rdfa]]></category>

		<guid isPermaLink="false">http://inkdroid.org/journal/?p=1602</guid>
		<description><![CDATA[The recent public release of the UK Government&#8217;s data.gov.uk site got picked up by the press last week in articles at The Guardian, Prospect Magazine and elsewhere. These have been supplemented by some more technical discussions at ReadWriteWeb, Open Knowledge Foundation, Talis, Jeni Tennison&#8217;s blog, and some helpful emails from Leigh Dodds (Talis) and Jonathan [...]]]></description>
			<content:encoded><![CDATA[<p>The recent public release of the UK Government&#8217;s <a href="http://data.gov.uk">data.gov.uk</a> site got picked up by the press last week in articles at <a href="http://news.bbc.co.uk/2/hi/technology/8470797.stm">The Guardian</a>, <a href="http://www.prospectmagazine.co.uk/2010/01/whitehalls-web-revolution-the-inside-story/">Prospect Magazine</a> and elsewhere. These have been supplemented by some more technical discussions at  <a href="http://www.readwriteweb.com/archives/uk_launches_open_data_site_puts_datagov_to_shame.php">ReadWriteWeb</a>,  <a href="http://blog.okfn.org/2010/01/21/datagovuk-goes-public-and-its-using-ckan/">Open Knowledge Foundation</a>, <a href="http://blogs.talis.com/nodalities/2009/11/data-gov-uk-and-the-talis-platform.php">Talis</a>, <a href="http://www.jenitennison.com/blog/node/140">Jeni Tennison&#8217;s blog</a>, and some helpful emails from <a href="http://lists.w3.org/Archives/Public/public-egov-ig/2010Jan/0048.html">Leigh Dodds</a> (<a href="http://talis.com/platform">Talis</a>) and <a href="http://lists.w3.org/Archives/Public/public-egov-ig/2010Jan/0040.html">Jonathan Gray</a> (<a href="http://okfn.org">Open Knowledge Foundation</a>) on the <a href="http://lists.w3.org/Archives/Public/public-egov-ig/">w3c egovernment discussion list</a>.</p>
<p>One thing that I haven&#8217;t seen mentioned so far in public (which I just discovered today) is that data.gov.uk is using <a href="http://www.w3.org/TR/xhtml-rdfa-primer/">RDFa</a> to expose metadata about the datasets in a machine readable way. What this means is that in an HTML page for a dataset like <a href="http://data.gov.uk/dataset/agricultural_market_reports">this</a> there are some extra HTML attributes like <em>about</em>, <em>property</em>, <em>rel</em> that have been thoughtfully used to express some structured metadata about the dataset, which can be extracted from the HTML and expressed say as <a href="http://www.w3.org/TeamSubmission/turtle/">Turtle</a>:</p>
<pre>
&lt;http://data.gov.uk/id/dataset/agricultural_market_reports&gt; dct:coverage "Great Britain (England, Scotland, Wales)"@en ;
     dct:created "2009-12-04"@en ;
     dct:creator "Department for Environment, Food and Rural Affairs"@en ;
     dct:isReferencedBy &lt;http://data.gov.uk/wiki/index.php/Package:agricultural_market_reports&gt; ;
     dct:license "Crown Copyright"@en ;
     dct:source &lt;http://statistics.defra.gov.uk/esg/publications/amr/default.asp&gt;, &lt;https://statistics.defra.gov.uk/esg/publications/amr/default.asp&gt; ;
     dct:subject
         &lt;http://data.gov.uk/data/tag/agriculture&gt;,
         &lt;http://data.gov.uk/data/tag/agriculture-and-environment&gt;,
         &lt;http://data.gov.uk/data/tag/environment&gt;,
         &lt;http://data.gov.uk/data/tag/farm-business&gt;,
         &lt;http://data.gov.uk/data/tag/farm-businesses&gt;,
         &lt;http://data.gov.uk/data/tag/farming&gt; .
</pre>
<p>In fact, since data.gov.uk has a nice paging mechanism that lists all the datasets, it&#8217;s not hard to write a little <a href="http://inkdroid.org/bzr/data-gov-uk/crawl.py">script</a> that scrapes <a href="http://inkdroid.org/bzr/data-gov-uk/data.rdf">all the metadata for the datasets</a> (35,478 triples) right out of the web pages.</p>
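<p>To make the idea concrete, here is a minimal sketch of pulling assertions out of <em>about</em>, <em>property</em> and <em>rel</em> attributes using only the Python standard library. It is not the crawl script linked above, and it handles only a tiny subset of RDFa (no prefix expansion, no nesting rules, no datatypes). The sample markup is hypothetical, loosely modeled on the Turtle shown earlier.</p>

```python
from html.parser import HTMLParser


class RDFaSketch(HTMLParser):
    """Collect (subject, predicate, object) tuples from about/property/rel.

    A deliberately tiny subset of RDFa; a real crawler should use a full
    RDFa parser instead.
    """

    def __init__(self):
        super().__init__()
        self.subject = None   # most recent @about value
        self.pending = None   # @property still waiting for its text content
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if self.subject is None:
            return
        if "rel" in attrs and "href" in attrs:
            # rel + href: the object is a linked resource
            self.triples.append((self.subject, attrs["rel"], attrs["href"]))
        if "property" in attrs:
            if "content" in attrs:
                self.triples.append(
                    (self.subject, attrs["property"], attrs["content"]))
            else:
                self.pending = attrs["property"]

    def handle_data(self, data):
        # The first non-blank text after a @property becomes its literal value.
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None


def extract(html):
    parser = RDFaSketch()
    parser.feed(html)
    parser.close()
    return parser.triples


# Hypothetical markup, not copied from data.gov.uk.
SAMPLE = """
<div about="http://data.gov.uk/id/dataset/agricultural_market_reports">
  <span property="dct:created">2009-12-04</span>
  <a rel="dct:source" href="http://statistics.defra.gov.uk/esg/publications/amr/default.asp">source</a>
</div>
"""

for triple in extract(SAMPLE):
    print(triple)
```

<p>Keeping the subject in parser state is what lets one <em>about</em> attribute on a wrapping element cover every property nested inside it.</p>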
<p>I also <a href="http://buytaert.net/data-gov-uk-using-drupal">noticed</a> via <a href="http://twitter.com/scorlosquet/status/8242145459">Stéphane Corlosquet</a> today that data.gov.uk is using the <a href="http://drupal.org">Drupal</a> open-source content management system. To what extent Drupal 7&#8217;s new <a href="http://drupal.org/node/574624">RDFa features</a> are being used to layer in this RDFa isn&#8217;t clear to me. But it is an exciting development, because data.gov.uk is a great example of how to bubble up data that&#8217;s typically locked away in databases of some kind into the HTML that&#8217;s out on the web for people to interact with, and for crawlers to crawl and re-purpose.</p>
<p>For example, I can now write a <a href="http://inkdroid.org/bzr/data-gov-uk/link_check.py">utility</a> to check the status of the external dataset links, to make sure they are there (200 OK). The <a href="http://inkdroid.org/bzr/data-gov-uk/link_check.txt">complete results by URL</a> can be summarized by rolling up by status code:</p>
<table style="border: thin gray solid; width: 60%;">
<tr>
<th>Status Code</th>
<th>Number of Datasets</th>
</tr>
<tr style='background: #eeeeee'>
<td>200</td>
<td>2977</td>
</tr>
<tr style='background: #ffffff'>
<td>404</td>
<td>106</td>
</tr>
<tr style='background: #eeeeee'>
<td>502</td>
<td>23</td>
</tr>
<tr style='background: #ffffff'>
<td>503</td>
<td>14</td>
</tr>
<tr style='background: #eeeeee'>
<td>[Errno socket error] [Errno -2] Name or service not known</td>
<td>8</td>
</tr>
<tr style='background: #ffffff'>
<td>500</td>
<td>3</td>
</tr>
<tr style='background: #eeeeee'>
<td>nonnumeric port: &#8221;</td>
<td>1</td>
</tr>
<tr style='background: #ffffff'>
<td>[Errno socket error] [Errno 110] Connection timed out</td>
<td>1</td>
</tr>
<tr style='background: #eeeeee'>
<td>400</td>
<td>1</td>
</tr>
</table>
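<p>The roll-up itself is a one-liner with <code>collections.Counter</code>. This is a sketch of that step only, not the link checker linked above. The URLs and outcomes in <code>sample</code> are made up to show the shape of the input, and the <code>(url, status)</code> pair format is an assumption.</p>

```python
from collections import Counter


def rollup(results):
    """Count datasets per outcome, most common first.

    `results` is an iterable of (url, status) pairs, where status is either
    an HTTP status code or the text of the error raised while fetching.
    """
    counts = Counter(str(status) for _url, status in results)
    return counts.most_common()


# Made-up results, only to show the shape of the input and output.
sample = [
    ("http://example.org/a.csv", 200),
    ("http://example.org/b.xls", 200),
    ("http://example.org/c.pdf", 404),
    ("http://example.invalid/d.csv", "Name or service not known"),
    ("http://example.org/e.csv", 200),
]

for status, count in rollup(sample):
    print(status, count)
```

<p>Converting every outcome to a string lets numeric status codes and socket error messages share one table, as they do above.</p>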
<p>Or I can <a href="http://inkdroid.org/bzr/data-gov-uk/subjects.py">generate</a> a <a href="http://inkdroid.org/bzr/data-gov-uk/subjects.txt">list of dataset subjects</a> (even though it&#8217;s already <a href="http://data.gov.uk/data/tag">available</a> I guess). Here&#8217;s the top 25:</p>
<table style="border: thin gray solid; width: 60%;">
<tr>
<th>Subject</th>
<th>Number of Datasets</th>
</tr>
<tr style='background: #eeeeee'>
<td>health </td>
<td>645</td>
</tr>
<tr style='background: #ffffff'>
<td>care </td>
<td>427</td>
</tr>
<tr style='background: #eeeeee'>
<td>child </td>
<td>398</td>
</tr>
<tr style='background: #ffffff'>
<td>population </td>
<td>341</td>
</tr>
<tr style='background: #eeeeee'>
<td>children </td>
<td>295</td>
</tr>
<tr style='background: #ffffff'>
<td>school </td>
<td>273</td>
</tr>
<tr style='background: #eeeeee'>
<td>health-and-social-care </td>
<td>271</td>
</tr>
<tr style='background: #ffffff'>
<td>health-well-being-and-care </td>
<td>205</td>
</tr>
<tr style='background: #eeeeee'>
<td>economy </td>
<td>202</td>
</tr>
<tr style='background: #ffffff'>
<td>economics-and-finance </td>
<td>189</td>
</tr>
<tr style='background: #eeeeee'>
<td>census </td>
<td>188</td>
</tr>
<tr style='background: #ffffff'>
<td>education </td>
<td>176</td>
</tr>
<tr style='background: #eeeeee'>
<td>communities </td>
<td>154</td>
</tr>
<tr style='background: #ffffff'>
<td>benefit </td>
<td>153</td>
</tr>
<tr style='background: #eeeeee'>
<td>road </td>
<td>144</td>
</tr>
<tr style='background: #ffffff'>
<td>children-education-and-skills </td>
<td>121</td>
</tr>
<tr style='background: #eeeeee'>
<td>people-and-places </td>
<td>111</td>
</tr>
<tr style='background: #ffffff'>
<td>government-receipts-and-expenditure </td>
<td>110</td>
</tr>
<tr style='background: #eeeeee'>
<td>education-and-skills </td>
<td>110</td>
</tr>
<tr style='background: #ffffff'>
<td>housing </td>
<td>108</td>
</tr>
<tr style='background: #eeeeee'>
<td>environment </td>
<td>107</td>
</tr>
<tr style='background: #ffffff'>
<td>tax </td>
<td>107</td>
</tr>
<tr style='background: #eeeeee'>
<td>life-in-the-community </td>
<td>106</td>
</tr>
<tr style='background: #ffffff'>
<td>employment </td>
<td>105</td>
</tr>
<tr style='background: #eeeeee'>
<td>tax-credit </td>
<td>96</td>
</tr>
</table>
<p>I realize it&#8217;s early days but here are a few things it would be fun to see at data.gov.uk:</p>
<ul>
<li>add some RDFa and <a href="http://www.w3.org/TR/2009/REC-skos-reference-20090818/">SKOS</a> or <a href="http://www.commontag.org/Home">CommonTag</a> in tag pages like <a href="http://data.gov.uk/data/tag/education">education</a>: this would allow things to be hooked up a bit more explicitly, tags to be given nice labels, and encourage the reuse of the tagging vocabulary within and outside data.gov.uk</li>
<li>link the dataset descriptions to the dataset resources themselves (the pdfs, excel spreadsheets, etc) that are online using a vocabulary like <a href="http://www.openarchives.org/ore/1.0/vocabulary">Open Archives Initiative Object Reuse and Exchange (OAI-ORE)</a> and/or <a href="http://www.w3.org/TR/powder-dr/">POWDER</a>. This would allow for the harvesting and aggregation not only of the metadata, but the datasets as well.</li>
</ul>
<p>I imagine much of this sort of hacking around can be enabled by querying the <a href="http://data.gov.uk/sparql">data.gov.uk SPARQL endpoint</a>. But it hasn&#8217;t been very clear to me exactly what data is behind there. And there is something comforting about being able to crawl the open web to find the information that&#8217;s there in <a href="http://inkdroid.org/journal/2009/08/13/open-to-view/">open to view</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2010/01/26/data-gov-uk-and-rdfa/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Hacking O&#8217;Reilly RDFa</title>
		<link>http://inkdroid.org/journal/2009/12/22/hacking-oreilly-rdfa/</link>
		<comments>http://inkdroid.org/journal/2009/12/22/hacking-oreilly-rdfa/#comments</comments>
		<pubDate>Tue, 22 Dec 2009 18:17:03 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[books]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[semweb]]></category>
		<category><![CDATA[graphs]]></category>
		<category><![CDATA[identifiers]]></category>
		<category><![CDATA[linkeddata]]></category>
		<category><![CDATA[oreilly]]></category>
		<category><![CDATA[rdfa]]></category>

		<guid isPermaLink="false">http://inkdroid.org/journal/?p=1535</guid>
		<description><![CDATA[I recently learned from Ivan Herman&#8216;s blog that O&#8217;Reilly has begun publishing RDFa in their online catalog of books. So if you go and install the RDFa Highlight bookmarklet and then visit a page like this and click on the bookmarklet you&#8217;ll see something like: Those red boxes you see are graphical depictions of where [...]]]></description>
			<content:encoded><![CDATA[<p>I recently learned from <a href="http://ivan-herman.name/2009/12/12/rdfa-usage-spreading…/">Ivan Herman</a>&#8216;s blog that O&#8217;Reilly has begun publishing RDFa in their online catalog of books. So if you go and install the <a href="http://www.w3.org/2001/sw/BestPractices/HTML/rdfa-bookmarklet/">RDFa Highlight</a> bookmarklet and then visit a page like <a href="http://oreilly.com/catalog/9780596516499/">this</a> and click on the bookmarklet you&#8217;ll see something like:</p>
<p><a href="http://oreilly.com/catalog/9780596516499/"><br />
<img src="http://inkdroid.org/images/oreilly-rdfa.png" style="width: 600px;"/><br />
</a></p>
<p>Those red boxes you see are graphical depictions of where metadata can be found interleaved in the HTML. In my screenshot you can maybe barely see an assertion involving the title being displayed:</p>
<pre>
&lt;urn:x-domain:oreilly.com:product:9780596516499.IP&gt; dc:title "Natural Language Processing with Python"
</pre>
<p>But there is actually quite a lot of metadata hiding in the page, which can be found by running the page through the <a href="http://www.w3.org/2007/08/pyRdfa">RDFa Distiller</a> (quickly skim over this if your eyes glaze over when you see Turtle):</p>
<pre style="height: 300px;">
@prefix dc: &lt;http://purl.org/dc/terms/&gt; .
@prefix foaf: &lt;http://xmlns.com/foaf/0.1/&gt; .
@prefix frbr: &lt;http://vocab.org/frbr/core#&gt; .
@prefix gr: &lt;http://purl.org/goodrelations/v1#&gt; .
@prefix rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
@prefix xml: &lt;http://www.w3.org/XML/1998/namespace&gt; .
@prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .

&lt;urn:x-domain:oreilly.com:product:9780596516499.IP&gt; a frbr:Expression ;
     dc:creator &lt;urn:x-domain:oreilly.com:agent:pdb:3343&gt;, &lt;urn:x-domain:oreilly.com:agent:pdb:3501&gt;, &lt;urn:x-domain:oreilly.com:agent:pdb:3502&gt; ;
     dc:issued "2009-06-12"^^xsd:dateTime ;
     dc:publisher "O'Reilly Media"@en ;
     dc:title "Natural Language Processing with Python"@en ;
     frbr:embodiment &lt;urn:x-domain:oreilly.com:product:9780596516499.BOOK&gt;, &lt;urn:x-domain:oreilly.com:product:9780596803346.SAF&gt;, &lt;urn:x-domain:oreilly.com:product:9780596803391.EBOOK&gt; . 

&lt;http://customer.wileyeurope.com/CGI-BIN/lansaweb?procfun+shopcart+shcfn01+funcparms+parmisbn(a0130):9780596516499+parmqty(p0050):1+parmurl(l0560):http://oreilly.com/store/&gt; a gr:Offering ;
     gr:includesObject
         [ a gr:TypeAndQuantityNode ;
             gr:ammountOfThisGood "1"^^xsd:float ;
             gr:hasPriceSpecification
                 [ a gr:UnitPriceSpecification ;
                     gr:hasCurrency "GBP"@en ;
                     gr:hasCurrencyValue "34.50"^^xsd:float
                 ] ;
             gr:typeOfGood &lt;urn:x-domain:oreilly.com:product:9780596516499.BOOK&gt;
         ] . 

&lt;http://my.safaribooksonline.com/9780596803346&gt; a gr:Offering ;
     gr:includesObject
         [ a gr:TypeAndQuantityNode ;
             gr:ammountOfThisGood "1"^^xsd:float ;
             gr:typeOfGood &lt;urn:x-domain:oreilly.com:product:9780596803346.SAF&gt;
         ] . 

&lt;https://epoch.oreilly.com/shop/cart.orm?p=BUNDLE&#038;prod=9780596516499.BOOK&#038;prod=9780596803391.EBOOK&#038;bundle=1&#038;retUrl=http%3A%252F%252Foreilly.com%252Fstore%252F&gt; a gr:Offering ;
     gr:includesObject
         [ a gr:TypeAndQuantityNode ;
             gr:ammountOfThisGood "1"^^xsd:float ;
             gr:includesObject
                 [ a gr:TypeAndQuantityNode ;
                     gr:ammountOfThisGood "1"^^xsd:float ;
                     gr:hasPriceSpecification
                         [ a gr:UnitPriceSpecification ;
                             gr:hasCurrency "None"@en ;
                             gr:hasCurrencyValue "49.49"^^xsd:float
                         ] ;
                     gr:typeOfGood &lt;urn:x-domain:oreilly.com:product:9780596803391.EBOOK&gt;
                 ] ;
             gr:typeOfGood &lt;urn:x-domain:oreilly.com:product:9780596516499.BOOK&gt;
         ] . 

&lt;https://epoch.oreilly.com/shop/cart.orm?prod=9780596516499.BOOK&gt; a gr:Offering ;
     gr:includesObject
         [ a gr:TypeAndQuantityNode ;
             gr:ammountOfThisGood "1"^^xsd:float ;
             gr:hasPriceSpecification
                 [ a gr:UnitPriceSpecification ;
                     gr:hasCurrency "USD"@en ;
                     gr:hasCurrencyValue "44.99"^^xsd:float
                 ] ;
             gr:typeOfGood &lt;urn:x-domain:oreilly.com:product:9780596516499.BOOK&gt;
         ] . 

&lt;https://epoch.oreilly.com/shop/cart.orm?prod=9780596803391.EBOOK&gt; a gr:Offering ;
     gr:includesObject
         [ a gr:TypeAndQuantityNode ;
             gr:ammountOfThisGood "1"^^xsd:float ;
             gr:hasPriceSpecification
                 [ a gr:UnitPriceSpecification ;
                     gr:hasCurrency "USD"@en ;
                     gr:hasCurrencyValue "35.99"^^xsd:float
                 ] ;
             gr:typeOfGood &lt;urn:x-domain:oreilly.com:product:9780596803391.EBOOK&gt;
         ] . 

&lt;urn:x-domain:oreilly.com:agent:pdb:3343&gt; a foaf:Person ;
     foaf:homepage &lt;http://www.oreillynet.com/pub/au/3614&gt; ;
     foaf:name "Steven Bird"@en . 

&lt;urn:x-domain:oreilly.com:agent:pdb:3501&gt; a foaf:Person ;
     foaf:homepage &lt;http://www.oreillynet.com/pub/au/3615&gt; ;
     foaf:name "Ewan Klein"@en . 

&lt;urn:x-domain:oreilly.com:agent:pdb:3502&gt; a foaf:Person ;
     foaf:homepage &lt;http://www.oreillynet.com/pub/au/3616&gt; ;
     foaf:name "Edward Loper"@en . 

&lt;urn:x-domain:oreilly.com:product:9780596803346.SAF&gt; a frbr:Manifestation ;
     dc:type &lt;http://purl.oreilly.com/product-types/SAF&gt; . 

&lt;urn:x-domain:oreilly.com:product:9780596803391.EBOOK&gt; a frbr:Manifestation ;
     dc:identifier &lt;urn:isbn:9780596803391&gt; ;
     dc:issued "2009-06-12"^^xsd:dateTime ;
     dc:type &lt;http://purl.oreilly.com/product-types/EBOOK&gt; . 

&lt;urn:x-domain:oreilly.com:product:9780596516499.BOOK&gt; a frbr:Manifestation ;
     dc:extent """512"""@en ;
     dc:identifier &lt;urn:isbn:9780596516499&gt; ;
     dc:issued "2009-06-19"^^xsd:dateTime ;
     dc:type &lt;http://purl.oreilly.com/product-types/BOOK&gt; .
</pre>
<p>So that&#8217;s a lot of data. The nice thing about rdf is that you can look at the vocabularies that are being used to get an idea of the rough shape of the underlying data. Just looking at the namespace prefixes we can see that O&#8217;Reilly has chosen to use the following vocabularies:</p>
<ul>
<li><a href="http://dublincore.org/documents/dcmi-terms/">Dublin Core Terms</a>: for indicating the publisher, title, authors, issue date and identifiers for a book</li>
<li><a href="http://xmlns.com/foaf/spec/">Friend of a Friend (FOAF)</a>: for modeling authors as People</li>
<li><a href="http://vocab.org/frbr/core.html">Functional Requirements for Bibliographic Records (FRBR)</a>: for relating a particular book (Expression) to its various Manifestations of the title: ebook, printed book</li>
<li><a href="http://www.heppnetz.de/projects/goodrelations/">Good Relations</a>: for making pricing information available</li>
</ul>
<p>I was curious so I wrote a little <a href="http://inkdroid.org/bzr/oreilly-crawler/crawl.py">crawler</a> (41 lines of Python+rdflib) to collect all the metadata from the O&#8217;Reilly Catalog pages. Yes all the pages! It ended up pulling down 92,101 triples, which I&#8217;ve made available as <a href="http://inkdroid.org/bzr/oreilly-crawler/catalog.rdf">rdf/xml</a> and <a href="http://inkdroid.org/bzr/oreilly-crawler/catalog.bt">ntriples</a> files. </p>
<p>A nice side effect of having the data as a big ntriples file is you can do unix pipe tricks with sort, cut, uniq like <a href="http://inkdroid.org/bzr/bin/rdfsum">this</a> to get some ballpark numbers on what types of resources are in the rdf graph:</p>
<pre>
ed@curry:~/Projects/oreilly-crawler$ rdfsum catalog.nt
   6803 &lt;http://purl.org/goodrelations/v1#TypeAndQuantityNode&gt;
   5861 &lt;http://purl.org/goodrelations/v1#Offering&gt;
   4564 &lt;http://purl.org/goodrelations/v1#UnitPriceSpecification&gt;
   4065 &lt;http://vocab.org/frbr/core#Manifestation&gt;
   2100 &lt;http://vocab.org/frbr/core#Expression&gt;
   2023 &lt;http://xmlns.com/foaf/0.1/Person&gt;
</pre>
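<p>For anyone without the unix tools handy, here is a rough Python equivalent of that kind of pipeline. It is a sketch rather than the rdfsum script itself: it assumes one triple per line, as in ntriples, and the three sample triples are hand-written stand-ins for catalog.nt.</p>

```python
import re
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# subject, predicate, object, then the terminating " ."
TRIPLE = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$")


def type_counts(lines):
    """Count how many times each class appears as the object of rdf:type."""
    counts = Counter()
    for line in lines:
        match = TRIPLE.match(line)
        if match and match.group(2) == RDF_TYPE:
            counts[match.group(3)] += 1
    return counts


# Three hand-written triples standing in for catalog.nt.
sample = """\
_:a <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/goodrelations/v1#Offering> .
_:b <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/goodrelations/v1#Offering> .
<urn:x-domain:oreilly.com:agent:pdb:3343> <http://xmlns.com/foaf/0.1/name> "Steven Bird"@en .
""".splitlines()

for cls, n in type_counts(sample).most_common():
    print(f"{n:7d} {cls}")
```

<p>The line-at-a-time approach only works because ntriples never splits a triple across lines; Turtle or rdf/xml would need a real parser.</p>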
<p>Another nice thing about pulling the RDFa down with rdflib is you end up with a little berkeleydb triple store which you can query with SPARQL, say to pull out all the authors and titles:</p>
<pre>
    PREFIX dct: &lt;http://purl.org/dc/terms/&gt;
    PREFIX foaf: &lt;http://xmlns.com/foaf/0.1/&gt;

    SELECT ?title ?author
    WHERE {
      ?title_uri dct:title ?title .
      ?title_uri dct:creator ?author_uri .
      ?author_uri foaf:name ?author .
    }
</pre>
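<p>If SPARQL is unfamiliar, it may help to see what the three patterns in that query amount to. The sketch below spells out the same join in plain Python over a list of triples. It is an illustration of the query&#8217;s logic rather than how rdflib evaluates it, and it assumes one title per book and one name per author. The sample triples are taken from the Turtle shown earlier.</p>

```python
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"


def titles_and_authors(triples):
    """Return (title, author name) pairs, mirroring the three query patterns."""
    triples = list(triples)
    # ?author_uri foaf:name ?author   (assumes one name per author)
    names = {s: o for s, p, o in triples if p == FOAF + "name"}
    # ?title_uri dct:title ?title     (assumes one title per book)
    titles = {s: o for s, p, o in triples if p == DCT + "title"}
    rows = []
    # ?title_uri dct:creator ?author_uri joins the two lookups together
    for s, p, o in triples:
        if p == DCT + "creator" and s in titles and o in names:
            rows.append((titles[s], names[o]))
    return rows


BOOK = "urn:x-domain:oreilly.com:product:9780596516499.IP"
BIRD = "urn:x-domain:oreilly.com:agent:pdb:3343"
KLEIN = "urn:x-domain:oreilly.com:agent:pdb:3501"

sample = [
    (BOOK, DCT + "title", "Natural Language Processing with Python"),
    (BOOK, DCT + "creator", BIRD),
    (BOOK, DCT + "creator", KLEIN),
    (BIRD, FOAF + "name", "Steven Bird"),
    (KLEIN, FOAF + "name", "Ewan Klein"),
]

for title, author in titles_and_authors(sample):
    print(title, "/", author)
```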
<p>And adding a little bit of <a href="http://networkx.lanl.gov/">networkx</a> <a href="http://inkdroid.org/bzr/oreilly-crawler/authorship.py">judo</a> you can get an <strong>xmas-friendly</strong> graph of authors (the green dots are books and the red ones are authors; I limited author labels to authors who had written more than 2 books).</p>
<p><a href="http://inkdroid.org/bzr/oreilly-crawler/authorship.png"><img src="http://inkdroid.org/images/oreilly-authorship.png" /></a></p>
<p>Admittedly this is not very readable, but I imagine someone with more network visualization skillz could do something nicer in short order. There&#8217;s a lot that could be done with the data. This exercise was mainly just to demonstrate how layering some new stuff into your HTML can really open up doors for how people use your website. Clearly O&#8217;Reilly did some deep thinking about what data they had, and what vocabularies they wanted to model it with. But once they&#8217;d done that they probably just had to go add 50 lines to an HTML template somewhere, and it was published (props to <a href="http://davidbrunton.com">David Brunton</a> for this turn of phrase). It&#8217;s a really good sign that a tech publisher with the stature of O&#8217;Reilly is giving this method of data publishing a try. </p>
<p>My only suggestion (for anyone at O&#8217;Reilly who might be reading) would be that it would be nice if they used HTTP URLs instead of URNs for People, Works and Expressions. I understand why they did it: using URNs eases deployment somewhat since you don&#8217;t have to worry about httpRange-14 stuff. But I think they could easily use a hash URI instead of a URN, and make it easy for people to link to their stuff in other data. The <a href="http://www.w3.org/TR/cooluris/">Cool URIs For the Semantic Web</a> has some other patterns they might want to consider, but simply adding a hash to their existing page URIs should do the trick. So for example, consider if <a href="http://openlibrary.org">OpenLibrary</a> wanted to link their notion of a book to O&#8217;Reilly&#8217;s notion of a book with owl:sameAs. If they used the URN they&#8217;d have:</p>
<pre>
&lt;http://openlibrary.org/b/OL23978297M&gt; owl:sameAs &lt;urn:x-domain:oreilly.com:product:9780596516499.IP&gt; .
</pre>
<p>but if O&#8217;Reilly identified their expressions with a URL they would enable something like:</p>
<pre>
&lt;http://openlibrary.org/b/OL23978297M&gt; owl:sameAs &lt;http://oreilly.com/catalog/9780596516499#expression&gt; .
</pre>
<p>This may seem like a minor point, but it&#8217;s really important to be able to follow your nose on the web&#8211;particularly in <a href="http://linkeddata.org">Linked Data</a>. If a piece of software ran across the O&#8217;Reilly URL in a chunk of OpenLibrary RDF, the program could HTTP GET it, and learn more stuff about the book. <em>But if it got the URN it wouldn&#8217;t really know how to fetch a representation for that resource without some special case logic that mapped the URN to a URL</em>. There is a reason why Tim Berners-Lee included the following as the second of his <a href="http://www.w3.org/DesignIssues/LinkedData.html">design principles</a> for Linked Data:</p>
<blockquote><p>
Use HTTP URIs so that people can look up those names.
</p></blockquote>
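<p>Here is a small sketch of what &#8220;follow your nose&#8221; means for a program. Given an identifier, it works out which URL (if any) can be fetched. The function name <code>document_url</code> is made up for this example, and the hash URI is the hypothetical one suggested above, not something O&#8217;Reilly publishes.</p>

```python
from urllib.parse import urldefrag, urlsplit


def document_url(identifier):
    """Return the URL to GET for an identifier, or None if there isn't one.

    For a hash URI the fragment is dropped, because the fragment is never
    sent to the server in an HTTP request.
    """
    if urlsplit(identifier).scheme not in ("http", "https"):
        return None  # e.g. a URN: no built-in way to fetch it
    url, _fragment = urldefrag(identifier)
    return url


for identifier in (
    "http://oreilly.com/catalog/9780596516499#expression",
    "urn:x-domain:oreilly.com:product:9780596516499.IP",
):
    print(identifier, "->", document_url(identifier))
```

<p>The hash URI resolves to the existing catalog page, which is exactly where the RDFa describing the book already lives. The URN gives the program nothing to go on.</p>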
<p>Anyhow, hats off to O&#8217;Reilly for putting RDFa into practice. I hope the rest of the publishing (and library) world takes note. If you are looking to learn more about RDFa, <a href="http://adida.net/">Ben Adida</a> and <a href="http://webbackplane.com/mark-birbeck">Mark Birbeck</a>&#8216;s <a href="http://www.w3.org/TR/xhtml-rdfa-primer/">RDFa Primer: Bridging the Human and Data Webs</a> is a really nice intro.</p>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2009/12/22/hacking-oreilly-rdfa/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>New York Times Topics as SKOS</title>
		<link>http://inkdroid.org/journal/2009/08/18/new-york-times-topics-as-skos/</link>
		<comments>http://inkdroid.org/journal/2009/08/18/new-york-times-topics-as-skos/#comments</comments>
		<pubDate>Wed, 19 Aug 2009 04:46:59 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[publishing]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[semweb]]></category>
		<category><![CDATA[beautifulsoup]]></category>
		<category><![CDATA[newspapers]]></category>
		<category><![CDATA[nytimes]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[rdfa]]></category>
		<category><![CDATA[rdflib]]></category>
		<category><![CDATA[skos]]></category>

		<guid isPermaLink="false">http://inkdroid.org/journal/?p=1094</guid>
		<description><![CDATA[Serves 23,376 SKOS Concepts INGREDIENTS Text editor: Vim, Emacs, TextMate, etc Python BeautifulSoup rdflib Internet connection DIRECTIONS Open a new file using your favorite text editor. Instantiate an RDF graph with a dash of rdflib. Use python&#8217;s urllib to extract the HTML for each of the Times Topics Index Pages, e.g. for A. Parse HTML [...]]]></description>
			<content:encoded><![CDATA[<p><em>Serves 23,376 SKOS Concepts</em></p>
<p><strong>INGREDIENTS</strong></p>
<ul>
<li>Text editor: <a href="http://www.vim.org/">Vim</a>, <a href="http://www.gnu.org/software/emacs/">Emacs</a>, <a href="http://macromates.com/">TextMate</a>, etc</li>
<li><a href="http://python.org">Python</a></li>
<li><a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a></li>
<li><a href="http://rdflib.net">rdflib</a></li>
<li>Internet connection</li>
</ul>
<p><strong>DIRECTIONS</strong></p>
<ol>
<li>Open a new file using your favorite text editor.</li>
<li>Instantiate an RDF graph with a dash of rdflib.</li>
<li>Use python&#8217;s urllib to extract the HTML for each of the Times Topics Index Pages, e.g. for <a href="http://topics.nytimes.com/top/reference/timestopics/all/a">A</a>.</li>
<li>Parse HTML into a fine, queryable data structure using BeautifulSoup.</li>
<li>Locate topic names and their associated URLs, and gently add them to the graph with a pinch of SKOS.</li>
<li>Go back to step 3 to fetch the next batch of topics, until you&#8217;ve finished <a href="http://topics.nytimes.com/top/reference/timestopics/all/z">Z</a>.</li>
<li>Bake the RDF graph as an rdf/xml file.</li>
</ol>
<p><strong>NOTES</strong></p>
<p>If you don&#8217;t feel like cooking up the rdf/xml yourself you can download it from <a href="http://inkdroid.org/bzr/timestopics/timestopics.rdf">here</a> (might want to right-click to download, some browsers might have trouble rendering the xml), or download the 68 line <a href="http://inkdroid.org/bzr/timestopics/timestopics.py">implementation</a> and run it yourself. </p>
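<p>For a taste of steps 4 and 5, here is a stripped-down sketch that swaps the ingredients: the standard library&#8217;s <code>html.parser</code> stands in for BeautifulSoup, and hand-built ntriples lines stand in for rdflib. It is not the 68 line implementation. The rule that every link below the Times Topics base URL is a topic, and the sample page fragment, are assumptions made for illustration.</p>

```python
from html.parser import HTMLParser

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TOPICS = "http://topics.nytimes.com/top/reference/timestopics"


class TopicLinks(HTMLParser):
    """Collect (url, label) for every link that points below TOPICS."""

    def __init__(self):
        super().__init__()
        self.href = None
        self.text = []
        self.topics = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.startswith(TOPICS + "/"):
            self.href, self.text = href, []

    def handle_data(self, data):
        if self.href:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href:
            self.topics.append((self.href, "".join(self.text).strip()))
            self.href = None


def to_ntriples(html):
    """Turn one index page into ntriples lines describing SKOS concepts."""
    parser = TopicLinks()
    parser.feed(html)
    lines = []
    for url, label in parser.topics:
        concept = f"<{url}#concept>"
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"{concept} <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f'{concept} <{SKOS}prefLabel> "{escaped}" .')
        lines.append(f"{concept} <{SKOS}inScheme> <{TOPICS}#conceptScheme> .")
    return lines


# Hypothetical index-page fragment; real pages would be fetched with urllib.
page = '<li><a href="' + TOPICS + '/people/b/ray_bradbury">Bradbury, Ray</a></li>'
print("\n".join(to_ntriples(page)))
```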
<p>The point of this exercise was mainly to show how it could be useful to think of the New York Times Topics as a <a href="http://en.wikipedia.org/wiki/Controlled_vocabulary">controlled vocabulary</a> that can be serialized as a file while still being present on the Web. Perhaps it would be useful to someone writing an application that needs to integrate with the New York Times and who wants to tag content using the same controlled vocabulary. Or perhaps someone wants to link their own content with similar content at the New York Times. These are all use cases for expressing the Topics as SKOS, and being able to ship it around with resolvable identifiers for the concepts.</p>
<p>Of course there is one slight wrinkle. Take a look at this <a href="http://en.wikipedia.org/wiki/Turtle_(syntax)">Turtle</a> snippet for the concept of Ray Bradbury:</p>
<pre>
@prefix rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
@prefix skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; .

&lt;http://topics.nytimes.com/top/reference/timestopics/people/b/ray_bradbury#concept&gt; a skos:Concept;
    skos:prefLabel "Bradbury, Ray";
    skos:broader &lt;http://topics.nytimes.com/top/reference/timestopics/people#concept&gt;;
    skos:inScheme &lt;http://topics.nytimes.com/top/reference/timestopics#conceptScheme&gt;
    .
</pre>
<p>Notice the URI being used for the concept?</p>
<pre>

http://topics.nytimes.com/top/reference/timestopics/people/b/ray_bradbury#concept</pre>
<p>The wrinkle is that there&#8217;s no way to get RDF back from this URI currently. But since NYT is already using XHTML, it wouldn&#8217;t be hard to sprinkle in some RDFa such that:</p>

<div class="wp_syntax"><div class="code"><pre class="html" style="font-family:monospace;">&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot;
    xmlns:skos=&quot;http://www.w3.org/2004/02/skos/core#&quot;&gt;
...
&lt;h1 about=&quot;http://topics.nytimes.com/top/reference/timestopics/people/b/ray_bradbury#concept&quot; property=&quot;skos:prefLabel&quot;&gt;Ray Bradbury&lt;/h1&gt;
...
&lt;/html&gt;</pre></div></div>

<p>And <em>voila</em> you&#8217;ve got <a href="http://www.w3.org/DesignIssues/LinkedData.html">Linked Data</a>. I took 5 minutes to mark up the HTML myself and put it <a href="http://inkdroid.org/data/bradbury.html">here</a>, which you can run through the <a href="http://www.w3.org/2007/08/pyRdfa/">RDFa Distiller</a> to get some <a href="http://www.w3.org/2007/08/pyRdfa/extract?uri=http%3A%2F%2Finkdroid.org%2Fdata%2Fbradbury.html&#038;format=turtle&#038;warnings=false&#038;parser=lax&#038;space-preserve=true&#038;submit=Go%21&#038;text=">Turtle</a>. Of course if the NYT ever decided to alter their HTML to provide this markup this recipe would be simplified greatly: no more error-prone scraping, since the assertions could be pulled directly out of the HTML. </p>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2009/08/18/new-york-times-topics-as-skos/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>public.resource.org to liberate Code of Federal Regulations</title>
		<link>http://inkdroid.org/journal/2008/09/17/publicresourceorg-to-liberate-code-of-federal-regulations/</link>
		<comments>http://inkdroid.org/journal/2008/09/17/publicresourceorg-to-liberate-code-of-federal-regulations/#comments</comments>
		<pubDate>Wed, 17 Sep 2008 21:51:15 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[politics]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[gpo]]></category>

		<guid isPermaLink="false">http://inkdroid.org/journal/?p=365</guid>
		<description><![CDATA[good news via the govtrack mailing list Carl Malamud of public.resource.org, with funding from a bunch of places including a small bit from GovTrack&#8217;s ad profits, announced his intention to purchase from the Government Printing Office documents they produce in the course of their statutory obligations and then have the nerve to sell back to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://public.resource.org/gpo.gov"><img src="http://inkdroid.org/images/public-resource-org.png" style="border: none; float: right;" /></a></p>
<p>good news via the <a href="http://groups.yahoo.com/group/govtrack/message/629">govtrack mailing list</a></p>
<blockquote><p>
Carl Malamud of public.resource.org, with funding from a bunch of places including a small bit from GovTrack&#8217;s ad profits, announced his intention to purchase from the Government Printing Office documents they produce in the course of their statutory obligations and then have the nerve to sell back to the public at prohibitive prices. The document to be purchased is the Code of Federal Regulations, the component of federal law created by executive branch agencies, in electronic form. Once obtained, it will be posted openly/freely online.</p>
<p>More here: <a href="http://public.resource.org/gpo.gov/index.html">http://public.resource.org/gpo.gov/index.html</a></p>
<p>And Carl&#8217;s letter to the GPO:<br />
<a href="http://public.resource.org/gpo.gov/the_honorable.html">http://public.resource.org/gpo.gov/the_honorable.html</a>
</p></blockquote>
<p>It&#8217;s pretty sad that it has to come to this&#8230;but it&#8217;s also pretty awesome that it&#8217;s happening.</p>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2008/09/17/publicresourceorg-to-liberate-code-of-federal-regulations/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>provide and enable</title>
		<link>http://inkdroid.org/journal/2008/06/18/provide-and-enable/</link>
		<comments>http://inkdroid.org/journal/2008/06/18/provide-and-enable/#comments</comments>
		<pubDate>Wed, 18 Jun 2008 18:05:24 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[politics]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[semweb]]></category>
		<category><![CDATA[archives]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[government]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[rdfa]]></category>

		<guid isPermaLink="false">http://inkdroid.org/journal/?p=204</guid>
		<description><![CDATA[I got a chance to meet Jennifer Rigby of the National Archives UK at the LinkedDataPlanet Conference in New York City (thanks Ian). Jennifer is the Head of IT Strategy, and told me lots of interesting stuff related to a profound shift they&#8217;ve had in their online strategies to: Provide and Enable So rather than [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://inkdroid.org/images/natarch.gif" style="float: left; border: thin inset gray; margin-right: 15px;" />I got a chance to meet Jennifer Rigby of the <a href="http://www.nationalarchives.gov.uk/">National Archives UK</a> at the <a href="http://linkeddataplanet.com">LinkedDataPlanet</a> Conference in New York City (thanks <a href="http://iandavis.com">Ian</a>).  Jennifer is the Head of IT Strategy, and told me lots of interesting stuff related to a profound shift they&#8217;ve had in their online strategies to:</p>
<blockquote><p>
<a href="http://www.nationalarchives.gov.uk/documents/provide-enable-summary.pdf">Provide and Enable</a>
</p></blockquote>
<p>So rather than pouring all their energy into making applications to visualize archival resources, the National Archives have recognized that making machine readable resources available to the public (in formats like RDF and RDFa) is really important to their core mission. In addition to <em>providing</em> services and data, they are trying to <em>enable</em> an ecosystem of innovation around their assets&#8211;or in their words:</p>
<p>• We will allow others to harness the power of our information, leading to a far wider range of products and services than we could provide ourselves.<br />
• We will continue to work with commercial partners to provide online access to millions of records.</p>
<p>Jennifer said we can look forward to an announcement around <a href="http://www.ukuug.org/events/opentech2008/">OpenTech2008</a> (July 5th) about a set of important publications that are going to be made available by the Archives as RDF and RDFa. In addition, I heard about how they work with website data harvested by the <a href="http://archive.org">Internet Archive</a> to create a resolver service for transient publications on the web.</p>
<p>Hearing how a big organization like the National Archives can come to this realization of &#8220;Provide and Enable&#8221;, and then start to execute on it, was really encouraging&#8211;and inspiring.  It is also refreshing to see people recognize, in writing, the importance of semantic web technologies:</p>
<blockquote><p>We have started exploring new ideas and technologies, including using RDFa for publishing the Gazettes. The way we now publish legislation has a key role to play in the further development of the semantic web.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2008/06/18/provide-and-enable/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>justify my links</title>
		<link>http://inkdroid.org/journal/2008/05/29/justify-my-links/</link>
		<comments>http://inkdroid.org/journal/2008/05/29/justify-my-links/#comments</comments>
		<pubDate>Thu, 29 May 2008 12:58:07 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[libraries]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[semweb]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[lc]]></category>
		<category><![CDATA[nyc]]></category>
		<category><![CDATA[semanticweb]]></category>

		<guid isPermaLink="false">http://inkdroid.org/journal/?p=201</guid>
		<description><![CDATA[Thanks to a tip from Ian, I&#8217;m looking forward to (hopefully) attending the Linked Data Planet conference in New York City as a volunteer. The idea is that I just have to pay for my hotel, and the cost of admission is waived. It seems my travel money is a bit limited at the moment [...]]]></description>
			<content:encoded><![CDATA[<p>Thanks to a tip from <a href="http://iandavis.com/">Ian</a>, I&#8217;m looking forward to (hopefully) attending the <a href="http://www.linkeddataplanet.com/">Linked Data Planet</a> conference in New York City as a volunteer. The idea is that I just have to pay for my hotel, and the cost of admission is waived. It seems my travel money is a bit limited at the moment (sometimes it&#8217;s there, sometimes it isn&#8217;t), so I figured minimizing costs would be appreciated. But today I got a request to &#8220;justify&#8221; my attendance at the conference. It was actually kind of a good exercise to sit down and write why I think the conference and <a href="http://linkeddata.org/">Linked Data</a> in general are important to the <a href="http://loc.gov">Library of Congress</a>.</p>
<blockquote><p>One of the challenges of Digital Repository work is modeling the context for digital objects. The context for a digital object includes the set of relationships a particular digital object has with other objects in the repository. 30 years of relational database research and development have allowed us to do this modeling pretty effectively within the scope of a particular application.</p>
<p>Very often, particularly in institutions the size of the Library of Congress, the context for a digital object includes digital objects found elsewhere in the enterprise&#8211;in other applications, with their own databases. In addition some institutions (like LC) also need to make their digital resources available publicly for other organizations to reference. The challenge here is in making the objects found in silos or islands of application data (typically housed in databases) reference-able and resolvable, so that other applications inside and outside the enterprise can use them.</p>
<p>As a practical example, a picture of Dizzy Gillespie found in the American Memory collection </p>
<div style="text-align: center;">
<a href=" http://lcweb2.loc.gov/cgi-bin/query/i?ammem/van:@field(NUMBER+@band(van+5a52027)):displayType=1:m856sd=van:m856sf=5a52027 "><img src="http://memory.loc.gov/pnp/van/5a52000/5a52000/5a52027r.jpg" /></a>
</div>
<p>is related to the book:</p>
<p><em><br />
  To be, or not&#8211;to bop: memoirs / Dizzy Gillespie, with Al Fraser.<br />
</em></p>
<p>which we have described in our <a href="http://lccn.loc.gov/84029213">online catalog</a>. The person Dizzy Gillespie is also represented in LC&#8217;s name authority file with the <a href="http://www.loc.gov/marc/lccn.html">Library of Congress Control Number</a> n50033872, and the <a href="http://orlabs.oclc.org/viaf/LC|n50033872 ">Linked Authority File at OCLC</a>. And perhaps this picture of Dizzy Gillespie in American Memory will find its way into the <a href="http://memory.loc.gov/pnp/van/5a52000/5a52000/5a52027r.jpg">World Digital Library</a> application that is currently being built. How can we practically and explicitly identify and then represent the relationships between these resources? Is it even possible?</p>
<p>The Linked Data Planet conference is a two day workshop describing how to use traditional web technologies in conjunction with semantic web technologies (RDF, OWL, SPARQL, RDFa and GRDDL) to enable this sort of linking of resources inside particular applications, within the enterprise and around the world. My hope is that the conference will provide guidance on simple things LC can do with web technologies that have been in use for 20 years, to model the relationships between digital resources at the Library of Congress.
</p></blockquote>
<p>Hopefully that will convince them :-)</p>
<p><em>Apologies to <a href="http://en.wikipedia.org/wiki/Justify_My_Love">Madonna</a> for the blog post title&#8230;</em></p>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2008/05/29/justify-my-links/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>permalinks reloaded</title>
		<link>http://inkdroid.org/journal/2007/12/17/permalinks-reloaded/</link>
		<comments>http://inkdroid.org/journal/2007/12/17/permalinks-reloaded/#comments</comments>
		<pubDate>Mon, 17 Dec 2007 21:17:07 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[libraries]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://www.inkdroid.org/journal/2007/12/17/permalinks-reloaded/</guid>
		<description><![CDATA[The recently announced Zotero / InternetArchive partnership is exciting on a bunch of levels. The one that immediately struck me was the use of the Internet Archive URI. As you may have noticed before all the content in Internet Archive Wayback Machine can be referenced with a URL that looks something like: http://web.archive.org/web/{yyyymmddhhmmss}/{url} Where url [...]]]></description>
			<content:encoded><![CDATA[<p>The recently <a href="http://www.dancohen.org/2007/12/12/zotero-and-the-internet-archive-join-forces/">announced</a> <a href="http://zotero.org">Zotero</a> / <a href="http://archive.org">InternetArchive</a> partnership is exciting on a bunch of levels. The one that immediately struck me was the use of the Internet Archive URI. As you may have noticed before, all the content in the Internet Archive <a href="http://web.archive.org">Wayback Machine</a> can be referenced with a URL that looks something like:</p>
<ul>
<li>http://web.archive.org/web/{yyyymmddhhmmss}/{url}</li>
</ul>
<p>Where url is the document URL you want to look up in the archive at the given time. So for example:</p>
<ul>
<li><a href="http://web.archive.org/web/19981202230410/http://www.google.com/">http://web.archive.org/web/19981202230410/http://www.google.com/</a></li>
</ul>
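<p>If you want to generate these yourself, the template is easy to fill in. A minimal sketch in Python, assuming the 14-digit <code>yyyymmddhhmmss</code> form shown above; <code>wayback_url</code> is just a name I made up for illustration:</p>
<pre>
```python
from datetime import datetime

# Prefix taken from the URL template in this post.
WAYBACK_PREFIX = "http://web.archive.org/web"


def wayback_url(url, when):
    """Return a time-anchored Wayback Machine URL for url at when."""
    # Format the datetime as the 14-digit yyyymmddhhmmss path segment.
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{url}"


# The Google example from this post: December 02, 1998 at 23:04:10.
print(wayback_url("http://www.google.com/", datetime(1998, 12, 2, 23, 4, 10)))
```
</pre>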
<p>is a URL for what http://google.com looked like on December 02, 1998 at 23:04:10. Perhaps this is documented somewhere prominent or is common knowledge, but it looks like you can play with the timestamp, and archive.org will adjust as needed, redirecting you to the closest snapshot it can find:</p>
<ul>
<li><a href="http://web.archive.org/web/19981202/http://www.google.com/">http://web.archive.org/web/19981202/http://www.google.com/</a></li>
<li><a href="http://web.archive.org/web/199812/http://www.google.com/">http://web.archive.org/web/199812/http://www.google.com/</a></li>
<li><a href="http://web.archive.org/web/1998/http://www.google.com/">http://web.archive.org/web/1998/http://www.google.com/</a></li>
</ul>
<p>and even:</p>
<ul>
<li><a href="http://web.archive.org/web/http://www.google.com/">http://web.archive.org/web/http://www.google.com/</a></li>
</ul>
<p>which redirects to the most recent content for a given URL. It&#8217;s just a good old 302 at work:</p>
<pre>
ed@curry:~$ curl -I http://web.archive.org/web/199812/http://www.google.com/
HTTP/1.1 302 Found
Date: Mon, 17 Dec 2007 21:11:12 GMT
Server: Apache/2.0.54 (Ubuntu) PHP/5.0.5-2ubuntu1.2 mod_ssl/2.0.54 OpenSSL/0.9.7g mod_perl/2.0.1 Perl/v5.8.7
Location: http://web.archive.org/web/19981202230410/www.google.com/
Content-Type: text/html; charset=iso-8859-1
</pre>
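<p>Going the other way, the <code>Location</code> header in that response tells you which snapshot you actually landed on. Here is a rough Python sketch that pulls the timestamp and original URL back out of a Wayback URL. It only handles the timestamp lengths that appear in this post (4, 6, 8 and 14 digits); the function name and the decision to reject other lengths are my own assumptions, not documented archive.org behavior:</p>
<pre>
```python
import re
from datetime import datetime

# /web/{digits}/{original url}, following the template in this post.
WAYBACK_RE = re.compile(r"^http://web\.archive\.org/web/(\d+)/(.+)$")

# strptime formats for the timestamp lengths seen in the post.
FORMATS = {4: "%Y", 6: "%Y%m", 8: "%Y%m%d", 14: "%Y%m%d%H%M%S"}


def parse_wayback_url(location):
    """Split a Wayback URL into (datetime, original_url)."""
    match = WAYBACK_RE.match(location)
    if match is None:
        raise ValueError(f"not a Wayback URL: {location}")
    stamp, original = match.groups()
    fmt = FORMATS.get(len(stamp))
    if fmt is None:
        raise ValueError(f"unexpected timestamp length: {stamp}")
    # Missing fields default to the start of the period (e.g. day 1).
    return datetime.strptime(stamp, fmt), original


# The Location header from the curl output above.
when, original = parse_wayback_url(
    "http://web.archive.org/web/19981202230410/www.google.com/"
)
print(when.isoformat(), original)
```
</pre>
<p>Comparing the parsed timestamp with the one you asked for gives a quick measure of how far the redirect moved you.</p>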
<p>So anyhow, pretty cool use of URIs and HTTP, right? The addition of Zotero to the mix will mean that scholars can cite the web as it appeared at a particular point in time:</p>
<blockquote><p>
&#8230; as scholars begin to use not only traditional primary sources that have been digitized but also “born digital” materials on the web (blogs, online essays, documents transcribed into HTML), the possibility arises for Zotero users to leverage the resources of IA to ensure a more reliable form of scholarly communication. One of the Internet Archive’s great strengths is that it has not only archived the web but also given each page a permanent URI that includes a time and date stamp in addition to the URL.</p>
<p>Currently when a scholar using Zotero wishes to save a web page for their research they simply store a local copy. For some, perhaps many, purposes this is fine. But for web documents that a scholar believes will be important to share, cite, or collaboratively annotate (e.g., among a group of coauthors of an article or book) we will provide a second option in the Zotero web save function to grab a permanent copy and URI from IA’s web archive. A scholar who shares this item in their library can then be sure that all others who choose to use it will be referring to the exact same document.
</p></blockquote>
<p>This is pretty fundamental to scholarship on the web. Of course, when generating a time-anchored permalink with Zotero, one can well expect that archive.org will on occasion not have a snapshot of said content, resulting in a 404. It would be great if archive.org could treat these requests for snapshots as requests to go out and archive the page. One could imagine a blocking and a nonblocking request: the former would fetch a particular URI, stash the content away, and return the permalink; the latter would just quickly return the best match it has already got (which may be a 404).</p>
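<p>To make the blocking versus nonblocking idea a bit more concrete, here is a toy sketch in Python. Everything in it is hypothetical: <code>ToyArchive</code>, <code>lookup</code> and <code>capture</code> are names I made up, the storage is an in-memory dict, and none of it reflects how archive.org actually works:</p>
<pre>
```python
from datetime import datetime


class ToyArchive:
    """Hypothetical in-memory archive; not archive.org's real API."""

    def __init__(self, fetch):
        self.fetch = fetch      # callable: url -> content
        self.snapshots = {}     # url -> list of (datetime, content)

    def lookup(self, url, when):
        """Nonblocking: closest snapshot on hand, or None (the 404 case)."""
        held = self.snapshots.get(url)
        if not held:
            return None
        # Pick the snapshot whose capture time is nearest to when.
        return min(held, key=lambda snap: abs(snap[0] - when))

    def capture(self, url, when):
        """Blocking: fetch the page, stash it, return a permalink path."""
        content = self.fetch(url)
        self.snapshots.setdefault(url, []).append((when, content))
        return f"/web/{when.strftime('%Y%m%d%H%M%S')}/{url}"


# A stand-in fetcher so the sketch runs without touching the network.
archive = ToyArchive(fetch=lambda url: "body of " + url)
print(archive.lookup("http://example.org/", datetime(2007, 12, 17)))
print(archive.capture("http://example.org/", datetime(2007, 12, 17, 21, 11, 12)))
```
</pre>
<p>The nonblocking call never waits on a fetch, which is why it has to be allowed to come back empty; the blocking call pays for the fetch up front so that it can always hand back a permalink.</p>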
<p>Anyhow, it&#8217;s really good to see these two outfits working together. Nice work! </p>
<p><i>ps. dear lazyweb, is there a documented archive.org api available?</i></p>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2007/12/17/permalinks-reloaded/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
