geocoder and rdf

While fielding a question on a local Perl list this weekend, I ran across some more RDF alive and kicking in the very useful geocoder.us service. They have a nice RESTful web service that lets you drop an address or intersection into a URL like:

http://rpc.geocoder.us/service/rest?address=
1340%20Ridgeview%20Drive%20McHenry%2C%20Illinois%2060050

and get back the longitude and latitude in a chunk of RDF like:


<?xml version="1.0"?>
<rdf:RDF
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

<geo:Point rdf:nodeID="aid87293465">
  <dc:description>899 Ridgeview Dr, McHenry IL 60050</dc:description>
  <geo:long>-88.291658</geo:long>
  <geo:lat>42.314936</geo:lat>
</geo:Point>


</rdf:RDF>  
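
Pulling the coordinates back out of that RDF takes little more than a namespace-aware XML parser. Here is a minimal sketch in present-day Python (assuming the service still answers with the wgs84_pos vocabulary shown above):

# Sketch: fetch the geocoder.us RDF response and pull out geo:lat / geo:long.
from urllib.request import urlopen
from urllib.parse import quote
import xml.etree.ElementTree as ET

GEO = '{http://www.w3.org/2003/01/geo/wgs84_pos#}'

def geocode_rdf(address):
    url = 'http://rpc.geocoder.us/service/rest?address=' + quote(address)
    point = ET.parse(urlopen(url)).find('.//' + GEO + 'Point')
    return float(point.findtext(GEO + 'lat')), float(point.findtext(GEO + 'long'))

print(geocode_rdf('1340 Ridgeview Drive McHenry, Illinois 60050'))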

Of course this data could be encoded as comma-separated values; in fact, they have a similar RESTful service that does just that:

http://rpc.geocoder.us/service/csv?address=
1340%20Ridgeview%20Drive%20McHenry%2C%20Illinois%2060050

which returns:


42.314936,-88.291658,899 Ridgeview Dr,McHenry,IL,60050
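
The CSV flavor is even less work to consume; a quick sketch along the same lines (assuming a single successful match like the one above):

# Sketch: the CSV service answers with lat,long,street,city,state,zip on one line.
from urllib.request import urlopen
from urllib.parse import quote

def geocode_csv(address):
    url = 'http://rpc.geocoder.us/service/csv?address=' + quote(address)
    lat, lon = urlopen(url).read().decode('utf-8').strip().split(',')[:2]
    return float(lat), float(lon)

print(geocode_csv('1340 Ridgeview Drive McHenry, Illinois 60050'))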

Does this mean RDF isn’t necessary? For someone who is just querying geocoder.us directly and knows what the output is, I guess the RDF doesn’t really add that much value. My coworker Bill likes to talk about being explicit in code whenever possible, and the RDF in this case is more explicit. Until there are programs that follow lines of inference using this data, it’s largely a matter of taste. It’s nice that geocoder supports both world views.

And hats off to geocoder: they give away their software, and an explanation of how they built the service, to anyone who wants it. They provide expertise in using the data, and also offer commercial access to their web services with the ten-second-or-so pause between requests disabled. What an interesting model for a company. Heck, wouldn’t it be nice if OCLC operated this way?


On Lateral Thinking

I recently checked out Zen and the Art of Motorcycle Maintenance after reading Kevin’s piece about how the book informed his practice of library cataloging. I am enjoying it a lot more this time around, and have found it really informs my practice of computer programming as well. Unfortunately I only made it halfway through before it needed to be returned to the library, and the local superbookstores oddly enough don’t seem to carry it… so I’ve got a copy on order from a used bookstore I found through Amazon. Anyhow, here’s one nice quote I jotted down before I had to return the book:

At first the truths Phaedrus began to pursue were lateral truths; no longer the frontal truths of science, those toward which the discipline pointed, but the kind of truth you see laterally, out of the corner of your eye. In a laboratory situation, when your whole procedure goes haywire, when everything goes wrong or is indeterminate or is so screwed up by unexpected results you can’t make head or tail out of anything, you start looking laterally. That’s a word he later used to describe a growth of knowledge that doesn’t move forward like an arrow in flight, but expands sideways, like an arrow enlarging in flight, or like the archer, discovering that although he has hit the bull’s eye and won the prize, his head is on a pillow and the sun is coming in the window. Lateral knowledge is knowledge that’s from a wholly unexpected direction, from a direction that’s not even understood as a direction until the knowledge forces itself upon one. Lateral truths point to the falseness of axioms and postulates underlying one’s existing system of getting at truth.

I’m not entirely sure why this resonated with me. I think the idea of “lateral thinking” reminds me of how IRC and web surfing often inform my craft of writing software. While many universities offer computer “science” programs, I’ve found that a large component of writing software is more artistic than scientific. Of course I’m hardly the first person to comment on this… but Zen and the Art of Motorcycle Maintenance is full of good advice for writing and tuning your programs. Hopefully I’ll get to write more about that here when my copy arrives in the mail.


trackbacks at arXiv

I just read (thanks, Jeff) that arXiv.org has implemented experimental trackback support. Essentially this allows researchers who maintain online journals to simply reference an abstract like File-based storage of Digital Objects and constituent datastreams: XMLtapes and Internet Archive ARC files (a great article, by the way), and arXiv will receive a trackback ping at http://arxiv.org/trackback/0503016 that lets them know someone referenced the abstract. If you’ve followed this so far you might be wondering how the blogging software (WordPress, Movable Type, Blosxom, etc.) figures out where to ping arxiv.org. Take a look at the source of the arXiv abstract page and you’ll see a chunk of RDF embedded in the HTML that advertises the trackback ping URL.
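
For the curious, the whole discovery-and-ping dance is easy enough to sketch. The embedded RDF carries a trackback:ping attribute pointing at a URL like the one above, and the blogging software POSTs a little form-encoded metadata to it. Here is a rough, hypothetical Python sketch (the abstract URL and blog details are just example values; the parameter names come from the TrackBack specification):

# Sketch: how blogging software might discover arXiv's trackback ping URL
# and send a ping. The regex looks for the trackback:ping attribute in the
# RDF embedded in the abstract page; the form parameters (url, title,
# excerpt, blog_name) are the ones defined by the TrackBack specification.
import re
from urllib.request import urlopen, Request
from urllib.parse import urlencode

def discover_ping_url(abstract_url):
    html = urlopen(abstract_url).read().decode('utf-8', 'replace')
    match = re.search(r'trackback:ping="([^"]+)"', html)
    return match.group(1) if match else None

def send_ping(ping_url, post_url, title, excerpt, blog_name):
    data = urlencode({'url': post_url, 'title': title,
                      'excerpt': excerpt, 'blog_name': blog_name}).encode('utf-8')
    # the TrackBack spec says the reply is a small XML status document
    return urlopen(Request(ping_url, data)).read()

abstract = 'http://arxiv.org/abs/cs.DL/0503016'   # the abstract linked above (URL assumed)
ping = discover_ping_url(abstract)
if ping:
    send_ping(ping, 'http://www.inkdroid.org/journal/', 'my post about XMLtapes',
              'a short excerpt', 'inkdroid')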


delicious json

I just noticed over on the del.icio.us blog that my data is available as JavaScript Object Notation (JSON) by following a simple URL like: http://del.icio.us/feeds/json/inkdroid.

Essentially you just load the URL as javascript source in your HTML:


<script type="text/javascript" src="http://del.icio.us/feeds/json/inkdroid?count=20"></script>

and voilà, you’ve magically got a new JavaScript array variable, Delicious.posts, each element of which is a hash describing one of your links on del.icio.us. It’s a very elegant (and simple) technique… much more elegant than the approach taken in the XML::RSS::JavaScript module, which I helped create. It’s so elegant, in fact, that I got it working off to the side of this page in 2 minutes. I downloaded the Python and Ruby extensions for working with JSON just to take a look. The Python version is a pleasant read, especially the unit tests! The Ruby version is a lesson in minimalism:

# rewrite "key": value pairs as "key" => value so the whole string can be eval'd as a Ruby hash literal
jsonobj = eval(json.gsub(/(["'])\s*:\s*(['"0-9tfn\[{])/){"#{$1}=>#{$2}"})

Now, if I were to use this I’d probably put a wrapper around it :-) Although it’s less minimalistic, I think I prefer the explicitness of the Python code. I’ve been digging into Ruby a bit more lately as I work on ruby-marc, and while I’m really enjoying the language, I tend to shy away from one-line regex hacks like this… which more often than not turn out to be a pain to extend and maintain.
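
For comparison, here is roughly what the more explicit route looks like in Python. This is just a sketch: it assumes the feed is a JavaScript assignment wrapping a JSON array (which is what gives you Delicious.posts in the browser), so it slices out the bracketed portion before parsing:

# Sketch: consuming the del.icio.us JSON feed outside the browser.
# Assumes the feed wraps a JSON array in a JavaScript assignment, so we
# slice out the [...] portion before handing it to a JSON parser.
import json
from urllib.request import urlopen

def delicious_posts(user, count=20):
    url = 'http://del.icio.us/feeds/json/%s?count=%d' % (user, count)
    body = urlopen(url).read().decode('utf-8')
    return json.loads(body[body.index('['):body.rindex(']') + 1])

for post in delicious_posts('inkdroid'):
    print(post)   # each post is a dict (hash) describing one link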

I first heard of JSON from Mike Rylander of the Open-ILS project, which is using JSON heavily in the open source library catalog it is developing for the state of Georgia. It is nice to see library technologists ahead of the curve.


Lockheed Martin and NARA

After seven years of consultation, Lockheed Martin has been selected to build the Electronic Records Archives (ERA) for the National Archives and Records Administration (NARA), for 38 million dollars.

ERA will provide NARA with the capability to authentically preserve and provide access to any kind of electronic record free from dependence on any specific hardware or software.

There are some aging exploratory papers on the NARA site, along with what appears to be a copy of the RFP… but I can’t seem to find any specific information on how Lockheed Martin is planning to do this. I wonder what sort of track record L-M has in building electronic archiving software. Do they have an existing system that they are going to modify for NARA, or are they going to build a new system from scratch? It sure would be interesting to hear some more details.


Open Documents

Have you ever had trouble importing one type of word processing document into your current word processor? Perhaps you’re using the same word processor, but are trying to import a document you created with an earlier version. Imagine for a moment what this will mean for a historian who is trying to research some correspondence fifty years from now. How about five hundred years from now? Are historians going to have to be computer hackers with superhuman reverse-engineering talents? Will there be mystical emulators that let you turn your modern computer into an insanely slow Pentium running Windows 95 and Word 7? How will you even know what format a document is in?

There have been some inspiring developments in Massachusetts, which has decided to use the OpenDocument format instead of Microsoft’s Open XML. David Wheeler does a really nice job of summarizing what this means for open source development, and how Microsoft can choose to recover. I had no idea (but was not surprised to learn) that the royalty-free license that Microsoft is using to distribute its “open” document format is incompatible with the popular GNU open source license. Ironically, this seems to have been a calculated move by Microsoft to exclude open source developers from working with the open formats. Isn’t the whole point of an open document format to be open?

Thank goodness the folks in Massachusetts are on the ball, asking the right questions, and not simply following the money, the power, and the status quo. At the same time they’re not exclusively endorsing the GPL, but have wisely decided to accommodate as many development environments as possible. This sounds like the best way to arrive at a truly archivable document format that will be good for “long haul institutions” like libraries and archives. Hopefully other public organizations will consider taking a similar approach. Thanks, Bruce, for writing about this.


File under m for megalomania

Google Announces Plan To Destroy All Information It Can’t Index.

Although Google executives are keeping many details about Google Purge under wraps, some analysts speculate that the categories of information Google will eventually index or destroy include handwritten correspondence, buried fossils, and private thoughts and feelings.

Seriously, many a truth is said in jest. With the news that Google is going to sell another 4 billion dollars’ worth of shares, it makes sense that they would be thinking of a purging program to balance out their binging. What are they going to do with 4 billion dollars? I can’t even begin to imagine. It is, frankly, a bit frightening, and seems like behavior one might read about in the DSM-IV.


A Tale of Two Searches

There’s been some interesting discussion about SRW/U vs. OpenSearch on some library email lists, blogs, and over in #code4lib. I worked on the SRU and CQL::Parser modules for the Ockham Initiative, and have watched the comparisons to A9’s RSS-based OpenSearch with great interest. It’s amazing how similar the goals of these two search technologies are, and yet how different the implementations and developer communities are.

At their most basic, both SRW/U and OpenSearch aim to make it easy to conduct searches over the web; they both want to bring distributed searching to the masses. SRW/U grew up before OpenSearch, at the Library of Congress, mainly on a small implementors list. It allows you to use SOAP as a transport layer, or simple XML over HTTP through a RESTful interface (SRU). The results can really be any type of XML, and there is no lingua franca like Dublin Core is in OAI-PMH. SRW/U also comes with an attendant query specification known as the Common Query Language (CQL). So there are a fair number of moving pieces in building even a barebones SRW/U server.
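
To make those pieces concrete, here is roughly what a bare-bones SRU searchRetrieve request looks like when assembled by hand (a sketch: the base URL is hypothetical, the parameter names come from the SRU spec, and the query itself is CQL):

# Sketch: composing an SRU searchRetrieve request. The base URL is
# hypothetical; version, operation, query, maximumRecords and recordSchema
# are SRU parameters, and the query value is CQL.
from urllib.parse import urlencode

BASE = 'http://sru.example.org/search'   # hypothetical SRU server

params = {
    'version': '1.1',
    'operation': 'searchRetrieve',
    'query': 'dc.title = "motorcycle maintenance"',   # a CQL query
    'maximumRecords': '10',
    'recordSchema': 'dc',
}

print(BASE + '?' + urlencode(params))
# The response is XML: a searchRetrieveResponse wrapping numberOfRecords
# and a list of records in whatever recordSchema was requested.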

OpenSearch, on the other hand, is relatively new, and was developed by A9 (a subsidiary of Amazon, which knows a thing or two about building robust, easy-to-use web services). Really it’s just a RESTful interface for obtaining search results as RSS 2.0. There is talk that v1.1 might add some extensions to support more refined queries and XML namespaces for bundling different types of XML results… but at the moment there’s no need to parse queries, or to handle any flavor of XML other than RSS 2.0.

When comparing the two sites, one thing is clear: the SRW/U site is a shambles. The specification itself is fragmented, and as a result information is scattered all over the place. The OpenSearch page is neatly laid out with examples and rationale, and even has a developers’ blog. The key here, I think, is that OpenSearch started simple and is slowly adding functionality as it’s needed, while SRW/U started out trying to simplify an existing standard and is slowly trying to make itself simpler (there have even been suggestions to drop the SOAPy SRW altogether and focus on the RESTful SRU). They’re moving in opposite directions. I don’t really have any doubts about which standard will see the wider deployment. The virtues of keeping things simple have been noted (very eloquently) by Adam Bosworth.

There is hope for library technologists though. OCLC is doing some really good work, like its Open WorldCat program, which allows you to link directly to local holdings for a book with a URL like:

http://worldcatlibraries.org/wcpa/isbn/0670032506&loc=60014

Yeah, that’s an ISBN and a ZIP code. Oh, and I installed Chris Fairbanks’ nice OpenSearch/WordPress plugin in like 5 minutes. Here’s an example:

http://www.inkdroid.org/journal/os-query?s=code4lib.

Drop it in an RSS reader and you can see whenever I write about code4lib. Not that you would really want to do that. Hmm, but maybe it would be useful with, say, an online catalog or bibliographic database!
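
And since the results are just RSS 2.0, a programmatic consumer is only a few lines too. A sketch against the plugin’s URL above (assuming it keeps serving plain RSS 2.0):

# Sketch: consuming an OpenSearch result feed. Because the response is
# plain RSS 2.0, ElementTree (or any feed parser) is all that's needed.
from urllib.request import urlopen
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def opensearch(base, terms):
    tree = ET.parse(urlopen(base + '?' + urlencode({'s': terms})))
    for item in tree.findall('.//item'):
        yield item.findtext('title'), item.findtext('link')

for title, link in opensearch('http://www.inkdroid.org/journal/os-query', 'code4lib'):
    print(title, link)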


Intelligent Design


I was so pleased to read recently that there are others who find it appalling that the Kansas School Board isn’t considering teaching the solid scientific evidence for the Flying Spaghetti Monster. If the statistics on pirates and global warming aren’t enough to convince you, I have a little story to relate. One night I was driving along the road and saw some strange lights in the clouds. At first I thought it might be an airplane, but when I pulled over and got out I caught the aroma of spaghetti sauce and cooked pasta. I looked at my shirt and didn’t see any spaghetti stains, so I immediately concluded that it had been the Flying Spaghetti Monster in the clouds. The next day I decided to hire an artist to draw what I imagined the Flying Spaghetti Monster to look like, lurking in the clouds. The result is above, and it’s exactly like the one I have seen reported elsewhere! Since I paid for this artist to do the picture, it is obviously scientifically accurate. I hope that this leaves no doubt in your mind that the FSM is not only a reality, but a very cool one indeed.


backdoors

The FCC is mandating that Internet providers and network appliance manufacturers build in backdoors so that the spooks can monitor electronic communication between, well, you know, terrorists and stuff. What a profoundly bad idea… do they really think that the secret access mechanism will stay a secret? And when it leaks out, what sort of access will the crooks have, and will anyone even know it was leaked?

At least this effort to tap is on the table, unlike the Data Encryption Standard, which was supposedly introduced with a deliberately small key size to make decryption easier for the NSA.

Thanks to the ever vigilant EFF for reporting on this. Which reminds me, my membership is up for renewal.