Archive for the ‘opensource’ Category

code4lib conference shaping up

Wednesday, January 11th, 2006

The votes are in, and a tentative schedule is up. There were a remarkable number of wonderful presentation ideas submitted, and unfortunately there wasn’t the time/space for all of them. Fortunately there will be lightning talks and breakout sessions that will hopefully pick up some of the slack.

The talks were voted on by anyone who planned on attending. That’s right *anyone*. This was like a breath of fresh air for me. The voting mechanism was a genius javascript hack at the 11th hour by Ross Singer which allowed drupal users on code4lib.org to annotate a backpack page, which stored results in a database at gatech.edu. We even hooked up our resident bot in #code4lib to be able to talk to the database and get up to the minute polling results.

Anyhow, things are looking really good for the conference. If you were waiting for the presentations to firm up before registering, take a look at the schedule. And if you need any more convincing, check out Lorcan Dempsey’s blog, which says it all.

good fences and the frankenweb

Tuesday, January 3rd, 2006

Ian Bicking has some interesting notes about competing web development technologies–mainly in response to some posts from Ivan Krystic. The discussion is definitely recommended, especially if you find yourself looking at web application frameworks for Python and Ruby. I found the pivot point of the discussion to be around a new term (for me) — the “frankenweb”.

My understanding is that, like Frankenstein’s monster (a being created by stitching together random body parts from dead humans), the frankenweb is an unholy mixture of MVC components pulled from different projects which, when put together, result in an ugly, partially functional whole. I think this characterization of Ian’s work is really unpleasant, but strangely compelling. I think that this is mainly because of Ian’s response:

The “Frankenweb” is a feature, and it describes the web we have, the software we have, and the future that is inevitable. The world was never all J2EE, or ASP(.NET), or PHP, and it won’t be all Rails either.

I think Ian is right on about this: “frankenweb” does describe the web we have, and hopefully the web we will continue to have, and the degree to which we can all interoperate is the degree to which the web will succeed. Perhaps I’m seeing the frankenweb through Weinberger-Colored-Glasses, having just finished Small Pieces Loosely Joined (which I thoroughly enjoyed and plan to write about later if there is time). Weinberger does an excellent job of distilling the essence of the web, and how its architecture enabled it to pull itself up by its own bootstraps, grow and adapt:

In the real world, I can’t just put in a door from my apartment to my neighbor’s so that anyone can go through. But that’s exactly how the web was built. Tim Berners-Lee originally created the web so that scientists could link to the work of other scientists without having to ask their permission. If I put a page into the public Web, you can link to it without having to ask to do anything special, without asking me if it’s alright with me, and without even letting me know that you’ve done it…The web couldn’t have been built if everyone had to ask permission first.

Of course I’m conflating links between pages, and API links between software components…but what Ian says about embracing the frankenweb seems to resonate with this somehow.

It’s also quite disorienting to hear Ivan and others lauding tight coupling:

You don’t see the Ruby on Rails guys modularizing Rails to the point of pain. You see them delivering a single, high-polish, tightly coupled product that does its job well.

Given the various pluggable modules that make up Rails I think “tightly coupled” is largely an overstatement. Granted they are available in the same code base, and I haven’t tried to use one of them in isolation, but I imagine it could be done if someone wanted to, say, use an ActiveRecord model in a script or something. The Pragmatic Programmer has a really nice chapter on decoupling, and the authors are actually heavily involved in the Ruby/Rails community. The chapter starts out with a nice quote from Robert Frost’s poem Mending Wall:

Good fences make good neighbors.

It seems to me that Ian is doing the hard work of patching some of these fences and building a few new ones, and he deserves a lot of credit for the effort and cat herding.

opensearch and autodiscovery

Wednesday, December 14th, 2005

I just noticed that a9 has released a second draft of opensearch v1.1. This draft includes details on opensearch autodiscovery for providing a reference to the opensearch description file in an HTML page. This could have a lot of potential for browser plugins. Also, they’ve added a Query element that can be used for echoing back the query that was used to generate results…kinda like the echoedRequest in SRU. These are the things that popped out at me. Of course the big news in the first draft was that Atom can now be used in responses.
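To give a feel for what a browser plugin would have to do, here is a minimal Python sketch of autodiscovery. It assumes the reference is a link element with rel="search" and type application/opensearchdescription+xml; check the draft for the exact attribute values. The class and function names are made up for illustration and are not part of any opensearch library.

```python
from html.parser import HTMLParser

# Assumed media type for opensearch description documents.
OPENSEARCH_TYPE = "application/opensearchdescription+xml"


class OpenSearchLinkFinder(HTMLParser):
    """Collects the href of every <link> that points at a description file."""

    def __init__(self):
        super().__init__()
        self.description_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        link_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "search" in rel_values and link_type == OPENSEARCH_TYPE and href:
            self.description_urls.append(href)


def find_opensearch_descriptions(html):
    """Return the description file URLs advertised in an HTML page."""
    finder = OpenSearchLinkFinder()
    finder.feed(html)
    return finder.description_urls
```

Feeding it the source of a page returns a list of description URLs, which is empty when the page advertises no search.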

At any rate it was nice to see that they link to my opensearch python library from their tools page. Once 1.1 moves from draft I’m going to work on upgrading it from 1.0 right away.

a citation microformat - when worlds collide

Thursday, November 3rd, 2005

Tim White has taken the time to prod the microformats list about the citation microformat that’s been floating around for a few months. It’s really encouraging that a developer at Gale is thinking of using a citation microformat. While I also work in the industry I’ve been coming at the citation microformat from a slightly different angle. For the past few months I’ve been monitoring activity in microformat land while watching another group of library technologists. Recently, Bill Burcham’s Baby Steps to Synergistic Web Apps and Half a Baby Step confirmed a nagging feeling I was having that the two communities were converging.

The “other” group are library/programmer friends of mine in #code4lib. These guys have been brainstorming about adapting the widely used OpenURL for use in HTML. OpenURL is used extensively in the academic library environment to enable linking to licensed content from online indexes. OpenURL essentially provides guidelines for encoding citation metadata in URLs, which has given birth to an ecosystem of vendors/developers who can provide resolver and content services. Context Object in Spans (COinS) provides a microformatty way to put OpenURLs (without reference to an OpenURL resolver) into HTML. I’m not doing this work justice, so if you’re curious to see how COinS got started there’s lots of content in the gather-create-share discussion list. COinS exists in the wild at citeulike, hubmed, and Current Law Journal Content.
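For anyone who hasn’t seen one, a COinS is just an empty span whose title attribute carries the OpenURL key/value pairs. Below is a rough Python sketch of building one. The class name Z3988, the ctx_ver value and the journal format identifier follow my understanding of the COinS convention; the function name and the sample citation are hypothetical.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(metadata):
    """Build a COinS span for a journal article citation.

    metadata maps referent keys (e.g. "atitle", "jtitle", "date")
    to their values; each key is prefixed with "rft." in the output.
    """
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    pairs.extend(("rft." + key, value) for key, value in metadata.items())
    # URL-encode the pairs, then escape the ampersands for use in HTML.
    query = urlencode(pairs)
    return '<span class="Z3988" title="%s"></span>' % escape(query, quote=True)


# Hypothetical citation, for illustration only.
example = coins_span({"atitle": "Good fences", "jtitle": "Example Journal", "date": "2006"})
```

Note that nothing in the span is visible to a person reading the page, since all of the citation data sits in an attribute.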

Now after reading up about microformats, posting to the discussion list, and talking to Brian Suda, it became clear that COinS as it stands now isn’t really usable as a microformat. Microformats center around marking up human readable data with semantic HTML, whereas COinS hides citation data encoded as a query string in HTML. However it is possible to encode OpenURLs as XML, so there’s still hope I suppose. I want to sketch out what this could look like for the microformat wiki.

Before Tim’s post I’d never even heard of the Standard Format for Downloading Bibliographic Records z39.80. While it’s only a draft, it’s used by Gale for providing downloadable citations, and can be imported by RefWorks and most likely others. It bears a lot of resemblance to other citation formats that I’ve come across, but is obviously pre-XML. The microformats brainstorming that Brian has done has centered around DublinCore, BibTeX, and MODS. At the moment I’m thinking BibTeX, Z39.80 and OpenURL stand the best chance of working. Honestly I think we could debate formats till the cows come home (and have left their cow paths ;-), but what microformats needs is a workable solution, like semantic HTML for OpenURL or Z39.80, and to get some examples out there and people using it while there’s momentum. It feels like there’s a swell here and a wave to ride.

quite a patch

Thursday, October 13th, 2005

Since starting to use lucene heavily at work about a year ago I’ve been watching the lucene list out of the corner of my eye for tips and tricks. Today I saw an email go by that referenced a recent patch that lazily creates SegmentMergeInfo.docMap objects. I guess the point isn’t so much what the object is, but the mere change in lazily creating the object yielded some pretty impressive performance gains:

Performance Results: A simple single field index with 555,555 documents, and 1000 random deletions was queried 1000 times with a PrefixQuery matching a single document.

Performance Before Patch:
indexing time = 121,656 ms
querying time = 58,812 ms

Performance After Patch:
indexing time = 121,000 ms
querying time = 598 ms

A 100 fold increase in query performance!

Umm, 100 fold increase in performance. That’s quite a patch!
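The general idea of lazy creation is easy to sketch. The Python below is not the Lucene patch, just a toy illustration of the technique: the class, its fields and the way the map is computed (deleted documents get -1, the rest are renumbered) are all assumptions made for the example.

```python
class SegmentInfo:
    """Hypothetical stand-in for a per-segment structure (not Lucene's class)."""

    def __init__(self, deleted_flags):
        self._deleted_flags = deleted_flags
        self._doc_map = None  # nothing is built until someone asks for it
        self.builds = 0       # counts how often the map was actually built

    @property
    def doc_map(self):
        # Build the map on first access only; later accesses reuse it.
        if self._doc_map is None:
            self._doc_map = self._build_doc_map()
        return self._doc_map

    def _build_doc_map(self):
        self.builds += 1
        doc_map = []
        next_number = 0
        for deleted in self._deleted_flags:
            if deleted:
                doc_map.append(-1)  # deleted documents get no new number
            else:
                doc_map.append(next_number)
                next_number += 1
        return doc_map
```

Code paths that never touch doc_map, such as a query that doesn’t need the mapping, never pay for building it, which is presumably where a gain like the one above would come from.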

delicious json

Wednesday, September 21st, 2005

I just noticed over on the del.icio.us blog that my data is available as JavaScript Object Notation (JSON) by following a simple URL like: http://del.icio.us/feeds/json/inkdroid.

Essentially you just load the URL as javascript source in your HTML:


<script type="text/javascript" src="http://del.icio.us/feeds/json/inkdroid?count=20"></script>

and voila you’ve magically got a new javascript array variable Delicious.posts, each element of which is a hash describing your link on delicious. It’s a very elegant (and simple) technique…much more elegant than that taken in the XML::RSS::JavaScript module which I helped create. It’s so elegant in fact that I got it working off to the side of this page in 2 minutes. I downloaded the python and ruby extensions for working with JSON just to take a look. The python version is a pleasant read, especially the unittests! The ruby version is a lesson in minimalism:

jsonobj = eval(json.gsub(/(["'])\s*:\s*(['"0-9tfn\[{])/){"#{$1}=>#{$2}"})

Now, if I were to use this I’d probably put a wrapper around it :-) Although it’s less minimalistic I think I prefer the explicitness of the python code. I’ve been digging into Ruby a bit more lately as I work on ruby-marc, and while I’m really enjoying the language I tend to shy away from one line regex hacks like this…which more often than not turn out to be a pain to extend and maintain.
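To show what I mean by explicit, here is a rough Python sketch that pulls the array out of the JavaScript the feed returns and hands it to a real JSON parser instead of eval. The shape of the feed (an assignment to Delicious.posts) and the sample keys u, d and t are assumptions for illustration, and the function name is hypothetical.

```python
import json
import re

# Assumed shape of the feed body: a JavaScript assignment of an array literal.
SAMPLE_FEED = (
    'if(typeof(Delicious) == "undefined") Delicious = {}; '
    'Delicious.posts = [{"u": "http://example.org/", '
    '"d": "Example", "t": ["web", "json"]}]'
)


def parse_delicious_feed(js_source):
    """Return the list of posts assigned to Delicious.posts."""
    match = re.search(r"Delicious\.posts\s*=\s*(\[.*\])", js_source, re.DOTALL)
    if match is None:
        raise ValueError("no Delicious.posts assignment found")
    # The array literal is plain JSON, so no eval is needed.
    return json.loads(match.group(1))


posts = parse_delicious_feed(SAMPLE_FEED)
```

It’s more lines than the Ruby version, but a malformed feed fails with a clear error rather than evaluating whatever happened to come down the wire.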

I first heard of JSON from Mike Rylander at the open-ils project, which is using JSON heavily in the opensource library catalog that it is developing for the state of Georgia. It is nice to see library technologists ahead of the curve.

Open Documents

Tuesday, September 6th, 2005

Have you ever had trouble importing one type of word processing document into your current word processor? Perhaps you’re using the same word processor, but are trying to import a document you created with an earlier version. Imagine for a moment what this will mean for a historian who is trying to research some correspondence fifty years from now. How about five hundred years from now? Are historians going to have to be computer hackers who have superhuman reverse engineering talents? Will there be mystical emulators that let you convert your modern computer into an insanely slow Pentium processor CPU running Windows 95 and Word 7? How will you even know what format a document is in?

There have been some inspiring developments in Massachusetts, which has decided to use the OpenDocument Format instead of Microsoft’s OpenXML. David Wheeler does a really nice job of summarizing what this means for open source development, and how Microsoft can choose to recover. I had no idea (but was not surprised to learn) that the royalty free license that Microsoft is using to distribute its “open” document format is incompatible with the popular GNU General Public License (GPL). Ironically this seems to have been a calculated move by Microsoft to exclude open source developers from working with the open formats. Isn’t the whole point of an open document format to be open?

Thank goodness the folks in Massachusetts are on the ball, asking the right questions, and not simply following the money, power and status quo. At the same time they’re not exclusively endorsing the GPL, but have wisely decided to include as many development environments as possible. This sounds like the best way to make a truly archivable document format that will be good for “long haul institutions” like libraries and archives. Hopefully other public organizations will consider taking a similar approach. Thanks Bruce for writing about this.