Archive for September, 2006

rsinger++

Thursday, September 28th, 2006

So Ross beat out 11 other projects to win the OCLC Research Software Contest for his next generation OpenURL resolver, umlaut. Second place went to Jesse Andrews’ BookBurro, so the competition was fierce this year, much more so than last year when there were 4 contestants.

Those of us who hang out in #code4lib got to hear about this project when it was just a glimmer in his eye…and had front row seats for hearing about the development as it progressed. Essentially umlaut is an openurl router that’s able to consult online catalogs (via SRU), other OpenURL resolvers (SFX), Amazon, Google, Yahoo, Connotea, CiteULike and OAI-PMH. It’s all written in Ruby and RubyOnRails.

I feel particularly proud because Ross is enough of a mad genius to have found a use for some ruby gems I wrote for doing sru, oai-pmh and querying OCLC’s xisbn service.

Speaking of which, we’ve been collaborating recently on a little ruby gem for querying OCLC’s OpenURL Resolver Registry. This registry essentially makes it easy to determine what the appropriate OpenURL resolver is for a particular IP address. So you could theoretically rewrite your fulltext URLs so that they were geospatially aware. For example:

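  require "resolver_registry"

  # look up the institution registered for this IP address
  client = ResolverRegistry::Client.new
  institution = client.find('130.207.50.91')
  # print the base URL of that institution's OpenURL resolver
  print institution.resolver.base_address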
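
To sketch the URL rewriting idea a bit further: once you have a resolver's base address, building a link is mostly a matter of tacking a query string onto it. This is only a rough illustration. The openurl_for helper, the example.edu base address and the citation values are all made up for the example, and in practice the base address would come from the registry lookup.

```ruby
require "uri"

# Hypothetical helper: append citation metadata to a resolver's base
# address as an OpenURL-style query string.
def openurl_for(base_address, citation)
  query = URI.encode_www_form(citation)
  # Some base addresses may already carry a query string of their own.
  separator = base_address.include?("?") ? "&" : "?"
  "#{base_address}#{separator}#{query}"
end

# Placeholder base address; assumed to stand in for the value of
# institution.resolver.base_address.
base = "http://resolver.example.edu/openurl"

link = openurl_for(base, "genre" => "article", "issn" => "1234-5678", "volume" => "12")
puts link
```

The same citation can then be pointed at whichever resolver the registry returns for the visitor's IP address.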

If you want to take a look at the gem, point your svn client like so:

svn co http://rsinger.library.gatech.edu/svn/openurl_registry/

I imagine it’ll get released to rubyforge sometime shortly.

a funny way to make a living

Tuesday, September 26th, 2006

gabe discovered that the code4lib.org drupal instance was littered with comment spam. Someone had actually registered for an account and proceeded to add comments to virtually every story.

Since there was an email address associated with the account I figured I’d send an email letting them know their account was going to be zapped.

From: edsu
To: evgeniy1985@breezein.net
Subject: code4lib.org spam

Excuse me. Why are you posting spam to my drupal
site? Consider your account removed.
//Ed

I really didn’t expect a reply, but sure enough a few hours later:

From: evgeniy1985@breezein.net
To: edsu
Subject: Re: code4lib.org spam

sorry please.i am from russia.my family need a money,
but many money i can do only with spam. sorry for
my english.

sigh

ruby-oai v0.0.3

Tuesday, September 19th, 2006

v0.0.3 of ruby-oai was just released to RubyForge. The big news is that this release allows you to use libxml for parsing thanks to the efforts of Terry Reese. Terry is building a RubyOnRails metasearch application at OSU and, well, felt the need for speed.

After committing the branch he was working on, I ran some performance tests of my own. I ran a vanilla ListRecords request against dspace, eprints and american memory oai-pmh servers using both the rexml (default) and libxml backend parsers. Here are the results:

server           parser  real            user            sys
dspace           rexml   0m3.632s        0m2.008s        0m0.044s
                 libxml  0m1.900s        0m0.212s        0m0.032s
                 diff    1.732s (+48%)   1.796s (+89%)   0.012s (+27%)

eprints          rexml   0m19.807s       0m1.984s        0m0.036s
                 libxml  0m19.344s       0m0.236s        0m0.024s
                 diff    0.463s (+2%)    1.748s (+88%)   0.012s (+33%)

american-memory  rexml   0m12.991s       0m5.424s        0m0.052s
                 libxml  0m7.420s        0m0.324s        0m0.032s
                 diff    5.571s (+43%)   5.104s (+94%)   0.02s (+38%)

Those percentage values are speed improvements: the time saved by libxml as a share of the rexml time. Thanks Terry :-)
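
In case it's not obvious how the percentages were derived, here's a small sketch of the arithmetic using the "real" times from the results above. The improvement helper is just a name I made up for this example.

```ruby
# Percentage of time saved by the faster run, relative to the slower one.
def improvement(slow, fast)
  ((slow - fast) / slow * 100).round
end

# "real" times in seconds (rexml, libxml), copied from the results above
dspace          = improvement(3.632, 1.900)
eprints         = improvement(19.807, 19.344)
american_memory = improvement(12.991, 7.420)

puts "dspace: +#{dspace}%"
puts "eprints: +#{eprints}%"
puts "american-memory: +#{american_memory}%"
```

The eprints number is small in wall clock terms because most of that request is spent waiting on the server rather than parsing.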

the importance of making packages

Tuesday, September 19th, 2006

If you are interested in such things, Ian Bicking has a nice posting about why breaking up a project into smaller packages of functionality is important. His key point is that the boundaries between packages actually help in establishing and maintaining decoupled modules within your application.

…when someone claims their framework is all spiffy and decoupled, but they just don’t care to package it as separate pieces… I become quite suspicious. Packaging doesn’t fix everything. And it can introduce real problems if you split your packages the wrong way. But doing it right is a real sign of a framework that wants to become a library, and that’s a sign of Something I’d Like To Use.

So why is decoupling important? Creating distinct modules of code with prescribed interfaces helps ensure that a change inside one module doesn’t have a huge ripple effect across an entire project codebase. In addition to using packaging to create boundaries between components, the Law of Demeter is a pretty handy technique for reducing coupling in object-oriented systems. It amounts to ensuring that a given method only invokes methods on the object itself, its parameters, objects that it creates, or its component objects. The LoD seems to be a good practice at the local level, but packaging helps at a macro/design level.

One of the most powerful and fun parts of packaging is coming up with good names and metaphors for your packages and components. Having fun and meaningful names for packages provides coherence to a project, and allows developers to talk about an application. Eric Evans has some nice chapters in his Domain-Driven Design about coming up with what he calls a domain language:

To create a supple, knowledge-rich design calls for a versatile, shared team language, and a lively experimentation with language that seldom happens on software projects.

It’s important…and naming distinct packages well helps build a good domain language.
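
Going back to the Law of Demeter for a moment, here's a minimal sketch of what it looks like in practice. Engine, Car and Driver are hypothetical classes invented purely for illustration.

```ruby
# Hypothetical classes, for illustration only.
class Engine
  def start
    "engine started"
  end
end

class Car
  def initialize
    @engine = Engine.new  # a component object that the Car owns
  end

  # The Car talks to its own component; callers never see the Engine.
  def start
    @engine.start
  end
end

class Driver
  # Follows the LoD: only invokes a method on its parameter.
  # A violation would reach through the car, e.g. car.engine.start,
  # which ties Driver to how Car is built internally.
  def drive(car)
    car.start
  end
end

result = Driver.new.drive(Car.new)
puts result
```

If Car later swaps its Engine for something else, Driver doesn't need to change, which is the same ripple-effect argument made above at the package level.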

I suppose it’s implicit in making something a code library–but one of the other major benefits of splitting a larger project up into smaller packages is that you encourage reuse. The bit of functionality that you decided to bundle up separately can be used as a dependency in a different project–perhaps even by a different person or organization. This seems to me to be a hallmark of good open source software.

Most popular languages these days have established ways of making packages available, downloadable and installable while expressing the dependencies between them. Perl has CPAN, PHP has PEAR, Ruby has gems and RubyForge, Python has eggs and EasyInstall, Java has maven, Lisp has asdf. Even some applications like Trac, RubyOnRails and Drupal encourage the creation of smaller packages (modules or plugins) by having a well defined api for adding extensions. And that’s not even getting into the various ways operating systems make packages available…
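
As a rough example of what expressing dependencies looks like on the Ruby side, a gem declares them in its specification. The gem names, version numbers and author below are placeholders I made up; only the Gem::Specification calls are real.

```ruby
require "rubygems"

# Hypothetical gem that depends on another (equally hypothetical) gem.
spec = Gem::Specification.new do |s|
  s.name    = "example_oai_client"
  s.version = "0.0.1"
  s.summary = "Placeholder gem used to illustrate a dependency declaration"
  s.authors = ["Example Author"]
  s.files   = []
  # Require at least version 0.0.3 of the (made up) parser gem.
  s.add_dependency "example_xml_parser", ">= 0.0.3"
end

# List what the packaging tool would need to install alongside this gem.
spec.runtime_dependencies.each do |dep|
  puts "#{spec.name} depends on #{dep.name} (#{dep.requirement})"
end
```

The nice part is that the boundary between the two packages is written down in one place, where both people and tools can see it.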

The truly hard part about packaging for me isn’t really technical. Most packaging solutions allow you to manage dependencies, versioning, installation and removal. As Ian says, it’s the decision of where to draw the lines between packages that is hard. It’s hard because you have to guess before you start coding, and often during the process of coding you realize that the dividing lines between packages begin to blur. This is why having distinct packages is so important: you are forced to stare at the blurriness and encouraged to fix it…instead of creating the infamous big ball of mud.

An interesting counterpoint to trying to figure out the dividing lines beforehand is to try to design from the outside in, and extract reusable components from the result. The very successful RubyOnRails web framework was extracted from a working application (Basecamp). In a lot of ways I think Test Driven Design encourages this sort of outside-in thinking as well. Extracting usable components from a ball of mud is nigh impossible though…at least for me. I would be interested to know how many of the Rails components were anticipated by the designers as they were creating Basecamp. It takes a great deal of discipline and jazz-like anticipation to be able to improvise a good design. That, or you have to build in time to prototype something with an eye to taking what you’ve learned and doing it right.

open standards

Friday, September 15th, 2006

Folks who are interested in libraries and technology are often drawn to the issue of open standards. Using open standards is very important to libraries for a variety of reasons that Ed Corrado summarizes nicely.

This week my podcast reader picked up an excellent interview with Danese Cooper of the Open Source Initiative where she talks about the Open Standard Requirement which was introduced a few months ago. It provides a new perspective on the same issue from outside of the library community.

Essentially the OSR amounts to 5 guidelines for identifying a truly open standard. These guidelines are different though because they focus on what makes a standard open for an implementor. Whether the standard was created by an open process or not is really outside of scope. The important thing is how easy it is for a software developer to write software that uses the standard. A nice feature of the OSR is that the guidelines would fit on an index card. Here’s my regurgitation of them:

  1. The spec can’t omit details needed for implementation
  2. The standard needs to be freely/publicly available
  3. All patents involved in the spec need to be royalty free
  4. Clicking through a license agreement is not necessary
  5. The spec can’t be dependent on a standard that is not open as well

Danese was quick to point out that these are simply guidelines and not rules. For example, Unicode fails on guideline 2, since you have to pay for a copy of the spec. But in this case printing the standard is a publishing feat, given all the glyphs and their number. It’s not unusual that the book would cost money. So this guideline could be waived if the OSI folks agreed.

Rather than the OSI going and applying these guidelines to all known standards, the idea is that standards bodies could claim self-compliance, and as developers implement the standard, the compliance will be ascertained.

The guidelines themselves are in the process of being fine-tuned/hammered on, and they are looking for volunteers…

mission haiku

Monday, September 11th, 2006

Following the lead of some others:

vacant history
crusted burnt bits lost to time
shards of clarity