xbib in subversion

Bruce D’Arcus has started putting his XBib project into subversion at sourceforge. The code is going to include ruby, python and styling libraries. I didn’t realize that sourceforge was offering up svn now…which is a welcome change. I want to set aside some time to get familiar with Bruce’s model and code.

a new type of journal

In the unlikely event that you haven’t seen it, there is an interesting thread over on the code4lib discussion list about establishing a code4lib journal. I think Mark Jordan has the right idea:

…would creating a section at http://code4lib.org/ that was reserved for formal, maybe even peer-reviewed articles do what you’re describing? The articles would be the starting point, but the Web 1.9-compliant features that are already appearing on the site (comments, attachments, microformat links, etc.) may satisfy what you’re describing. Heck, maybe we could write a module for http://code4lib.org/ that would pull some of these things together (drupal already has a publishing module). In other words, http://code4lib.org/ could be the journal but it could be a new type of journal.

Dan followed up with a +1 and I think he is right. The drupal instance running on code4lib.org was thrown together at the last minute and rejiggered by lots of people to serve as a place to put conference information. I’ve been wondering what might be in the cards for the site as we move post-conference and I think this “new kind of journal” idea might be where it can go. While there are lots of people with administrator access via the web, there aren’t many people with shell access. I’d like to get to where mjordan and others can have shell access (if they want it) so that we can make hardcore changes if necessary. Perhaps we just need another plugin, and we can go to town…or as Ross says:

It still sounds like there’d still need to be /a/ process (and we need to work that out), but the overhead is very low.

And I like that.

I like it too.

code4lib days 2-3

So I didn’t have time to journal about the 2nd and 3rd days of the conference since there was so much good stuff going on. I’m on the plane back to Chicago now so I’ve got a few moments to jot down some notes about those days and some general thoughts about the conference.

To be honest the 2nd and 3rd days kind of blur together for me because I really didn’t get much sleep between them. I was pretty much blown away by the variety and quality of the presentations. Thom provided a detailed look at how he builds nimble, high-powered applications using short n’sweet python code on a beowulf cluster using techniques like map-reduce.

While they did separate talks on different topics, I found some common strands between Devon Smith’s talk about metadata processing and Rob Sanderson’s talk about indexing in Cheshire3. Both of them had interesting workflows which they illustrated with neat diagrams which I should be able to link to from here soon. It wasn’t UML or anything boring like that. Rob’s illustration was more an animation overlaid on a bunch of slides showing the full lifecycle of a document being busted apart, indexed, a query coming in, triggering retrieval and then reconstitution of the document. Devon used interestingly shaped objects to represent components in his metadata management framework. It was so much more fun than a dry description of what the software was doing, and really evoked what’s so much fun about building software–metaphor creation and architecture. Similarly Colleen Whitney of the California Digital Library had some really neat ways of visualizing search results which I wish I could link to as well.

Ryan Chute’s talk about the aDORe archiving framework from Los Alamos was interesting, but it largely seemed like a verbalization of the series of articles about aDORe that have been published. Don’t get me wrong, it’s fascinating stuff, and perhaps I just had super-high expectations–but I was hoping to hear more details of how they are actually using the aDORe framework at Los Alamos. It was good to hear Aaron Krowne talk about his experiments with quality metrics at Emory–especially after hearing a bit about it months ago in IRC. It turns out he was able to layer his new metrics over lucene without having to dive into the lucene code itself. I’m looking forward to seeing the code once it is released. I knew Aaron was a smart dude from talking to him in IRC, but was surprised to see he is a confident and articulate public speaker as well.

Of course Roy Tennant is so at home at public speaking he was probably the only person that could easily tackle the “future of code4lib” in a presentation. He talked for 20 minutes about a variety of options that could be in the cards to make code4lib into a more formal organization; and then afterwards he did a breakout session on the topic. Unfortunately I wasn’t able to attend this because I was sitting in on Ross’s openurl Ruby library discussion. I heard that the basic consensus at the end was that things will stay much as they are now, but there might be a niche for code4lib to provide educational training for libraries. I think this idea came from Dan, and I think it’s a great idea. Hopefully we’ll get a chance to discuss it over the coming months.

One of the neatest things I witnessed was Ross Singer spontaneously suggesting a breakout session about designing an openurl library for ruby…and something like 20 people showed up. Not that just any 20 people were there: we had Jeff Young (who wrote OCLC’s openurl library), Eric Hellman (who helped write the openurl spec and who just sold his company to OCLC), Todd Holbrook (the software developer behind CUFTS) and Jay (?), one of the software developers behind ExLibris’ SFX product. We had a good discussion, which Ross was able to facilitate, and I think we came away with some good ideas on how to improve the existing library, and perhaps think about providing a common DOM-like api for openurl implementations.

I could go on and on. Like how great the lightning talks were…for example Terry Reese’s five minute laid back demo of his MARCEdit software that was so polished and amazing I couldn’t believe it. It can query Z39.50/SRU targets, and crosswalk to MODS and other metadata formats. Casey Bisson finished the conference on the right note, encouraging library software developers to get involved in the technology world outside libraries and to look outwards for cowpaths to pave rather than navel gazing and using only standards developed by libraries. I think he definitely has a point, and that the converse is also true–we should be promoting library standards such as SRU/CQL in the outside world and encouraging them to pave some of our cowpaths. I was hoping to follow Casey’s talk with my lightning talk about microformats but alas we ran out of time.

All in all I had a great time, and got a chance to meet some really interesting folks (some of whom I got to hang out in Portland with afterwards: Gabriel, Devon, Rob, Aaron). I don’t think it would’ve been possible without the support of people like Art Rhyno, Roy Tennant, Dan Chudnov and of course Jeremy Frumkin who managed to make it just happen. The most important feature of the conference was the size, which was big enough to make it interesting, but small enough to make it easily experienced as a whole, and relaxed enough to be fun. I think that it’s pretty clear that it hit a sweet spot, and that it is highly likely that it will happen again.

code4lib day 1

So the first day of “the conference” was a lot of fun. It is just great to see all these people who care about the same stuff in the same place. The lightning talks and the breakout sessions built in some breathing space between the presentations which worked pretty well I thought. Memorable moments for me included:

  • hearing people in the audience shout “OPA!” like we were in Greek Town during Dan’s “Connecting Everything to Everything” talk.
  • being able to ask Jeff Young to do a lightning talk about Info URIs and then hear him do it later. (jyoung++)
  • picking Rob Sanderson’s brain during break about the fine details of CQL.
  • having beers with tholbroo and calvinm at the crowbar
  • being able to ask Eric Hellman about the guts of openly’s data collection efforts.
  • chilling at jaf’s comfy house in the hills of corvallis

off to corvallis

So tomorrow I’m headed for Corvallis, Oregon to attend the first ever code4lib conference. It’s been amazing to watch this conference start as a glimmer in the eye of a handful of people in IRC and turn into a real event attended by 80 library technologists from all over the place.

I’m planning on doing a lightning talk or two, and had spent some time preparing some slides which I tossed in the end. I’m going to talk about using eclipse, microformats and object-relational-mapping – hopefully by just doing some live coding. We’ll see how it goes.

I plan on taking some copious notes, so keep your eye on planet.code4lib.org for the play by play.

WeibelNumber 3

So I have succumbed to infection (thanks Ross), but I’m not entirely sure how this is supposed to work…and Kesa said “Don’t Do It!”. After reading Morbus rail about it I’m almost afraid that I won’t be able to lurk on #swhack anymore if I do spread this any further. But it’s already gutted the interwebs pretty well by now, so here it goes anyhow. I’m just pleased to have a WeibelNumber of 3.

Jobs I’ve had:
- pizza delivery man
- food coop worker
- used bookstore employee
- newspaper kiosk attendant

Movies I can watch over and over:
- Brazil
- Finding Nemo
- Eternal Sunshine of the Spotless Mind
- Lord of the Rings

TV shows I love to watch:
- Jim Lehrer News Hour
- Mystery
- Nova
- Frontline

Places I have lived:
- Princeton Jct, NJ
- New York, New York
- Brighton, England
- Urbana, IL

Places I have been on holiday:
- Cinque Terre
- Montreal
- Kalymnos
- Amsterdam

4 of my favorite dishes:
- Italian Wedding Soup
- Any kind of curry
- Cracklin’ Oat Bran
- Oranges in winter

4 books (just 4 that I’m currently reading–favorite pshaw!)
- Agile Web Development with Rails
- A Mathematical Mystery Tour: Discovering the Truth and Beauty of the Cosmos
- The Algebraist
- Data Mining

Websites I visit daily:
- gmail
- google
- delicious
- unalog

Places I would rather be right now:
- at home with kesa and chloe
- on vacation
- outside of the United States of America
- outside of the United States of America

Bloggers I am tagging (and hope that they miss this entry):
- Brian Cassidy
- Jason Gessner
- Mark Jordan
- Jeff Barry

testing, testing, is this thing on?

Marcel Molina on testing in Rails

Begins: Mon, 06 Feb 2006 at 6:30 PM

Ends: Mon, 06 Feb 2006 at 10:00 PM



651 W. Washington Blvd

Chicago, IL 60661


Link: meetup page

I'm looking forward to attending this talk by Marcel Molina of 37Signals on testing in Rails. One of the things that has impressed me the most about Rails so far is how test stubs are automatically written out for you when you generate classes. I also really like how fixtures can load test data for each test...and the custom assertions rock. Anyhow, hopefully I will find the time to attend this the week before I head out to Oregon.

ical and outlook

I got to talking to Brian Suda about why his hCalendar extracting application x2v works like a dream with iCal but doesn’t seem to work with Microsoft Outlook 2002.

vCalendar/iCalendar Import failed. The input file may be corrupt.

Here’s the event that Outlook doesn’t like, but iCal does:

PRODID:-//suda.co.uk//X2V 0.6.7 (BETA)//EN
X-ORIGINAL-URL: http://www.code4lib.org/
SUMMARY;LANGUAGE=en:The Portland Jazz Festival

After quite a bit of experimentation we determined that Outlook demands that the METHOD, UID and DTSTAMP fields be defined.

PRODID:-//suda.co.uk//X2V 0.6.7 (BETA)//EN
X-ORIGINAL-URL: http://www.code4lib.org/
SUMMARY;LANGUAGE=en:The Portland Jazz Festival

Just thought I’d mention it here in case someone ends up googling for that error. Brian said he’s going to look into providing this support for those of us who have Microsoft Outlook inflicted on us.
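For anyone who wants to see what an Outlook-friendly event might look like, here is a minimal Python sketch that writes out an iCalendar file with METHOD, UID and DTSTAMP filled in. This is not x2v's code: the function name, the PRODID string, the METHOD value of PUBLISH, the UID domain and the start/end times are all assumptions I made up for illustration.

```python
import uuid
from datetime import datetime, timezone


def outlook_friendly_event(summary, dtstart, dtend):
    """Build a minimal iCalendar document that includes the METHOD, UID
    and DTSTAMP fields Outlook complained about.

    dtstart and dtend are iCalendar UTC strings like "20060101T180000Z".
    Everything else (PRODID, METHOD value, UID domain) is a placeholder.
    """
    # DTSTAMP records when this iCalendar object was created, in UTC.
    dtstamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example.org//sketch//EN",  # hypothetical product id
        "METHOD:PUBLISH",                     # assumed value
        "BEGIN:VEVENT",
        "UID:%s@example.org" % uuid.uuid4(),  # any globally unique string
        "DTSTAMP:%s" % dtstamp,
        "DTSTART:%s" % dtstart,
        "DTEND:%s" % dtend,
        "SUMMARY;LANGUAGE=en:%s" % summary,
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


# Placeholder times, not the real festival schedule.
ics = outlook_friendly_event(
    "The Portland Jazz Festival", "20060101T180000Z", "20060101T200000Z"
)
```

I haven't checked this exact output against Outlook 2002, so treat it as a starting point for experimenting rather than a known-good file.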

openurl as microformat

The Search

Author: John Battelle

Year: 2005

Publisher: Portfolio Hardcover

ISBN: 1591840880

Ok, so The Search is a great book so far...but I'm really just testing some local modifications I made to the structured blogging tool to use Book OpenURL KEV parameter names as a microformat. Take a look in the HTML and you should see them hiding there. Here's a somewhat prettified version as an image since I couldn't get my syntax highlighter plugin to do a nice enough job with the HTML. Pretty simple stuff right? Notice the COinS in there too? That's thanks to Dan's hacking at structured blogging. Actually getting openurl KEV support into structured blogging is another idea of Dan's. Go Chudnov.

Update 01/19/2006 09:39 CST: Dan got similar support for journal articles. If this stuff caught on it could really revolutionize academic blogging...and more.
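Since the markup itself isn't visible here, this is a rough Python sketch of how the book details above could be packed into a COinS span using the Book KEV parameter names. It is not the structured blogging plugin's code: the function name is made up, and the choice of which rft.* keys to emit is my assumption.

```python
from html import escape
from urllib.parse import urlencode


def book_coins(title, author, year, publisher, isbn):
    """Return a COinS <span> whose title attribute is an OpenURL
    ContextObject in KEV form, using the book metadata format."""
    kev = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
        ("rft.btitle", title),
        ("rft.au", author),
        ("rft.date", year),
        ("rft.pub", publisher),
        ("rft.isbn", isbn),
    ]
    # urlencode percent-encodes each value and joins the pairs with "&".
    context_object = urlencode(kev)
    # The "&" separators must become "&amp;" inside an HTML attribute.
    return '<span class="Z3988" title="%s"></span>' % escape(
        context_object, quote=True
    )


span = book_coins(
    "The Search", "John Battelle", "2005", "Portfolio Hardcover", "1591840880"
)
```

The empty span with class Z3988 is what COinS-aware tools look for. The visible fields (Author, Year and so on) would still need their own markup carrying the KEV names.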


Thanks to a ping from Dan I just finished listening to an interview with Derek Sivers of CDBaby over on Venture Voice. Sivers talks about how being a musician and working briefly in the music industry informed his decision to build CDBaby.

The interview spans a ton of subjects from what’s wrong with the music industry and what to do about it; how being in a circus informs your business acumen; why rejecting venture capital can help you focus on what really matters; and how a lot of heart and hubris can get you really far. Really refreshing (and funny) stuff.

CDBaby is now the single largest digital catalog in the world, and provides a gateway for independent musicians into distribution chains like iTunes. They started with three people sharing a 56k modem in Woodstock, NY. What a fun story.