selenium/ruby sprint

In case you missed it, or aren’t subscribed, the chicago-ruby group is getting together for a Selenium demo/sprint session on Dec 13th. I’ve seen Selenium demo’d before at a chipy meeting, and look forward to a more in-depth look since several of my friends really like this testing tool. I think Jason is particularly interested in getting some Ruby driver support.

gmail + atom

I imagine this is old hat to longtime gmail users, but I just noticed that my gmail is available via atom…and very easily at that with Mark Pilgrim’s Universal Feed Parser.

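    import feedparser

    # The rest of the feed URL was cut off in the original post, so it is
    # left as a placeholder here rather than guessed.
    rest_of_url = '...'

    feed = feedparser.parse('https://username:password' + rest_of_url)

    # Print the title of each entry in the feed.
    for entry in feed.entries:
        print(entry.title)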

Washington Post U.S. Congress Votes Database

Umm, wow! Adrian Holovaty announced the Washington Post Congressional Votes Database. This site is important for at least two reasons:

  • it offers RSS feeds for tracking the voting of your house/senate representative
  • it is powered by Python and the Django web framework.

As far as the RSS goes I just pulled up Dick Durbin’s recent votes and there were over 20 events since 11/17/2005, whereas the comparable service from GovTrack had only one event since then.

After the election I daydreamed about somehow getting involved in the political process in a technical way…which is how I found my way to GovTrack, who are essentially doing very elaborate screen scraping of the Thomas database at the Library of Congress. One thing I really like about the GovTrack project is they are making their data available as RDF, for downstream applications. Adrian’s work seems to draw on a richer data source, as I imagine is the case at a place like the Post. All I can say is well done, and damn…you’ve only been there for a couple months right? Talk about hitting the ground running.

At the recent Snakes and Rubies Adrian indicated that there was going to be some huge Django related news. When the voting db hit my Instant Messenger, IRC client and RSS aggregator I thought that this was it. But according to Adrian there’s something bigger in the works…

snakes and rubies

I managed to attend Snakes and Rubies yesterday where Adrian Holovaty and David Heinemeier Hansson talked about their respective web frameworks: Django and Rails.

The event started at 2PM and went to 6:30PM or so, and was attended by over 100 people! I watched this little event take shape out of the mists of the local python and ruby mailing lists and was just amazed to see how vibrant the Chicago software development scene is, or has become in the 3 years I’ve lived in the area.

What’s more, Adrian and David did a great job promoting both of their projects, while remaining amiable and respectful of the other camp. It’s hard to imagine a similar event between two commercial frameworks. Each was given about 45 minutes to talk about his software in any way he wanted. They had very different yet effective presentation styles, and their projects had one important thing in common: disillusionment with PHP.

Rather than talking technical details, Adrian spent most of his time focused on how Django came to be down in Kansas. It began its life as a PHP application which served as a community site for all sorts of goings on in Lawrence, Kansas. The site interleaved all sorts of local entertainment content: music, dining, art, movies…and it encouraged user participation. For example you can listen to mp3s from local musicians, but here’s the twist, you listen to them as they are playing in town…so if you like a song you can jump over to the venue later that week to see them live. Another example was a full on sports site for the local little leagues which posted details of games, scores, weather conditions, etc. All of this was detailed to show how deeply intertwined all the data was.

The really interesting stuff for me was when Adrian described how journalism informed his software development practices…and how Django fed into this process. In the same way that journalists work against the clock to get news stories out quickly and accurately Adrian and his team worked to get software projects done often on similar deadlines (sometimes like 4 hours). They quickly found that their PHP infrastructure wasn’t allowing them to respond quickly enough without introducing side effects, and decided that they needed new tools…which is how Django was born. In fact the little league application mentioned above was the first Django application.

Adrian has since moved on to the Washington Post, where he is their resident web technology mad scientist. Apparently they are using Django in some form at the Post, or are planning to since he mentioned Django’s caching can scale to the 9 million odd requests the Post gets in a single day.

Unfortunately my lead pencil ran out of lead just a bit of the way into David’s talk, so I don’t have as much written down from the Rails presentation. David dropped some wonderful one liners that I wish I could have written down. Much unlike Adrian, David let actual code do most of the talking for him.

Early on he had a screen with a quote from Richard Feynman on the importance of finding beautiful solutions to problems (if you remember the quote please let me know). This quote kind of guided the rest of the talk where David showed off beautiful Model, View and Controller code from RubyOnRails…and it really was beautiful stuff. David’s thesis was that beautiful things make you happy, and happiness makes you more productive…so beautiful code will make for happy, productive programmers. Much of this comes back to the essential philosophy of Ruby–to give joy to programmers. At any rate, the lights were dimmed and David gave us a tour of what RubyOnRails code looks like, while highlighting some of the strengths of the project and the Ruby language. On one of the pages there was some code to set a cookie expiration, and the date was created like so:

    20.years.from_now

How cool is that! I wasn’t sure if this was part of Ruby proper until I fired up my ruby interpreter to check:

    biblio:~ ed$ irb
    irb(main):001:0> 20.years.from_now
    NoMethodError: undefined method `years' for 20:Fixnum
            from (irb):1

Whereas from the Rails console it works fine:

    biblio:~/Projects/cheap ed$ script/console
    Loading development environment.
    >> 20.years.from_now
    => Thu Dec 04 15:38:48 CST 2025

So Rails decorates the Fixnum class with the years method. Pretty awesome :-) Another thing David highlighted was that Ruby is used everywhere, from configuration, to writing XML, to writing JavaScript. I was even surprised to hear him argue for full on Ruby in view templates. His argument is that even when a framework offers only a limited set of tags, it’s still offering logic, and rather than creating some bastardized tag language why not just use tried and true Ruby.
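To make the "decorating" idea concrete, here is a minimal sketch of how reopening a core class can produce that kind of syntax. It is a toy stand-in and not Rails's actual implementation: the method names `years_sketch` and `from_now_sketch` are made up for illustration (so they cannot collide with the real ones), and a year is approximated as 365.25 days.

```ruby
# Toy illustration of reopening a built-in class. NOT the Rails source;
# the *_sketch method names are hypothetical.
class Integer
  # Treat the receiver as a number of years and return that span in
  # seconds (assumes 365.25 days per year, which is an approximation).
  def years_sketch
    (self * 365.25 * 24 * 60 * 60).round
  end

  # Treat the receiver as a number of seconds and add it to a start time,
  # which defaults to the current time.
  def from_now_sketch(now = Time.now)
    now + self
  end
end

# Reads much like the Rails one-liner quoted above.
puts 20.years_sketch.from_now_sketch
```

Because classes in Ruby stay open, any library can add methods like these to every integer in the program, which is why the plain irb session fails and the Rails console succeeds.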

The two presentations were followed by a few (not many) moderated questions, and some questions from the audience. The highlight for me was when Why the Lucky Stiff’s question was asked:

Looking a bit beyond web frameworks, how do you envision the world coming to an end?

David responded by “scoping” the world to mean the world of software development and said that this world would come to an end if the layers of Java “sedimentation” continue to accrue. He went on to predict that we’re at a crossroads in software development, and that a paradigm shift is underway…intentionally provocative, and pretty much right on as far as web development goes if you ask me. Adrian responded “Yoko Ono”.

So, as you can tell I’m still digesting the presentations and discussion. There was so much good stuff, and I was really struck by the collegiality between the two guys: open source software development at its finest. The two main things I took away were embracing the boundaries between software development and a particular industry like Journalism, or in my case Libraries; and always trying to strive for the beautiful in software, “boiling down” a thorny problem into its most simple and elegant expression.

code4lib 2006

Some #code4lib regulars (who also help put on Access up north) have managed to get some space at Oregon State University in February for code4lib 2006:

code4lib 2006 is a loosely structured conference for library technologists to commune, gather/create/share ideas and software, be inspired, and forge collaborations. It is also an outgrowth of the Access HackFest, wrapped into a conference-ish format. It is the event for technologists building digital libraries and digital information systems, tools, and software.

A call for proposals is out. The nice thing about this conference is that there will be different levels of involvement: from keynote speakers, to shorter presentations, to lightning talks, and with space/time to actually hack at stuff/brainstorm with colleagues.

We’re hoping to attract both library professionals who use computers, and computer professionals who have an interest in libraries. The registration is now open as well at a discounted price. If you are interested in computers and libraries please submit a proposal or register to attend!

BBC Catalogue's Search

I did end up hearing back from Matt Biddulph about the search technology that he’s using with RubyOnRails to build the BBC Programme Catalogue.

The core of the search is nothing more than mysql 4.1’s fulltext indexer. I used to think very poorly of it until I discovered how to turn off its automatic stoplist and minimum indexable word length, and started using its boolean mode. Having the database manage the indexing without having to keep a separate index in sync is very valuable, and of course it’s portable to any client language.

The nice thing with a dataset the size and quality of the BBC’s is that you’re not solely dependent on the quality of the freetext indexer. I’ve done a little statistical analysis on the data to help with scoring the results. For example, programme contributors can be ranked according to how many shows they’ve contributed to, and commonly co-occurring contributors can be easily calculated with a bit of overnight batch processing. This kind of stuff contributes to a pretty good set of search results.

Given the visibility of the BBC Catalogue, and that it has nearly a million records, this says good things to me about the scalability of MySQL’s fulltext search. I’ll definitely consider it along with Ferret for Rails experiments that need search functionality.
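The statistical side of what Matt describes (ranking contributors by how many shows they appear in, and counting which contributors commonly appear together) might look something like the following. This is only a sketch under my own assumptions: the sample data, the `PROGRAMMES` constant and the method names are all made up, and I have no idea how the BBC batch job is actually written.

```ruby
# Hypothetical sample data: programme title => list of contributor names.
PROGRAMMES = {
  "Programme A" => ["Alice", "Bob"],
  "Programme B" => ["Alice", "Bob", "Carol"],
  "Programme C" => ["Alice"],
}

# Count how many programmes each contributor appears in.
def contributor_counts(programmes)
  counts = Hash.new(0)
  programmes.each_value do |names|
    names.uniq.each { |name| counts[name] += 1 }
  end
  counts
end

# Count how often each pair of contributors shares a programme.
# Names are sorted so that a pair always has the same key.
def pair_counts(programmes)
  counts = Hash.new(0)
  programmes.each_value do |names|
    names.uniq.sort.combination(2) { |pair| counts[pair] += 1 }
  end
  counts
end

# Rank contributors, most prolific first, ties broken alphabetically.
def ranked_contributors(programmes)
  contributor_counts(programmes).sort_by { |name, n| [-n, name] }
end

ranked_contributors(PROGRAMMES).each { |name, n| puts "#{name}: #{n}" }
```

Counts like these could then be stored alongside the records and folded into the score of each fulltext hit, which is the kind of overnight batch processing the quote mentions.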

buddhism and spimes

The Dalai Lama has an op-ed on science and faith in yesterday’s New York Times. There are some delightful descriptions of his encounters with science as a child, which I imagine are excerpts from his recent book. I also like how he intertwingles religion and science–not making one higher up in a hierarchy.

If science proves some belief of Buddhism wrong, then Buddhism will have to change. In my view, science and Buddhism share a search for the truth and for understanding reality. By learning from science about aspects of reality where its understanding may be more advanced, I believe that Buddhism enriches its own worldview.

And the converse:

Just as the world of business has been paying renewed attention to ethics, the world of science would benefit from more deeply considering the implications of its own work. Scientists should be more than merely technically adept; they should be mindful of their own motivation and the larger goal of what they do: the betterment of humanity.

The impact of science and our way of life on our environment is something I’ve been reading about in Bruce Sterling’s Shaping Things. I haven’t finished it yet but the essential message so far is that we need to design objects in our environment so that they can reveal information about how they fit into the environment. This information amounts to links to databases that can track the history of the object, how to get customer support, history of ownership, manufacturing origins, internal components, details on customizing and interfacing, etc. Sterling calls these objects spimes and if you are interested his speech at SIGGRAPH has more details.

I’m not entirely sure why I’m mentioning Spimes, Buddhism and Ted Nelson in the same breath. I suppose all three focus the attention on just how deeply interconnected we all are with each other and with the world around us. Sometimes these interconnections can be overwhelming. Meditating on this interconnectedness, and building tools to manage the connections responsibly are two worthwhile things I’d like to work on.

jython niceties

While playing around with the Java JDOM library, I found myself resorting to jython to experiment with the API. It’s just so much easier this way for me:

#!/usr/bin/env jython

search @ delicious and the bbc

I just noticed that delicious now has full, fast search across all content (not just your own bookmarks). This is something that Dan’s unalog has had over delicious for a while (apart from the delightful content). Dan uses pylucene as his search engine, which still has some interesting features. It’s pretty wild being able to search across all the delicious content, given their volume.

When delicious was really ramping up I saw the occasional mason error page, so I know that they are (or were) using Perl. This makes me really curious to know what search technology they are using…but I couldn’t find any details in the announcement.

Likewise with the news about the BBC Programme Catalogue being built with RubyOnRails. I’ve really come to appreciate Lucene and PyLucene and am in search of similar search tools for Ruby. I’ve got an email out to Matt Biddulph to see if he can provide any details about the BBC effort.

access2005 presentations

Unfortunately I wasn’t able to make it to Access this year, where lots of library developer types I respect and learn from were presenting and hacking. Fortunately the audio and slides are now available. Combined with the collected blogging and snippets in irc I almost feel like I was there…but I imagine the real brainstorming and fun happened outside of these artifacts. Inspiring stuff, and highly recommended if you’re into writing software for libraries/archives.