code4lib conference shaping up

The votes are in, and a tentative schedule is up. There were a remarkable number of wonderful presentation ideas submitted, and unfortunately there wasn’t the time or space for all of them. Fortunately, there will be lightning talks and breakout sessions that will hopefully pick up some of the slack.

The talks were voted on by anyone who planned on attending. That’s right: anyone. This was like a breath of fresh air for me. The voting mechanism was a genius JavaScript hack put together at the 11th hour by Ross Singer, which allowed Drupal users on code4lib.org to annotate a Backpack page and stored the results in a database at gatech.edu. We even hooked up our resident bot in #code4lib to talk to the database and report up-to-the-minute polling results.

Anyhow, things are looking really good for the conference. If you were waiting for the program to firm up before registering, take a look at the schedule. And if you need any more convincing, check out Lorcan Dempsey’s blog, which says it all.


good fences and the frankenweb

Ian Bicking has some interesting notes about competing web development technologies–mainly in response to some posts from Ivan Krstić. The discussion is definitely recommended reading, especially if you find yourself evaluating web application frameworks for Python and Ruby. I found the pivot point of the discussion to be a new term (for me): the “frankenweb”.

My understanding is that, like Frankenstein’s monster (a being created by stitching together random body parts from dead humans), the frankenweb is an unholy mixture of MVC components pulled from different projects that, when put together, result in an ugly, partially functional whole. I find this characterization of Ian’s work really unpleasant, but strangely compelling, mainly because of Ian’s response:

The “Frankenweb” is a feature, and it describes the web we have, the software we have, and the future that is inevitable. The world was never all J2EE, or ASP(.NET), or PHP, and it won’t be all Rails either.

I think Ian is right on about this: “frankenweb” does describe the web we have, and hopefully the web we will continue to have–and the degree to which we can all interoperate is the degree to which the web will succeed. Perhaps I’m seeing the frankenweb through Weinberger-colored glasses, having just finished Small Pieces Loosely Joined (which I thoroughly enjoyed and plan to write about later if there is time). Weinberger does an excellent job of distilling the essence of the web, and of showing how its architecture enabled it to pull itself up by its own bootstraps, grow, and adapt:

In the real world, I can’t just put in a door from my apartment to my neighbor’s so that anyone can go through. But that’s exactly how the web was built. Tim Berners-Lee originally created the web so that scientists could link to the work of other scientists without having to ask their permission. If I put a page into the public Web, you can link to it without having to ask to do anything special, without asking me if it’s alright with me, and without even letting me know that you’ve done it…The web couldn’t have been built if everyone had to ask permission first.

Of course I’m conflating links between pages, and API links between software components…but what Ian says about embracing the frankenweb seems to resonate with this somehow.

It’s also quite disorienting to hear Ivan and others lauding tight coupling:

You don’t see the Ruby on Rails guys modularizing Rails to the point of pain. You see them delivering a single, high-polish, tightly coupled product that does its job well.

Given the various pluggable modules that make up Rails, I think “tightly coupled” is largely an overstatement. Granted, they are available in the same code base, and I haven’t tried to use one of them in isolation, but I imagine it could be done if someone wanted to, say, use an ActiveRecord model in a script. The Pragmatic Programmer has a really nice chapter on decoupling, and the authors are actually heavily involved in the Ruby/Rails community. The chapter starts out with a nice quote from Robert Frost’s poem “Mending Wall”:
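To make the decoupling point concrete without pretending to show Rails internals, here is a minimal Python sketch. The `Storage`, `MemoryStorage` and `Book` names are hypothetical: the model depends only on a small storage interface, so it can be used from a standalone script as easily as from a web app.

```python
from typing import Dict, Optional, Protocol


class Storage(Protocol):
    """All the model knows about persistence (a hypothetical interface)."""

    def save(self, key: str, record: Dict[str, str]) -> None: ...

    def load(self, key: str) -> Optional[Dict[str, str]]: ...


class MemoryStorage:
    """In-memory backend: enough for a one-off script or a test."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}

    def save(self, key: str, record: Dict[str, str]) -> None:
        self._rows[key] = dict(record)

    def load(self, key: str) -> Optional[Dict[str, str]]:
        row = self._rows.get(key)
        return dict(row) if row is not None else None


class Book:
    """A model that is handed its storage instead of reaching for a global one."""

    def __init__(self, storage: Storage, key: str, title: str) -> None:
        self.storage = storage
        self.key = key
        self.title = title

    def save(self) -> None:
        self.storage.save(self.key, {"key": self.key, "title": self.title})

    @classmethod
    def find(cls, storage: Storage, key: str) -> Optional["Book"]:
        row = storage.load(key)
        if row is None:
            return None
        return cls(storage, row["key"], row["title"])


# From a plain script: no web framework or server involved.
store = MemoryStorage()
Book(store, "book-1", "The Pragmatic Programmer").save()
print(Book.find(store, "book-1").title)
```

Swapping in a database-backed class with the same `save`/`load` methods wouldn’t require touching `Book`; that is the sense in which the fence between the two is a good one.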

Good fences make good neighbors.

It seems to me that Ian is doing the hard work of patching some of these fences and building a few, and he deserves a lot of credit for the effort and the cat herding.


Fear Itself

I’m glad I’m not the only one who was immediately reminded of this when the NSA spying story broke.

If you are interested in the perspective of a computer security specialist, definitely take a look at what Bruce Schneier has been writing. Schneier’s theory on why Bush needed to bypass the Foreign Intelligence Surveillance Court is pretty harrowing.

The NSA’s ability to eavesdrop on communications is exemplified by a technological capability called Echelon. Echelon is the world’s largest information “vacuum cleaner,” sucking up a staggering amount of voice, fax, and data communications – satellite, microwave, fiber-optic, cellular and everything else – from all over the world: an estimated 3 billion communications per day. These communications are then processed through sophisticated data-mining technologies, which look for simple phrases like “assassinate the president” as well as more complicated communications patterns.

Supposedly Echelon only covers communications outside of the United States. Although there is no evidence that the Bush administration has employed Echelon to monitor communications to and from the U.S., this surveillance capability is probably exactly what the president wanted and may explain why the administration sought to bypass the FISA process of acquiring a warrant for searches.

Honestly, this kind of behavior from the Bush Administration isn’t at all surprising given their “go it alone” attitude. However, I’m really disappointed that the ranking members of the House and Senate Intelligence Committees didn’t make noise–any noise. I imagine they are bound by some oath or whatnot…but what good are checks and balances if they don’t work properly?

Indeed, a recent article from the New York Times indicates that Schneier’s theory may in fact be, umm, fact:

The National Security Agency has traced and analyzed large volumes of telephone and Internet communications flowing into and out of the United States as part of the eavesdropping program that President Bush approved after the Sept. 11, 2001, attacks to hunt for evidence of terrorist activity, according to current and former government officials.

The volume of information harvested from telecommunication data and voice networks, without court-approved warrants, is much larger than the White House has acknowledged, the officials said. It was collected by tapping directly into some of the American telecommunication system’s main arteries, they said.

As part of the program approved by President Bush for domestic surveillance without warrants, the N.S.A. has gained the cooperation of American telecommunications companies to obtain backdoor access to streams of domestic and international communications, the officials said.

I’m really worried that we’re not teetering on a slippery slope but are actually in free fall. It appears that telecommunications companies are helping feed data mining operations at the NSA in real time. Perhaps they have a googlish front end where ‘professionals’ can type in ‘keywords’ and hit “I’m feeling lucky” and get a list of phone conversations or emails.

The Bush Administration’s prolific use of “fear” as a policy wedge is extremely dangerous. As Roosevelt famously said in a time of national crisis:

So, first of all, let me assert my firm belief that the only thing we have to fear is fear itself: nameless, unreasoning, unjustified terror which paralyzes needed efforts to convert retreat into advance.

On a somewhat lighter note, Schneier linked to a little trick devised by Richard M. Smith which allows you to detect if the NSA is monitoring your email communications. As my friend Ed Silva pointed out in IM:

I wouldn’t try it if you are planning on flying.

Uh, yeah, I was planning on going to code4lib 2006 in a few months…maybe I’ll wait.



opensearch and autodiscovery

I just noticed that A9 has released a second draft of OpenSearch v1.1. This draft includes details on OpenSearch autodiscovery, which lets an HTML page point to its OpenSearch description file. This could have a lot of potential for browser plugins. They’ve also added a Query element that can be used to echo back the query that generated the results…kinda like the echoedRequest in SRU. These are the things that popped out at me. Of course, the big news in the first draft was that Atom can now be used in responses.
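Autodiscovery boils down to a `<link>` element in a page’s head that points at the description file. Here is a minimal sketch using only Python’s standard `html.parser` to find such links. It assumes the `rel="search"` and `type="application/opensearchdescription+xml"` values from the OpenSearch 1.1 spec (the draft may differ in details), and the sample page and URL are made up.

```python
from html.parser import HTMLParser
from typing import Dict, List

OPENSEARCH_TYPE = "application/opensearchdescription+xml"


class OpenSearchLinkFinder(HTMLParser):
    """Collects <link rel="search"> elements pointing at OpenSearch description files."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attributes without a value come through as None; normalize to "".
        attributes = {name: (value or "") for name, value in attrs}
        # rel may hold several space-separated tokens, e.g. rel="search alternate".
        rels = attributes.get("rel", "").lower().split()
        if "search" in rels and attributes.get("type", "").lower() == OPENSEARCH_TYPE:
            self.links.append({"href": attributes.get("href", ""),
                               "title": attributes.get("title", "")})


def find_opensearch_descriptions(html: str) -> List[Dict[str, str]]:
    """Return href/title pairs for every OpenSearch autodiscovery link in the page."""
    finder = OpenSearchLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.links


page = """<html><head>
<link rel="search" type="application/opensearchdescription+xml"
      href="http://example.org/opensearch.xml" title="Example Catalog Search">
</head><body></body></html>"""

print(find_opensearch_descriptions(page))
```

A browser plugin would do essentially this, then fetch the `href` to learn the search URL template.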

At any rate, it was nice to see that they link to my OpenSearch Python library from their tools page. Once 1.1 moves out of draft, I’m going to work on upgrading it from 1.0 right away.


lib4code

Thanks to Jessamyn I found Librarians 2.0 don’t need to be coders 2.0 where Richard Ackerman has an interesting take on just how important programming skills are for a library technologist. Richard cites a paper from IBM on Service Oriented Architectures to make a compelling point that there are many roles to play when building technology solutions, particularly the web services that comprise such a big part of “library 2.0” efforts…and that coding skills really aren’t that important when you can just get a student, consultant, or vendor (heh) to do it.

It’s unfortunate, I think, that the code4lib 2006 conference name seems to emphasize “coding” so much over the ideas. I totally agree that the most important aspect of our work as library technologists is the service ideas, and that the code is simply a machine-readable description of those ideas. Some high-level languages are actually really, really nice for expressing ideas, and I would argue that learning a good computer language can often help you express your technology ideas better. As Martin Fowler says:

Any fool can write code that a computer can understand. Good programmers write code that humans can understand.

Let me go on record, as someone who has helped organize the code4lib conference, to say non-coders are more than welcome…there will be plenty of people who can program there…we want ideas, mindshare and collaboration. Please don’t let the computer programming jargon dissuade you from participating.

Also, a few things caught my eye:

If your goals and architecture are clear enough, the coders don’t need to be library experts in order to deliver the functions you need.

The coders don’t need to be experts, but wouldn’t it be nice if they were, and you didn’t have to go into great detail about certain things? Wouldn’t it also be nice if the coders didn’t start from scratch and were aware of good reusable components from the library software community, which could be leveraged to make the software construction phase that much faster? Indeed, architectural decisions often have a direct effect on the programming decisions that get made, and it helps if those who are architecting things have at least a general understanding of how software is built, so that designs stay doable and sane…and so that they’ll know when things are drifting off course.

Also, don’t try to build big complex systems. Live in the beta world. Get some chunk of functionality out quickly so that people can play with it. The hardest part is having the initial idea, and the good news is I see lots of great ideas out in the library blogosphere. I can understand the frustrations in the gap between the idea and running code, but I hope I’ve presented a bunch of areas above in which you can work to turn the idea into the next hot beta, without necessarily needing to code it yourself.

The one danger in moving to a formal process like the one described in the IBM article is that it may encourage you to build big, complex systems on a slow time scale. If you need to thoroughly describe a software solution before beginning to program (the so-called waterfall model), you will spend a lot of time trying to get the design right before even beginning to code to see what actually works. I’ve found that the more the design and the coding can be intermingled, the better…since it lets them inform each other as they go along. This intermingling is easy if you are a small shop with a handful of people (1-8) who need to communicate on a regular basis. I imagine most software development groups in libraries are around this size. That being said, I think Richard is right: it’s good to be aware of the different roles being played, perhaps by a single individual.

After seeing Adrian talk about software development and journalism at Snakes and Rubies I’ve been thinking off and on about the space between libraries and software development. I’m particularly interested in how one informs the other…and found Richard’s post to be a good catalyst.


selenium/ruby sprint

In case you missed it, or aren’t subscribed, the chicago-ruby group is getting together for a Selenium demo/sprint session on Dec 13th. I’ve seen Selenium demo’d before at a ChiPy meeting, and I look forward to a more in-depth look since several of my friends really like this testing tool. I think Jason is particularly interested in getting some Ruby driver support.


gmail + atom

I imagine this is old hat to longtime Gmail users, but I just noticed that my Gmail is available via Atom…and very easily at that, with Mark Pilgrim’s Universal Feed Parser:

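    import feedparser

    # Swap in your own Gmail username and password
    feed = feedparser.parse(
        'https://username:password' +
        '@gmail.google.com/gmail/feed/atom')

    # Print the subject line of each message in the feed
    for entry in feed.entries:
        print(entry.title)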
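For comparison, here is roughly what feedparser is saving you from: an Atom feed is just XML, so a minimal sketch with the standard library’s ElementTree can pull out entry titles. One of the things feedparser hides is the namespace, since Atom 1.0 uses `http://www.w3.org/2005/Atom` while the older 0.3 draft used `http://purl.org/atom/ns#`, so this sketch tries both. The sample feed is made up.

```python
import xml.etree.ElementTree as ET
from typing import List

ATOM_10 = "http://www.w3.org/2005/Atom"
ATOM_03 = "http://purl.org/atom/ns#"


def entry_titles(atom_xml: str) -> List[str]:
    """Return the <title> of every <entry>, trying Atom 1.0 then Atom 0.3 namespaces."""
    root = ET.fromstring(atom_xml)
    for ns in (ATOM_10, ATOM_03):
        # ElementTree spells namespaced tags as "{namespace-uri}localname".
        entries = root.findall(f"{{{ns}}}entry")
        if entries:
            return [(e.findtext(f"{{{ns}}}title") or "").strip() for e in entries]
    return []


sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Inbox</title>
  <entry><title>Lunch on Friday?</title></entry>
  <entry><title>code4lib registration</title></entry>
</feed>"""

for title in entry_titles(sample):
    print(title)
```

feedparser also copes with RSS variants, malformed XML and date parsing, which is why it’s still the easier route.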


Washington Post U.S. Congress Votes Database

Umm, wow! Adrian Holovaty announced the Washington Post Congressional Votes Database. This site is important for at least two reasons:

  • it offers RSS feeds for tracking the voting of your house/senate representative
  • it is powered by Python and the Django web framework.

As far as the RSS goes, I just pulled up Dick Durbin’s recent votes and there were over 20 events since 11/17/2005, whereas the comparable service from GovTrack had only one event since then.
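Counting “events since 11/17/2005” is easy to automate once you have a feed. Here is a minimal Python sketch, assuming an RSS 2.0 feed whose items carry RFC 822 style `pubDate` values; the sample items are invented, not real Post or GovTrack data.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import List


def items_since(rss_xml: str, since: datetime) -> List[str]:
    """Return titles of RSS 2.0 items published on or after `since` (timezone-aware)."""
    root = ET.fromstring(rss_xml)
    titles = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        if not pub_date:
            continue  # skip undated items rather than guessing
        published = parsedate_to_datetime(pub_date)
        if published.tzinfo is None:
            # "-0000" dates parse as naive; treat them as UTC so comparison works.
            published = published.replace(tzinfo=timezone.utc)
        if published >= since:
            titles.append(item.findtext("title", default=""))
    return titles


sample = """<rss version="2.0"><channel>
  <title>Votes (sample)</title>
  <item><title>Vote A</title><pubDate>Fri, 18 Nov 2005 14:00:00 GMT</pubDate></item>
  <item><title>Vote B</title><pubDate>Tue, 15 Nov 2005 09:30:00 GMT</pubDate></item>
</channel></rss>"""

cutoff = datetime(2005, 11, 17, tzinfo=timezone.utc)
print(items_since(sample, cutoff))  # ['Vote A']
```

Run against two feeds for the same representative, this gives the kind of side-by-side count described above.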

After the election I daydreamed about somehow getting involved in the political process in a technical way…which is how I found my way to GovTrack, who are essentially doing very elaborate screen scraping of the THOMAS database at the Library of Congress. One thing I really like about the GovTrack project is that they are making their data available as RDF for downstream applications. Adrian’s work seems to draw on a richer data source, as I imagine is the case at a place like the Post. All I can say is well done, and damn…you’ve only been there for a couple of months, right? Talk about hitting the ground running.

At the recent Snakes and Rubies, Adrian indicated that there was going to be some huge Django-related news. When the voting db hit my instant messenger, IRC client and RSS aggregator, I thought that this was it. But according to Adrian there’s something bigger in the works…


snakes and rubies

I managed to attend Snakes and Rubies yesterday where Adrian Holovaty and David Heinemeier Hansson talked about their respective web frameworks: Django and Rails.

The event started at 2PM and went to 6:30PM or so, and was attended by over 100 people! I watched this little event take shape out of the mists of the local Python and Ruby mailing lists and was just amazed to see how vibrant the Chicago software development scene is, or has become in the 3 years I’ve lived in the area.

What’s more, Adrian and David did a great job promoting both of their projects while remaining amiable and respectful of the other camp. It’s hard to imagine a similar event between two commercial frameworks. Each was given about 45 minutes to talk about their software in any way they wanted. They had extremely different yet effective presentation styles, and their projects had one important thing in common: disillusionment with PHP.

Rather than talking technical details, Adrian spent most of his time on how Django came to be down in Kansas at lawrence.com. lawrence.com began its life as a PHP application that served as a community site for all sorts of goings-on in Lawrence, Kansas. The site interleaved all sorts of local entertainment content: music, dining, art, movies…and it encouraged user participation. For example, you can listen to mp3s from local musicians, but here’s the twist: you listen to them around the time they are playing in town…so if you like a song you can head over to the venue later that week to see them live. Another example was a full-on sports site for the local little leagues, which posted details of games, scores, weather conditions, etc. All of this was detailed to show how deeply intertwined all the data was.

The really interesting stuff for me was when Adrian described how journalism informed his software development practices…and how Django fed into this process. In the same way that journalists work against the clock to get news stories out quickly and accurately, Adrian and his team worked to get software projects done, often on similar deadlines (sometimes as short as 4 hours). They quickly found that their PHP infrastructure wasn’t allowing them to respond quickly enough without introducing side effects, and decided that they needed new tools…which is how Django was born. In fact, the little league application mentioned above was the first Django application.

Adrian has since moved on to the Washington Post, where he is their resident web technology mad scientist. Apparently they are using Django in some form at the Post, or are planning to, since he mentioned that Django’s caching can scale to the 9-million-odd requests the Post gets in a single day.
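Django’s real cache framework is far more capable than this, but as a rough illustration of why caching matters at that scale, here is a minimal time-based page cache in plain Python. `PageCache` and `render_story` are hypothetical names, not Django’s API: a page is rendered once and then served from memory until it expires.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Tiny time-to-live cache: render a page once, reuse it until it expires."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, body)
        self.renders = 0  # how many times a page actually had to be built

    def get(self, path: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        cached = self._entries.get(path)
        if cached is not None and now < cached[0]:
            return cached[1]  # still fresh: no rendering or database work
        self.renders += 1
        body = render(path)
        self._entries[path] = (now + self.ttl, body)
        return body


def render_story(path: str) -> str:
    # Stand-in for an expensive, database-backed page render.
    return f"<html><body>Story at {path}</body></html>"


cache = PageCache(ttl_seconds=60)
for _ in range(1000):
    cache.get("/politics/story-1", render_story)
print(cache.renders)  # 1000 requests, 1 render
```

The clock is injectable so expiry can be tested without actually waiting.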

Unfortunately, my pencil ran out of lead a short way into David’s talk, so I don’t have as much written down from the Rails presentation. David dropped some wonderful one-liners that I wish I had captured. Unlike Adrian, David let actual code do most of the talking for him.

Early on he had a screen with a quote from Richard Feynman on the importance of finding beautiful solutions to problems (if you remember the quote, please let me know). This quote guided the rest of the talk, in which David showed off beautiful Model, View and Controller code from Ruby on Rails…and it really was beautiful stuff. David’s thesis was that beautiful things make you happy, and happiness makes you more productive…so beautiful code will make for happy, productive programmers. Much of this comes back to the essential philosophy of Ruby–to give joy to programmers. At any rate, the lights were dimmed and David gave us a tour of what Ruby on Rails code looks like, while highlighting some of the strengths of the project and the Ruby language. On one of the pages there was some code to set a cookie expiration, and the date was created like so:

    20.years.from_now

How cool is that! I wasn’t sure whether this was part of Ruby proper, so I fired up my Ruby interpreter to check:

    biblio:~ ed$ irb
    irb(main):001:0> 20.years.from_now
    NoMethodError: undefined method `years' for 20:Fixnum
            from (irb):1

Whereas from the Rails console it works fine:

    biblio:~/Projects/cheap ed$ script/console
    Loading development environment.
    >> 20.years.from_now
    => Thu Dec 04 15:38:48 CST 2025

So Rails (through its ActiveSupport library) extends Ruby’s built-in numeric classes, Fixnum included, with the years method. Pretty awesome :-) Another thing David highlighted was that Ruby is used everywhere: for configuration, for writing XML, and for writing JavaScript. I was even surprised to hear him argue for full-on Ruby in view templates. His argument is that even when a framework offers only a limited set of tags, it’s still offering logic, and rather than creating some bastardized tag language, why not just use tried-and-true Ruby?
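Python won’t let you add methods to the built-in int, so there’s no direct equivalent of `20.years.from_now`. A rough standard-library sketch of the same date arithmetic might look like this; `add_years` and `years_from_now` are hypothetical helper names, and falling back from Feb 29 to Feb 28 in non-leap years is just one reasonable choice, not necessarily what Rails does.

```python
from datetime import datetime
from typing import Optional


def add_years(moment: datetime, years: int) -> datetime:
    """Shift `moment` by whole calendar years, keeping month, day and time.

    Feb 29 falls back to Feb 28 when the target year is not a leap year.
    """
    try:
        return moment.replace(year=moment.year + years)
    except ValueError:
        # Only Feb 29 landing in a non-leap year can fail here.
        return moment.replace(year=moment.year + years, day=28)


def years_from_now(years: int, now: Optional[datetime] = None) -> datetime:
    """Rough analogue of Rails' `20.years.from_now` (hypothetical helper)."""
    return add_years(now or datetime.now(), years)


print(years_from_now(20, datetime(2005, 12, 4, 15, 38, 48)))  # 2025-12-04 15:38:48
```

It reads nowhere near as nicely as the Ruby one-liner, which is rather the point David was making.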

The two presentations were followed by a few moderated questions and some questions from the audience. The highlight for me was when Why the Lucky Stiff’s question was asked:

Looking a bit beyond web frameworks, how do you envision the world coming to an end?

David responded by “scoping” the world to mean the world of software development, and said that this world would come to an end if the layers of Java “sedimentation” continue to accrue. He went on to predict that we’re at a crossroads in software development, and that a paradigm shift is underway…intentionally provocative, and pretty much right on as far as web development goes, if you ask me. Adrian responded, “Yoko Ono.”

So, as you can tell, I’m still digesting the presentations and discussion. There was so much good stuff, and I was really struck by the collegiality between the two guys: open source software development at its finest. The two main things I took away were to embrace the boundaries between software development and a particular field like journalism (or, in my case, libraries), and to always strive for the beautiful in software, “boiling down” a thorny problem into its simplest and most elegant expression.