Archive for December, 2005

Fear Itself

Wednesday, December 28th, 2005

I’m glad I’m not the only one who was immediately reminded of this when the NSA spying story broke.

If you are interested in the perspective of a computer security specialist, definitely take a look at what Bruce Schneier has been writing. Schneier’s theory on why Bush needed to bypass the Foreign Intelligence Surveillance Court is pretty harrowing.

The NSA’s ability to eavesdrop on communications is exemplified by a technological capability called Echelon. Echelon is the world’s largest information “vacuum cleaner,” sucking up a staggering amount of voice, fax, and data communications — satellite, microwave, fiber-optic, cellular and everything else — from all over the world: an estimated 3 billion communications per day. These communications are then processed through sophisticated data-mining technologies, which look for simple phrases like “assassinate the president” as well as more complicated communications patterns.

Supposedly Echelon only covers communications outside of the United States. Although there is no evidence that the Bush administration has employed Echelon to monitor communications to and from the U.S., this surveillance capability is probably exactly what the president wanted and may explain why the administration sought to bypass the FISA process of acquiring a warrant for searches.

Honestly, this kind of behavior from the Bush Administration isn’t at all surprising given their “go it alone” attitude. However, I’m really disappointed that the ranking members of the House and Senate Intelligence Committees didn’t make noise–any noise. I imagine they are bound by some oath or whatnot…but what good are checks and balances if they don’t work properly?

Indeed, a recent article from the NYTimes indicates that Schneier’s theory may in fact be, umm, fact:

The National Security Agency has traced and analyzed large volumes of telephone and Internet communications flowing into and out of the United States as part of the eavesdropping program that President Bush approved after the Sept. 11, 2001, attacks to hunt for evidence of terrorist activity, according to current and former government officials.

The volume of information harvested from telecommunication data and voice networks, without court-approved warrants, is much larger than the White House has acknowledged, the officials said. It was collected by tapping directly into some of the American telecommunication system’s main arteries, they said.

As part of the program approved by President Bush for domestic surveillance without warrants, the N.S.A. has gained the cooperation of American telecommunications companies to obtain backdoor access to streams of domestic and international communications, the officials said.

I’m really worried that we’re not teetering on a slippery slope but are actually in free fall. It appears that telecommunications companies are helping feed data mining operations at the NSA in real time. Perhaps they have a googlish front end where ‘professionals’ can type in ‘keywords’ and hit “I’m feeling lucky” and get a list of phone conversations or emails.

The Bush Administration’s prolific use of “fear” as a policy wedge is extremely dangerous. As Roosevelt famously said in a time of national crisis:

So, first of all, let me assert my firm belief that the only thing we have to fear is fear itself: nameless, unreasoning, unjustified terror which paralyzes needed efforts to convert retreat into advance.

On a somewhat lighter note, Schneier linked to a little trick devised by Richard M. Smith which allows you to detect if the NSA is monitoring your email communications. As my friend Ed Silva pointed out in IM:

I wouldn’t try it if you are planning on flying.

Uh, yeah I was planning on going to code4lib 2006 in a few months….maybe I’ll wait.

on delicious

Thursday, December 15th, 2005

Ari Paparo has some interesting notes on what made delicious succeed where his company (doing essentially the same thing in 1999 with 13 million dollars) failed. Two things that really resonated with me were: defaults matter and folders suck.

Ari’s site blink.com made bookmarks private by default but allowed users to share them–whereas delicious makes them public. This encouraged looking at the bookmarks globally as the purpose of the service, whereas with blink it was an afterthought.

What Ari says about why folders suck was so good I’m going to take the liberty of just quoting a big chunk of it.

We believed that users would not only make their folders public, but also would categorize those folders into a directory structure. We called this the “Public Library” and created a Yahoo-like node structure on which users could post. This could have made sense since categorizing folders would be less work than categorizing individual bookmarks – after all, the folders were already “categories” of a sort.

There were several severe problems with this folder-based approach. First, people are very bad and inconsistent at organizing things. One day etrade.com will go into the “finance” folder and another day it will go into the “favorite links” folder. We were taking this fundamental flaw and squaring it – asking users to use graph their existing categorization onto a second arbitrary structure within the public library. Does my “finance” folder go into the “Business” directory or the “Personal” directory?

Then there was the issue of how deep to go when categorizing folders. If I’ve got a folder of “online brokerages” do I put it in the directory at the level of “Finance” since my folder is in a sense a sub-category of finance, or do I put it within the pre-existing “Finance -> Brokerage” directory? Users were confused, and with good reason.

Even librarians aren’t always that great at categorizing things. We need lots of rules, and don’t always remember to follow all of them. It’s a tricky business :-)

Case in point: I don’t even use the tag “taxonomy” consistently (see screenshot above). delicious has a new feature that shows which tags you’ve used, and how many times, as you are typing in a tag. This is extremely handy for displaying just how inconsistent I am…and how to improve the quality of my tags.
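To make the idea concrete, here is a minimal sketch of that kind of tag guidance. It is not how delicious implements the feature, which I haven't seen; the `suggest_tags` function and the sample tag history are invented for illustration, and I'm assuming tags are matched by prefix and compared case-insensitively.

```python
from collections import Counter


def suggest_tags(tag_history, prefix, limit=5):
    """Return (tag, count) pairs for previously used tags starting with prefix."""
    # Assumption: tags are compared case-insensitively.
    counts = Counter(tag.lower() for tag in tag_history)
    prefix = prefix.lower()
    matches = [(tag, n) for tag, n in counts.items() if tag.startswith(prefix)]
    # Most-used tags first, then alphabetical so ties come out in a stable order.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return matches[:limit]


# Hypothetical tag history, made up for this example.
MY_TAGS = ["taxonomy", "taxonomies", "python", "taxonomy", "tagging", "Taxonomy"]

if __name__ == "__main__":
    for tag, count in suggest_tags(MY_TAGS, "tax"):
        print(tag, count)
```

Seeing the counts side by side is the point: the variant you have used most is probably the one to stick with.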

opensearch and autodiscovery

Wednesday, December 14th, 2005

I just noticed that a9 has released a second draft of opensearch v1.1. This draft includes details on opensearch autodiscovery for providing a reference to the opensearch description file in an HTML page. This could have a lot of potential for browser plugins. Also, they’ve added a Query element that can be used for echoing back the query that was used to generate results…kinda like the echoedRequest in SRU. These are the things that popped out at me. Of course the big news in the first draft was that Atom can now be used in responses.
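Here is a rough sketch of what a browser plugin or crawler might do with autodiscovery, using only the Python standard library. As I understand the 1.1 drafts, a page advertises its description file with a `link` element whose `rel` is `search` and whose `type` is `application/opensearchdescription+xml`; check the current draft before relying on that. The sample page, the example.org URL and the helper names are hypothetical.

```python
from html.parser import HTMLParser

# Media type I understand the OpenSearch 1.1 drafts to use for description files.
OPENSEARCH_TYPE = "application/opensearchdescription+xml"


class OpenSearchLinkFinder(HTMLParser):
    """Collect <link> elements that advertise an OpenSearch description."""

    def __init__(self):
        super().__init__()
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; attributes may lack a value.
        rel_values = (attributes.get("rel") or "").lower().split()
        media_type = (attributes.get("type") or "").lower()
        if "search" in rel_values and media_type == OPENSEARCH_TYPE:
            self.descriptions.append(
                {"href": attributes.get("href"), "title": attributes.get("title")}
            )


def find_opensearch_descriptions(html):
    """Return the href and title of every OpenSearch description link in html."""
    finder = OpenSearchLinkFinder()
    finder.feed(html)
    return finder.descriptions


# Hypothetical page, used only for illustration.
SAMPLE_PAGE = """
<html>
  <head>
    <title>Example catalog</title>
    <link rel="stylesheet" type="text/css" href="/style.css">
    <link rel="search" type="application/opensearchdescription+xml"
          href="http://example.org/opensearch.xml" title="Example catalog search">
  </head>
  <body></body>
</html>
"""

if __name__ == "__main__":
    for description in find_opensearch_descriptions(SAMPLE_PAGE):
        print(description["title"], description["href"])
```

A real client would then fetch the `href` and parse the description file to learn the search URL template.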

At any rate it was nice to see that they link to my opensearch python library from their tools page. Once 1.1 moves from draft I’m going to work on upgrading it from 1.0 right away.

lib4code

Thursday, December 8th, 2005

Thanks to Jessamyn I found Librarians 2.0 don’t need to be coders 2.0 where Richard Ackerman has an interesting take on just how important programming skills are for a library technologist. Richard cites a paper from IBM on Service Oriented Architectures to make a compelling point that there are many roles to play when building technology solutions, particularly the web services that comprise such a big part of “library 2.0” efforts…and that coding skills really aren’t that important when you can just get a student, consultant, or vendor (heh) to do it.

It’s unfortunate, I think, that the code4lib 2006 conference name seems to emphasize “coding” so much over the ideas. I totally agree that the most important aspect of our work as library technologists is the service ideas, and that the code is simply a machine readable description of these ideas. Some high level languages are actually really, really nice for expressing ideas, and I would argue that oftentimes learning a good computer language can help you express your technology ideas better. As Martin Fowler says:

Any fool can write code that a computer can understand. Good programmers write code that humans can understand.

Let me go on record, as someone who has helped organize the code4lib conference, to say non-coders are more than welcome…there will be plenty of people who can program there…we want ideas, mindshare and collaboration. Please don’t let the computer programming jargon dissuade you from participating.

Also, a few things stood out to my eye:

If your goals and architecture are clear enough, the coders don’t need to be library experts in order to deliver the functions you need.

The coders don’t need to be experts, but wouldn’t it be nice if they were, and you didn’t have to go into great detail about certain things? Wouldn’t it also be nice if the coders didn’t start from scratch and were aware of good reusable components from the library software community which could be leveraged to make the software construction phase that much faster? Indeed, architectural decisions often have a direct effect on programming decisions that are made, and it helps if those who are architecting things have at least a general understanding of how software is built so that designs stay doable and sane…and so that they’ll know when things are drifting off course.

Also, don’t try to build big complex systems. Live in the beta world. Get some chunk of functionality out quickly so that people can play with it. The hardest part is having the initial idea, and the good news is I see lots of great ideas out in the library blogosphere. I can understand the frustrations in the gap between the idea and running code, but I hope I’ve presented a bunch of areas above in which you can work to turn the idea into the next hot beta, without necessarily needing to code it yourself.

The one danger in moving to a formal process like the one described in the article by IBM is that it may encourage you to build big complex systems on a slow time scale. If you need to thoroughly describe a software solution before beginning to program (the so-called waterfall model) you will spend a lot of time trying to get the design right before even beginning to code to see what actually works. I’ve found the more that the design and the coding can be intermingled the better…since it lets them both inform each other as they go on. This intermingling is easy if you are a small shop and you have a handful of people (1-8) that need to communicate on a regular basis. I imagine most software development groups in libraries are around this size. That being said I think that Richard is right, it’s good to be aware of the different roles that are being played, perhaps by one individual.

After seeing Adrian talk about software development and journalism at Snakes and Rubies I’ve been thinking off and on about the space between libraries and software development. I’m particularly interested in how one informs the other…and found Richard’s post to be a good catalyst.

selenium/ruby sprint

Wednesday, December 7th, 2005

In case you missed it, or aren’t subscribed, the chicago-ruby group is getting together for a Selenium demo/sprint session on Dec 13th. I’ve seen Selenium demo’d before at a chipy meeting, and look forward to a more in-depth look since several of my friends really like this testing tool. I think Jason is particularly interested in getting some Ruby driver support.

gmail + atom

Tuesday, December 6th, 2005

I imagine this is old hat to long time gmail users but I just noticed that my gmail is available via atom…and very easily at that with Mark Pilgrim’s Universal Feed Parser.

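    import feedparser

    # Substitute your own gmail username and password in the URL.
    feed = feedparser.parse(
        'https://username:password' +
        '@gmail.google.com/gmail/feed/atom')

    # Print the title of each entry in the feed.
    for entry in feed.entries:
        print(entry.title)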

Washington Post U.S. Congress Votes Database

Monday, December 5th, 2005

Umm, wow! Adrian Holovaty announced the Washington Post Congressional Votes Database. This site is important for at least two reasons:

  • it offers RSS feeds for tracking the voting of your house/senate representative
  • it is powered by Python and the Django web framework.

As far as the RSS goes I just pulled up Dick Durbin’s recent votes and there were over 20 events since 11/17/2005, whereas the comparable service from GovTrack had only one event since then.
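For the curious, this kind of comparison is easy to script. The sketch below counts the items in an RSS feed dated on or after a cutoff, using only the standard library. The feed content is a made-up stand-in rather than real data from either site, and I'm assuming each item carries an RFC 822 style `pubDate` with an explicit timezone.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def count_items_since(rss_xml, cutoff):
    """Count RSS <item> elements whose pubDate is on or after cutoff."""
    root = ET.fromstring(rss_xml)
    count = 0
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        if pub_date is None:
            continue  # undated items cannot be compared
        # Assumes pubDate includes a timezone, so both datetimes are aware.
        if parsedate_to_datetime(pub_date) >= cutoff:
            count += 1
    return count


# Hypothetical feed, invented for illustration; not real voting data.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Hypothetical votes feed</title>
    <item><title>Vote A</title><pubDate>Tue, 15 Nov 2005 14:00:00 GMT</pubDate></item>
    <item><title>Vote B</title><pubDate>Fri, 18 Nov 2005 16:30:00 GMT</pubDate></item>
    <item><title>Vote C</title><pubDate>Thu, 01 Dec 2005 10:15:00 GMT</pubDate></item>
  </channel>
</rss>
"""

if __name__ == "__main__":
    cutoff = datetime(2005, 11, 17, tzinfo=timezone.utc)
    print(count_items_since(SAMPLE_FEED, cutoff))
```

Point it at the text of two real feeds and you get the same kind of tally I did by eye.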

After the election I daydreamed about somehow getting involved in the political process in a technical way…which is how I found my way to GovTrack, who are essentially doing very elaborate screen scraping of the Thomas database at the Library of Congress. One thing I really like about the GovTrack project is they are making their data available as RDF, for downstream applications. Adrian’s work seems to draw on a richer data source, as I imagine is the case at a place like the Post. All I can say is well done, and damn…you’ve only been there for a couple months right? Talk about hitting the ground running.

At the recent Snakes and Rubies Adrian indicated that there was going to be some huge Django related news. When the voting db hit my Instant Messenger, IRC client and RSS aggregator I thought that this was it. But according to Adrian there’s something bigger in the works…

snakes and rubies

Sunday, December 4th, 2005


I managed to attend Snakes and Rubies yesterday where Adrian Holovaty and David Heinemeier Hansson talked about their respective web frameworks: Django and Rails.

The event started at 2PM and went to 6:30PM or so, and was attended by over 100 people! I watched this little event take shape out of the mists of the local python and ruby mailing lists and was just amazed to see how vibrant the Chicago software development scene is, or has become in the 3 years I’ve lived in the area.

What’s more, Adrian and David did a great job promoting both of their projects, while remaining amiable and respectful of the other camp. It’s hard to imagine a similar event between two commercial frameworks. Both were given about 45 minutes or so to talk about their software in any way they wanted. They had extremely different yet effective presentation styles, and their projects had one important thing in common: disillusionment with PHP.

Rather than talking technical details Adrian spent most of his time focused on how Django came to be down in Kansas at lawrence.com. lawrence.com began its life as a PHP application which served as a community site for all sorts of goings on in Lawrence, Kansas. The site interleaved all sorts of local entertainment content: music, dining, art, movies…and it encouraged user participation. For example you can listen to mp3s from local musicians, but here’s the twist, you listen to them as they are playing in town…so if you like a song you can jump over to the venue later that week to see them live. Another example was a full on sports site for the local little leagues which posted details of games, scores, weather conditions, etc. All of this was detailed to show how deeply intertwined all the data was.

The really interesting stuff for me was when Adrian described how journalism informed his software development practices…and how Django fed into this process. In the same way that journalists work against the clock to get news stories out quickly and accurately Adrian and his team worked to get software projects done often on similar deadlines (sometimes like 4 hours). They quickly found that their PHP infrastructure wasn’t allowing them to respond quickly enough without introducing side effects, and decided that they needed new tools…which is how Django was born. In fact the little league application mentioned above was the first Django application.

Adrian has since moved on to the Washington Post, where he is their resident web technology mad scientist. Apparently they are using Django in some form at the Post, or are planning to since he mentioned Django’s caching can scale to the 9 million odd requests the Post gets in a single day.

Unfortunately my lead pencil ran out of lead just a bit of the way into David’s talk, so I don’t have as much written down from the Rails presentation. David dropped some wonderful one liners that I wish I could have written down. Much unlike Adrian, David let actual code do most of the talking for him.

Early on he had a screen with a quote from Richard Feynman on the importance of finding beautiful solutions to problems (if you remember the quote please let me know). This quote kind of guided the rest of the talk where David showed off beautiful Model, View and Controller code from RubyOnRails…and it really was beautiful stuff. David’s thesis was that beautiful things make you happy, and happiness makes you more productive…so beautiful code will make for happy, productive programmers. Much of this comes back to the essential philosophy of Ruby–to give joy to programmers. At any rate, the lights were dimmed and David gave us a tour of what RubyOnRails code looks like, while highlighting some of the strengths of the project and the Ruby language. On one of the pages there was some code to set a cookie expiration, and the date was created like so:

   20.years.from_now

How cool is that! I wasn’t sure if this was part of Ruby proper until I fired up my ruby interpreter to check:

biblio:~ ed$ irb
irb(main):001:0> 20.years.from_now
NoMethodError: undefined method `years' for 20:Fixnum
        from (irb):1

Whereas from the Rails console it works fine:

biblio:~/Projects/cheap ed$ script/console 
Loading development environment.
>> 20.years.from_now
=> Thu Dec 04 15:38:48 CST 2025

So Rails decorates the Fixnum class with the years method. Pretty awesome :-) Another thing David highlighted was that Ruby is used everywhere, from configuration, to writing XML, to writing JavaScript. I was even surprised to hear him argue for full on Ruby in view templates. His argument is that even when a framework offers only a limited set of tags, it’s still offering logic, and rather than creating some bastardized tag language why not just use tried and true Ruby.

The two presentations were followed by a few (not many) moderated questions, and some questions from the audience. The highlight for me was when Why the Lucky Stiff’s question was asked:

Looking a bit beyond web frameworks, how do you envision the world coming to an end?

David responded by “scoping” the world to mean the world of software development and said that this world would come to an end if the layers of Java “sedimentation” continue to accrue. He went on to predict that we’re at a crossroads in software development, and that a paradigm shift is underway…intentionally provocative, and pretty much right on as far as web development goes if you ask me. Adrian responded “Yoko Ono”.

So, as you can tell I’m still digesting the presentations and discussion. There was so much good stuff, and I was really struck by the collegiality between the two guys: open source software development at its finest. The two main things I took away were embracing the boundaries between software development and a particular industry like Journalism, or in my case Libraries; and always trying to strive for the beautiful in software, “boiling down” a thorny problem into its most simple and elegant expression.

code4lib 2006

Friday, December 2nd, 2005

Some #code4lib regulars (who also help put on Access up north) have managed to get some space at Oregon State University in February for code4lib 2006:

code4lib 2006 is a loosely structured conference for library technologists to commune, gather/create/share ideas and software, be inspired, and forge collaborations. It is also an outgrowth of the Access HackFest, wrapped into a conference-ish format. It is *the* event for technologists building digital libraries and digital information systems, tools, and software.

A call for proposals is out. The nice thing about this conference is that there will be different levels of involvement: from keynote speakers, to shorter presentations, to lightning talks, and with space/time to actually hack at stuff/brainstorm with colleagues.

We’re hoping to attract both library professionals who use computers, and computer professionals who have an interest in libraries. The registration is now open as well at a discounted price. If you are interested in computers and libraries please submit a proposal or register to attend!