Archive for the ‘python’ Category

pascal’s triangle in python

Wednesday, August 10th, 2005

I mentioned Pascal’s Triangle in the previous post, and after typing in the Oz code decided to make a Pascal’s Triangle pretty printer in python.

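from sys import argv

def pascal(n):
    # Build the first n rows of Pascal's Triangle recursively (expects n >= 1).
    if n == 1:
        return [ [1] ]
    else:
        result = pascal(n-1)
        lastRow = result[-1]
        # Each new cell is the sum of the two cells above it: pad the last
        # row with a 0 on either side and add the two shifted copies pairwise.
        result.append( [ (a+b) for a,b in zip([0]+lastRow, lastRow+[0]) ] )
        return result

def pretty(tree):
    # Render the rows as a centered triangle, one row per line.
    if len(tree) == 0: return ''
    # Rows nearer the top have more rows beneath them, so they get more indent.
    line = '  ' * len(tree)
    for cell in tree[0]:
        line += '  %2i' % cell
    return line + "\n" + pretty(tree[1:])

if __name__ == '__main__':
    # Usage: python pascal.py <number of rows>
    print(pretty( pascal( int(argv[1]) ) ))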

Which, when run with an argument of 9, generates something like this:


biblio:~/Projects/bookclub ed$ python pascal.py 9
                     1
                   1   1
                 1   2   1
               1   3   3   1
             1   4   6   4   1
           1   5  10  10   5   1
         1   6  15  20  15   6   1
       1   7  21  35  35  21   7   1
     1   8  28  56  70  56  28   8   1

It’s been fun reading up on the uses for Pascal’s triangle, although I imagine this is old hat for people more familiar with math than I am. Still, I think getting through this tome will be time well spent in the long run.

chipy bookclub

Wednesday, August 10th, 2005

The Chicago Python Group started up a bookclub about a month ago. The first book we’re reading as a group is Concepts, Techniques, and Models of Computer Programming, which is fortunately available online for free. The aim of the bookclub (as with many bookclubs) is to work through a text together, and hopefully get to hear different perspectives during discussion, which will happen online and after our monthly meetings. Also, a bit of peer pressure can help you make it through certain types of books…

And this first book is a doozy at 939 pages. It covers all sorts of territory in computer science using a multi-paradigm language called Oz. I’ve made it through the preface, and into Chapter 1, which starts out teaching some fundamental concepts behind functional programming. The jury is still out, but so far I’m finding the content refreshingly clear and stimulating. I like the fact that mathematical notation (so far) is explained and not taken for granted. Calculating factorial with recursion is a bit predictable, but Chapter 1 quickly moves on to an algorithm that calculates a given row of Pascal’s Triangle.
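For anyone who would rather follow along in Python than Oz, here is a minimal sketch of those two Chapter 1 exercises. It is my own translation rather than the book's code: the function names `factorial` and `pascal_row` are made up for illustration, and the row calculation assumes the shift-and-add idea (pad the previous row with a 0 on each side and add the two copies pairwise).

```python
def factorial(n):
    """Recursive factorial: n! = n * (n-1)!, with 0! defined as 1."""
    return 1 if n == 0 else n * factorial(n - 1)


def pascal_row(n):
    """Return row n of Pascal's Triangle, counting the top row [1] as row 1."""
    if n == 1:
        return [1]
    previous = pascal_row(n - 1)
    # Shift the previous row left and right by padding with a 0,
    # then add the two shifted copies element by element.
    shifted_left = previous + [0]
    shifted_right = [0] + previous
    return [a + b for a, b in zip(shifted_left, shifted_right)]


if __name__ == "__main__":
    print(factorial(5))
    print(pascal_row(5))
```

Because each row only depends on the one before it, the recursion depth equals the row number, which is fine for small triangles but would hit Python's recursion limit for very large ones.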

The sheer magnitude of the book is a bit intimidating; however, reading it on my iBook makes it easy to ignore that. I’m thinking of it as a sort of thematic encyclopedia of computer programming with handy illustrations. Hopefully I’ll find the time to drop my thoughts here as I work my way through each chapter. Please feel free to join us (whether you’re from Chicago or not) if you are interested.

lightning strikes

Friday, June 10th, 2005

Chris has a nice writeup about last night’s ChiPy lightning talks. There were tons of interesting people there with very interesting projects. Apart from the announcement that we might be hosting PyCon next year in Chicago, the highlight of the evening for me was hearing about the amazing data hack that is ChicagoCrime. Adrian is a journalist/programmer who managed to glue together GoogleMaps with publicly available data from the Chicago Police Department. The main (perhaps unintended) things I took from his enthusiastic and humorous talk were:

  • screen scraping is fragile but it’s an important lever for fostering more elegant/robust information sharing.
  • screen scraping is fragile but it’s important for building new public applications that aren’t run behind closed doors at the Department of Homeland Security.

I really, really want to get going on the GovTrack data scraping now.

pylucene

Thursday, June 9th, 2005

I’m going to be doing a lightning talk tonight at the Chicago Python Group about pylucene. pylucene essentially lets you use the popular Lucene indexing library (Java) in Python. No time limit has been set for the lightning talks (and mjd won’t be there with his gong) but I hope to quickly cover how to index an mbox with pylucene in 5 minutes. There are slides, which are there mainly as cue cards.
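To give a feel for what "indexing an mbox" means, here is a rough sketch that deliberately does not use pylucene at all. It reads the mailbox with the standard library's `mailbox` module and builds a toy in-memory inverted index (word to message numbers), which is the basic structure a real indexing library like Lucene manages far more cleverly. The function names `index_mbox` and `search` are hypothetical, and only plain (non-multipart) message bodies are handled.

```python
import mailbox
import re
from collections import defaultdict


def index_mbox(path):
    """Build a toy inverted index: word -> set of message numbers."""
    index = defaultdict(set)
    subjects = {}
    # create=False raises an error for a missing file instead of creating one.
    box = mailbox.mbox(path, create=False)
    try:
        for number, message in enumerate(box):
            subject = str(message.get("Subject", ""))
            payload = message.get_payload()
            # Only plain string bodies are indexed; multipart messages
            # contribute just their subject line in this sketch.
            body = payload if isinstance(payload, str) else ""
            subjects[number] = subject
            for word in re.findall(r"\w+", (subject + " " + body).lower()):
                index[word].add(number)
    finally:
        box.close()
    return index, subjects


def search(index, subjects, word):
    """Return the subjects of every message containing the word."""
    return [subjects[n] for n in sorted(index.get(word.lower(), ()))]
```

Calling `index_mbox("some.mbox")` and then `search(index, subjects, "python")` would list the subjects of matching messages. A real index would also handle tokenizing, ranking, and persistence, which is exactly the work a library takes off your hands.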

pybibutils

Thursday, June 9th, 2005

The #code4lib sprint is coming up soon and (alas) we still don’t really have a firm grasp on what we will be sprinting on. After PyCon, dchud had some ideas for a metadata wrangling framework for python. Around the same time I was working on a SWIG wrapper for the bibutils library. So one idea we had was to create a python utility that would enable converting between many of the popular metadata/citation formats.

Emerging details are available on the wiki. If you have any ideas for the sprint please note them on the wiki.

one billion

Monday, May 16th, 2005

Thom Hickey mentioned a new page at OCLC which lists some real-time stats for worldcat: total holdings, last record added, etc. Perhaps this is in honor of the total holdings getting very close to crossing the 1 billion mark.

So of course I had to add a plugin for panizzi to scrape the page. Rather than writing yet another state machine for parsing HTML, I decided to try out Fredrik Lundh’s ElementTree Tidy HTML Tree Builder, which works out very well when you want to walk a data structure representing possibly invalid HTML.

    from urllib2 import urlopen
    from elementtidy import TidyHTMLTreeBuilder
    url = "http://www.oclc.org/worldcat/grow.htm"
    tree = TidyHTMLTreeBuilder.parse( urlopen( url ) )

That’s all there is to getting a nice ElementTree object for a page of HTML, which you can then dig into.
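As a rough idea of what digging into the tree can look like, here is a small sketch using the standard library's `xml.etree.ElementTree`, which offers the same style of API. The markup in `SAMPLE` is made up for illustration and is not the real OCLC page, and `table_rows` is a hypothetical helper. One assumption to be aware of: output from a tidying parser may use namespace-qualified tag names, in which case the plain `"tr"` and `"td"` lookups below would need the namespace added.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already well-formed markup standing in for a fetched page.
SAMPLE = """<html><body>
<table>
  <tr><td>first label</td><td>123</td></tr>
  <tr><td>second label</td><td>456</td></tr>
</table>
</body></html>"""


def table_rows(root):
    """Return the cell text of every table row as a list of lists."""
    return [
        [(cell.text or "").strip() for cell in row.findall("td")]
        for row in root.iter("tr")
    ]


root = ET.fromstring(SAMPLE)
for label, value in table_rows(root):
    print(label + ": " + value)
```

The nice part is that there is no parser state to track: you ask the tree for the elements you care about and read their text.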

So, predictably:

10:53 < edsu> @worldcat
10:53 < panizzi> edsu: [May 16, 2005 11:49 AM EDT #981,277,234]
                      El senor de los anillos. Tolkien, J. R. R. ...
                      uploaded by OEL - EUGENE PUB LIBR

name authority fun

Sunday, May 1st, 2005

As a joke dchud suggested that panizzi (the friendly neighborhood bot in #code4lib) should have a plugin for querying the Library of Congress Name Authority File that OCLC provides. The Name Authority File allows librarians the world over to use the same established names when cataloging books, etc. It would serve no purpose in irc, but it could be a good conversation piece…

I had goofed around writing a command line app about half a year ago so I figured it couldn’t be that hard to hack this into the infobot source code. However, I guessed wrong… granted, I only tried for 30 minutes or so.

Fortunately, python’s supybot was a different story. It’s more modern, has command line programs for configuring a supybot, has built-in support for plugins, and has documentation. There is even a command line program, supybot-newplugin, that will ask a few questions and then autogenerate a template plugin module. All you have to do after that is add a method (with a particular signature given in the docs) which will then do the work and respond.

 
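from urllib import urlencode
from urllib2 import urlopen
from elementtree.ElementTree import parse
# supybot modules the plugin below relies on
import supybot.callbacks as callbacks
import supybot.privmsgs as privmsgs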
 
class Naf(callbacks.Privmsg):   
 
    def naf(self,irc,msg,args):
        """<name>
 
        Lookup a personal name in the NAF file at OCLC
        """
 
        alcme = "http://alcme.oclc.org/eprintsUK/services/NACOMatch"
        name = privmsgs.getArgs(args)
        query = urlencode( { \
            "method"          : "getCompleteSelectedNameAuthority",
            "serviceType"     : "rest",
            "name"            : name,
            "maxList"         : "10",
            "isPersonalName"  : "true" } )
 
        url = urlopen( alcme + "?" + query)
        tree = parse(url)
        elem = tree.getroot()
 
        matches = elem.find("wordMatches").getchildren()
        irc.reply( matches[0][0].text )

As an added bonus, along the way I got to try out ElementTree, which has to be the nicest XML library I’ve ever used. It turned out to be a fun experiment, and will hopefully add to the merriment of the room.

22:03 < edsu> panizzi naf sigmund freud
22:03 < panizzi> edsu: Freud, Sigmund,--1856-1939
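
For anyone curious about the two moving parts of the plugin without setting up a bot, here is a minimal standalone sketch: building the query string and pulling the first match out of the XML. It uses the Python 3 standard library (`urllib.parse` and `xml.etree.ElementTree`) rather than the modules in the plugin, and it makes no network request. The XML in `SAMPLE_RESPONSE` is a guess at the response shape based only on how the plugin walks the tree; the element names `result`, `match`, and `establishedForm` are hypothetical, as are the helper names `build_url` and `first_match`.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE = "http://alcme.oclc.org/eprintsUK/services/NACOMatch"


def build_url(name, max_list=10):
    """Build the lookup URL using the same parameters as the plugin."""
    query = urlencode({
        "method": "getCompleteSelectedNameAuthority",
        "serviceType": "rest",
        "name": name,
        "maxList": str(max_list),
        "isPersonalName": "true",
    })
    return BASE + "?" + query


# Hypothetical response: only the wordMatches element name comes from the
# plugin code; every other element name here is invented for illustration.
SAMPLE_RESPONSE = """<result>
  <wordMatches>
    <match><establishedForm>Freud, Sigmund,--1856-1939</establishedForm></match>
  </wordMatches>
</result>"""


def first_match(xml_text):
    """Return the text of the first child of the first match, or None."""
    root = ET.fromstring(xml_text)
    matches = root.find("wordMatches")
    # Guard against a response with no matches instead of raising IndexError.
    if matches is None or len(matches) == 0 or len(matches[0]) == 0:
        return None
    return matches[0][0].text


print(build_url("sigmund freud"))
print(first_match(SAMPLE_RESPONSE))
```

The guard for an empty result is an addition of this sketch; the plugin as written assumes there is always at least one match.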