one billion
Monday, May 16th, 2005
Thom Hickey mentioned a new page at OCLC that lists some real-time stats for WorldCat: total holdings, last record added, etc. Perhaps this is in honor of the total holdings getting very close to crossing the 1 billion mark.
So of course I had to add a plugin for panizzi to scrape the page. Rather than writing yet another state machine for parsing HTML, I decided to try out Fredrik Lundh's ElementTree Tidy HTML Tree Builder, which works out very well when you want to walk a data structure representing possibly invalid HTML.
from urllib import urlopen
from elementtidy import TidyHTMLTreeBuilder

url = "http://www.oclc.org/worldcat/grow.htm"
tree = TidyHTMLTreeBuilder.parse(urlopen(url))
That's all there is to getting a nice ElementTree object which you can dig into for a page of HTML.
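For the digging itself, here is a rough sketch of walking such a tree. It makes several assumptions: the table layout in SAMPLE is a hypothetical stand-in (the real markup of grow.htm isn't shown in this post), table_rows is a made-up helper name, and the standard-library xml.etree.ElementTree is used on an already well-formed string so the example runs without the tidy builder installed. The one thing to watch for is that tidied output is XHTML, so tag names carry the XHTML namespace.

```python
import xml.etree.ElementTree as ET

# Namespace prefix carried by elements in tidied (XHTML) output.
XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical stand-in for the tidied page; the real layout may differ.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<table>
<tr><td>Total holdings</td><td>981,277,234</td></tr>
<tr><td>Last record added</td><td>El senor de los anillos</td></tr>
</table>
</body>
</html>"""


def table_rows(root):
    """Return {label: value} for every two-cell table row under root."""
    stats = {}
    for row in root.iter(XHTML + "tr"):
        cells = [(cell.text or "").strip() for cell in row.findall(XHTML + "td")]
        if len(cells) == 2:
            stats[cells[0]] = cells[1]
    return stats


root = ET.fromstring(SAMPLE)
stats = table_rows(root)

# Strip the thousands separators before converting to an integer.
holdings = int(stats["Total holdings"].replace(",", ""))
remaining = 10**9 - holdings  # how far from the 1 billion mark

print(holdings, remaining)
```

With a tree from the tidy builder you would presumably pass tree.getroot() to the helper instead of parsing a string.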
So, predictably:
10:53 < edsu> @worldcat
10:53 < panizzi> edsu: [May 16, 2005 11:49 AM EDT #981,277,234]
El senor de los anillos. Tolkien, J. R. R. ...
uploaded by OEL - EUGENE PUB LIBR