
Tag Archives: python

data.gov.uk and rdfa

The recent public release of the UK Government’s data.gov.uk site got picked up by the press last week in articles at The Guardian, Prospect Magazine and elsewhere. These have been supplemented by some more technical discussions at ReadWriteWeb, Open Knowledge Foundation, Talis, Jeni Tennison’s blog, and some helpful emails from Leigh Dodds (Talis) [...]

New York Times Topics as SKOS

Serves 23,376 SKOS Concepts
INGREDIENTS

Text editor: Vim, Emacs, TextMate, etc
Python
BeautifulSoup
rdflib
Internet connection

DIRECTIONS

Open a new file using your favorite text editor.
Instantiate an RDF graph with a dash of rdflib.
Use Python’s urllib to fetch the HTML for each of the Times Topics Index Pages, e.g. for A.
Parse HTML into a fine, queryable data structure using BeautifulSoup.
Locate topic names and [...]

flickr, digital curation and the web

The Library of Congress has started to put selected content from Chronicling America into Flickr as part of the Illustrated Newspaper Supplements set. More details on the rationale and process involved can be found in a FAQ on the LC Newspapers and Current Periodical Reading Room website.
So for example this newspaper page on Chronicling [...]

LibraryThing Ubuntu Screen Saver

I read about the LibraryThing Mac Screensaver and of course wanted the same thing for my Ubuntu workstation at $work. Naturally, I’m supposed to be working on some high-priority tickets on a tight deadline…so I started to work right away on how to do this. Your tax dollars at work, etc…
I’m sure that there’s a [...]

json vs pickle

In Python, JSON is faster, smaller and more portable than pickle …
At work, I’m on a project where we’re modeling newspaper content in a relational database. We’ve got newspaper titles, issues, pages, institutions, places and some other fun stuff. It’s a Django app, and the db schema currently looks something like:

Anyhow, if you [...]
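One way to check the size and speed comparison on your own data is a small measurement like the one below. It is only a sketch: the record is invented sample data loosely shaped like a newspaper title with issues and pages, not the project's real schema, and the numbers will vary with the Python version, the pickle protocol and the shape of the data, so treat the output as a measurement rather than a verdict.

```python
import json
import pickle
import timeit

# Invented sample data; the field names are assumptions for illustration.
record = {
    "title": "Example Gazette",
    "place": "Example City",
    "issues": [
        {"date": "1900-01-0%d" % day, "pages": list(range(1, 9))}
        for day in range(1, 8)
    ],
}

# Serialize once with each format to compare sizes.
json_bytes = json.dumps(record).encode("utf-8")
pickle_bytes = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)

# Confirm that both formats give back the same structure.
json_roundtrip = json.loads(json_bytes.decode("utf-8"))
pickle_roundtrip = pickle.loads(pickle_bytes)

# Time 1000 full round trips for each format.
json_seconds = timeit.timeit(
    lambda: json.loads(json.dumps(record)), number=1000)
pickle_seconds = timeit.timeit(
    lambda: pickle.loads(pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)),
    number=1000)

print("json:   %d bytes, %.4f s" % (len(json_bytes), json_seconds))
print("pickle: %d bytes, %.4f s" % (len(pickle_bytes), pickle_seconds))
```

The portability point does not need a benchmark: the JSON output is plain text that any language can read, while pickle data is specific to Python.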

pymarc PEP-8 cleanup

pymarc v2.0 was released yesterday afternoon. I’m mentioning it here to give a big tip of the hat to Gabriel Farrell (gsf on #code4lib) who spent a significant amount of time cleaning up the code to be PEP-8 compliant.
If you are a current user of pymarc, your code will most likely break, since methods [...]