Tag Archives: newspapers

Confessions of a Graph Addict

Today I’m going to be at the annual conference of the American Library Association for a pre-conference about Libraries and Linked Data. I’m going to try talking about how Linked Data, and particularly the graph data structure, fits the way catalogers have typically thought about bibliographic information. Along the way I’ll include some [...]

New York Times Topics as SKOS

Serves 23,376 SKOS Concepts. INGREDIENTS: Text editor (Vim, Emacs, TextMate, etc.); Python; BeautifulSoup; rdflib; Internet connection. DIRECTIONS: Open a new file using your favorite text editor. Instantiate an RDF graph with a dash of rdflib. Use Python’s urllib to extract the HTML for each of the Times Topics Index Pages, e.g. for A. Parse HTML [...]
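
The first few directions can be sketched roughly as follows. This is a minimal illustration, not the recipe's actual code: it swaps in the standard library's html.parser for BeautifulSoup and writes N-Triples lines by hand instead of building an rdflib graph, so that it runs without extra packages or a network connection. The sample HTML, the example.org URIs and the helper names (LinkCollector, concepts_as_ntriples) are all assumptions made for the example.

```python
from html.parser import HTMLParser

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical stand-in for one index page; the real markup is not shown here.
SAMPLE_HTML = """
<ul>
  <li><a href="http://example.org/topics/abortion">Abortion</a></li>
  <li><a href="http://example.org/topics/acoustics">Acoustics</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def concepts_as_ntriples(html):
    """Turn each link into a skos:Concept with an English skos:prefLabel."""
    parser = LinkCollector()
    parser.feed(html)
    lines = []
    for uri, label in parser.links:
        lines.append(f"<{uri}> <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f'<{uri}> <{SKOS}prefLabel> "{escape_literal(label)}"@en .')
    return lines


if __name__ == "__main__":
    print("\n".join(concepts_as_ntriples(SAMPLE_HTML)))
```

With rdflib installed, the same triples would more naturally be added to a Graph and serialized from there; the hand-written N-Triples are only a stand-in.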

rest, the semantic web and my feeble brain

Imagine you were minting close to a million URIs for historic newspaper pages, such as http://chroniclingamerica.loc.gov/lccn/sn85066387/1898-01-01/ed-1/seq-1/. The web page allows you to zoom in quite close and see lots of detail in the page. Now let’s say I want to describe this newspaper page in RDF. I need to decide what subject [...]
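
To make the URI pattern concrete, here is a small sketch of how such page URIs could be assembled. The reading of the path segments (an LCCN, an issue date, an edition number and a page sequence number) is inferred from the example URI above and is an assumption, as is the function name page_uri.

```python
from datetime import date

BASE = "http://chroniclingamerica.loc.gov"


def page_uri(lccn, issue_date, edition, sequence):
    """Build a page URI from its parts (segment meanings are assumed)."""
    return (
        f"{BASE}/lccn/{lccn}/{issue_date.isoformat()}"
        f"/ed-{edition}/seq-{sequence}/"
    )


if __name__ == "__main__":
    print(page_uri("sn85066387", date(1898, 1, 1), 1, 1))
```

Keeping the parts separate like this makes it easy to mint sibling URIs (the next page in the same issue, say) without string surgery on an existing URI.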

q & a

Q: What do 100-year-old knitting patterns and a lost Robert Louis Stevenson story have in common? A: A digitally preserved newspaper page. Q: What about if you add: URIs for knitting materials; William Blake’s Engravings; The similarities/differences between XMPP, HTTP and NNTP; Web crawling as data integration; Project coordination with rooms on FriendFeed brewing [...]

json vs pickle

In Python, JSON is faster, smaller and more portable than pickle … At work, I’m working on a project where we’re modeling newspaper content in a relational database. We’ve got newspaper titles, issues, pages, institutions, places and some other fun stuff. It’s a Django app, and the db schema currently looks something like: Anyhow, if [...]
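
Speed and size claims like this are easy to check on your own data. The sketch below is one way to do it with the standard library only; the record layout is a made-up stand-in for a newspaper page row, not the project's real schema, and the outcome depends heavily on the Python version, the pickle protocol and the shape of the data, so treat the numbers as something to measure rather than assume.

```python
import json
import pickle
import timeit

# Hypothetical record; the real schema is not shown here.
record = {
    "title": "Example Gazette",
    "lccn": "sn00000000",
    "issue_date": "1898-01-01",
    "edition": 1,
    "sequence": 1,
    "place": "Anytown",
}
records = [dict(record, sequence=i) for i in range(1000)]


def compare(data, repeats=20):
    """Return serialized sizes (bytes) and total timings (seconds)."""
    json_bytes = json.dumps(data).encode("utf-8")
    pickle_bytes = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
    return {
        "json_size": len(json_bytes),
        "pickle_size": len(pickle_bytes),
        "json_dump_s": timeit.timeit(
            lambda: json.dumps(data), number=repeats),
        "pickle_dump_s": timeit.timeit(
            lambda: pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL),
            number=repeats),
        "json_load_s": timeit.timeit(
            lambda: json.loads(json_bytes), number=repeats),
        "pickle_load_s": timeit.timeit(
            lambda: pickle.loads(pickle_bytes), number=repeats),
    }


if __name__ == "__main__":
    for name, value in compare(records).items():
        print(f"{name}: {value}")
```

Portability is the one point that does not need a benchmark: JSON can be read from nearly any language, while pickle is Python-specific and should not be loaded from untrusted sources.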

calais and ocr newspaper data

Like you, I’ve been reading about the new Reuters Calais Web Service. The basic gist is you can send the service text and get back machine-readable data about recognized entities (personal names, state/province names, city names, etc). The response format is kind of interesting because it’s RDF that uses a bunch of homespun vocabularies. [...]