
Tag Archives: newspapers

New York Times Topics as SKOS

Serves 23,376 SKOS Concepts
INGREDIENTS

Text editor: Vim, Emacs, TextMate, etc.
Python
BeautifulSoup
rdflib
Internet connection

DIRECTIONS

Open a new file using your favorite text editor.
Instantiate an RDF graph with a dash of rdflib.
Use Python’s urllib to extract the HTML for each of the Times Topics Index Pages, e.g. for A.
Parse HTML into a fine, queryable data structure using BeautifulSoup.
Locate topic names and [...]
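The directions above can be sketched roughly as follows, for a single index page. The recipe calls for BeautifulSoup and rdflib; to keep this sketch dependency-free, it swaps in the standard library's `html.parser` and writes N-Triples by hand. The `/top/` link marker, the sample markup, and the `http://topics.nytimes.com/` base URL are assumptions for illustration, not the verified structure of the Times pages, and `TopicLinkParser`, `nt_literal` and `topics_to_ntriples` are hypothetical helper names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"


class TopicLinkParser(HTMLParser):
    """Collect (href, link text) for anchors whose href contains `marker`."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.topics = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href and self.marker in href:
            self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:  # only buffer text inside a topic link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = " ".join("".join(self._text).split())  # collapse whitespace
            self.topics.append((self._href, label))
            self._href = None


def nt_literal(text):
    """Escape a string for use as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def topics_to_ntriples(html, base, marker="/top/"):
    """Turn the topic links on one index page into SKOS Concept triples."""
    parser = TopicLinkParser(marker)  # marker is an assumed URL pattern
    parser.feed(html)
    parser.close()
    lines = []
    for href, label in parser.topics:
        uri = urljoin(base, href)  # reuse the absolute topic URL as the concept URI
        lines.append(f"<{uri}> <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f"<{uri}> <{SKOS}prefLabel> {nt_literal(label)}@en .")
    return "\n".join(lines)


# Hypothetical markup standing in for one Times Topics index page.
SAMPLE = """
<ul>
  <li><a href="/top/reference/timestopics/people/a/hank_aaron/index.html">Aaron,
      Hank</a></li>
  <li><a href="/top/news/business/companies/abbott_laboratories/index.html">Abbott Laboratories</a></li>
  <li><a href="/pages/index.html">Home</a></li>
</ul>
"""

if __name__ == "__main__":
    print(topics_to_ntriples(SAMPLE, "http://topics.nytimes.com/"))
```

With rdflib installed, the same loop could instead add `(URIRef(uri), RDF.type, SKOS.Concept)` and `(URIRef(uri), SKOS.prefLabel, Literal(label, lang="en"))` to a `Graph` and let rdflib handle serialization; running it once per letter of the index would build up the full set of concepts.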

rest, the semantic web and my feeble brain

Imagine you were minting close to a million URIs for historic newspaper pages such as:

http://chroniclingamerica.loc.gov/lccn/sn85066387/1898-01-01/ed-1/seq-1/

which identifies a scanned page like the one you can view at that address.

The web page lets you zoom in quite close and see lots of detail in the page.

Now let’s say I want to describe this Newspaper Page in RDF. I need to decide what subject URI to hang the [...]
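As a rough illustration of the choice being weighed, the sketch below first pulls the LCCN, issue date, edition and sequence out of a page URI shaped like the example above, then shows one common Linked Data pattern: keep the web page URI for the HTML document and hang statements about the printed page on a separate hash URI. This is not necessarily where the post ends up; the `#page` fragment and the use of FOAF and Dublin Core terms here are assumptions, and `parse_page_uri` and `describe_page` are hypothetical names.

```python
import re
from typing import NamedTuple
from urllib.parse import urlparse

# Path pattern taken from the example URI: /lccn/<lccn>/<date>/ed-<n>/seq-<n>/
PAGE_PATH = re.compile(
    r"^/lccn/(?P<lccn>[^/]+)/(?P<date>\d{4}-\d{2}-\d{2})"
    r"/ed-(?P<edition>\d+)/seq-(?P<sequence>\d+)/$"
)
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


class PageId(NamedTuple):
    lccn: str
    date: str
    edition: int
    sequence: int


def parse_page_uri(uri):
    """Split a page URI into its LCCN, issue date, edition and sequence."""
    match = PAGE_PATH.match(urlparse(uri).path)
    if match is None:
        raise ValueError(f"not a newspaper page URI: {uri}")
    return PageId(match["lccn"], match["date"],
                  int(match["edition"]), int(match["sequence"]))


def describe_page(web_page_uri):
    """Link the web page to a separate URI for the printed page."""
    page = parse_page_uri(web_page_uri)
    thing = web_page_uri + "#page"  # hypothetical fragment, not a convention of the site
    return "\n".join([
        f"<{web_page_uri}> <{FOAF}primaryTopic> <{thing}> .",
        f'<{thing}> <{DCTERMS}date> "{page.date}"^^<{XSD_DATE}> .',
    ])


if __name__ == "__main__":
    print(describe_page(
        "http://chroniclingamerica.loc.gov/lccn/sn85066387/1898-01-01/ed-1/seq-1/"))
```

The hash URI keeps statements about the paper artifact (its issue date, say) from being read as statements about the HTML document that displays it; minting a separate URI and answering it with a 303 redirect is the usual alternative.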

q & a

Q: What do 100-year-old knitting patterns and a lost Robert Louis Stevenson story have in common?
A: A digitally preserved newspaper page.
Q: What if you add:

URIs for knitting materials
William Blake’s Engravings
The similarities/differences between XMPP, HTTP and NNTP
Web crawling as data integration
Project coordination with rooms on FriendFeed
Brewing kombucha

A: Just a typical lunch time conversation at [...]

json vs pickle

In Python, JSON is faster, smaller and more portable than pickle …
At work I’m on a project where we’re modeling newspaper content in a relational database. We’ve got newspaper titles, issues, pages, institutions, places and some other fun stuff. It’s a Django app, and the db schema currently looks something like:

Anyhow, if you [...]
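If you want to check the JSON vs pickle claim against your own data, a small harness like the one below reports serialized size and dump time for both formats. The record fields are hypothetical, loosely modeled on newspaper page rows, and the numbers you get will depend on your Python version, the pickle protocol and the shape of the data, so treat it as a measuring tool rather than evidence either way.

```python
import json
import pickle
import timeit

# Hypothetical rows loosely shaped like newspaper page records.
RECORDS = [
    {"lccn": "sn85066387", "date": "1898-01-01", "edition": 1, "sequence": n}
    for n in range(1, 1001)
]


def compare(data, number=20):
    """Report serialized size (bytes) and dump time (seconds) for json and pickle."""
    as_json = json.dumps(data).encode("utf-8")
    as_pickle = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
    # Both formats must round-trip the data unchanged for the comparison to be fair.
    if json.loads(as_json) != data or pickle.loads(as_pickle) != data:
        raise ValueError("data does not round-trip through both formats")
    return {
        "json_bytes": len(as_json),
        "pickle_bytes": len(as_pickle),
        "json_dump_s": timeit.timeit(lambda: json.dumps(data), number=number),
        "pickle_dump_s": timeit.timeit(
            lambda: pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL), number=number
        ),
    }


if __name__ == "__main__":
    for name, value in compare(RECORDS).items():
        print(f"{name}: {value}")
```

The round-trip check also shows the trade-off behind "portable": JSON only has a handful of types (a tuple comes back as a list, so `compare({"t": (1, 2)})` raises), but any language can read it, whereas pickle preserves Python types, is Python-specific, and should never be loaded from untrusted sources.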

calais and ocr newspaper data

Like you, I’ve been reading about the new Reuters Calais Web Service. The basic gist is that you can send the service text and get back machine-readable data about recognized entities (personal names, state/province names, city names, etc.). The response format is kind of interesting because it’s RDF that uses a bunch of homespun [...]