
federal register embraces the web and opensource

Tom Lee of the Sunlight Foundation blogged yesterday about the new Federal Register website. The facelift was also announced a few days earlier by the Archivist of the United States, David Ferriero. If you aren’t familiar with it already, the Federal Register is basically the daily newspaper of the United States Federal Government, which details all the rules and regulations of the federal agencies. It is compiled by the Office of the Federal Register located in the National Archives, and printed by the Government Printing Office. As the video describing the new site points out, the Federal Register began publication in 1936 in the depths of the Great Depression as a way to communicate in one place all that the agencies were doing to try to jump start the economy. So it seems like a fitting time to be rethinking the role of the Federal Register.

I’m no usability expert, but just a few minutes browsing the new site and comparing it to the old one make it clear what a leap forward this is. Hopefully the legal status of the new site will be ironed out shortly.

Most of all it’s great to see that the Federal Register is now a single web application. The service it provides to the American public is important enough to deserve its own dedicated web presence. As the developers point out in their video describing the effort, they wanted to make the Federal Register a “first class citizen of the web”…and I think they are certainly helping do that. This might seem obvious, but often there is a temptation to jam publications from the print world (like the Federal Register) into dumbed down monolithic repositories that treat all “objects” the same. Proponents of this approach tend to characterize one off websites like Federal Register 2.0 as “yet another silo”. But I think it’s important to remember that the web was really created to break down the silo walls, and that every well designed web site is actually the antithesis of a silo. In fact, monolithic repository systems that treat all publications as static documents to be uniformly managed are more like silos than these ‘one off’ dedicated web applications.

As a software developer working in the federal government there were a few things about the Federal Register 2.0 that I found really exciting:

  • Fruitful collaboration between federal employees and citizen activist/geeks initiated by a software development contest.
  • Extensive use of opensource technologies like Ruby, Ruby on Rails, MySQL, Sphinx, nginx, Varnish, Passenger, Apache2, Ubuntu Linux, Chef. Opensource technologies encourage collaboration by allowing citizen activists/technologists to participate without having to drop a princely sum.
  • Release of the source code for the website itself, using decentralized revision control (git) so that people can easily contribute changes, and see how the site was put together.
  • Extensive use of syndicated feeds to communicate how content is being added to the site, ical feeds to keep on top of events going on in your area, and detailed XML for each entry.
  • The robots.txt file for the site makes the content fully crawlable by web indexers, except for search related portions of the website. Excluding dynamic search results is often important for performance reasons, but much of the article content can be discovered via links, see below about permalinks. They also have made a sitemap available for crawlers to efficiently discover URLs for the content.
  • Deployment of the web application to the cloud using Amazon’s EC2 and S3 services. Cloud computing allows computing resources to scale to meet demand. In effect this means that government IT shops don’t have to make big up front investments in infrastructure to make new services available. I guess the jury is still out, but I think this will eventually prove to greatly lower the barrier to innovation in the egov sector. It also lets the more progressive developers in government leap frog ancient technologies and bureaucracies to get things done in a timely manner.
  • And last, but certainly not least … now every entry in the Federal Register has a URL! Permalinks for the Federal Register are incredibly important for citability reasons. I predict that we’ll quickly see more and more people referencing specific parts of the Federal Register in social media sites like Facebook, Twitter and out on the open web in blogs, and in collaborative applications like Wikipedia.
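
To make the robots.txt point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `example.gov` host and the `examplebot` name are assumptions made up for illustration; they are not copied from the Federal Register site. The sketch only shows the pattern described above: search results are off limits, article permalinks are crawlable, and a sitemap is advertised.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, written only to illustrate the pattern described above;
# they are NOT the Federal Register's actual robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Sitemap: http://example.gov/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# An article permalink is crawlable ...
article_ok = parser.can_fetch(
    "examplebot", "http://example.gov/articles/2010/07/26/example-entry"
)
# ... but dynamic search results are not.
search_ok = parser.can_fetch("examplebot", "http://example.gov/search?q=energy")

print(article_ok, search_ok, parser.site_maps())
```

A crawler that respects these rules can still reach every entry by following links or walking the sitemap, which is why excluding search need not hurt discoverability.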

I would like to see more bulk access to XML data made available, for re-purposing on other websites–although I guess it might be possible to walk from the syndicated feeds to the detailed XML. Also, the search functionality is so rich it would be useful to have an OpenSearch description that documents it, and perhaps provides some hooks for getting back JSON and/or XML representations. Perhaps even following the lead of the London Gazette and trying to make some of the structured metadata available in the HTML using RDFa. It also looks like content is only available for 2008 on, so it might be interesting to see how easy it would be to make more of the historic content available.

But the great thing about what these folks have done is now I can fork the project on github, see how easy it is to add the changes, and let the developers know about my updates to see if they are worth merging back into the production website. This is an incredible leap forward for egov efforts–so hats off to everyone who helped make this happen.

linking things and common sense

Tom Scott’s recent Linking Things post got me jotting down what I’ve been thinking lately about URIs, Linked Data and the Web. First go read Tom’s post if you haven’t already. He does a really nice job of setting the stage for why people care about using distinct URIs (web identifiers) for identifying web documents (aka information resources) and real world things (aka non-information resources). Tom’s opinions are grounded in the experience of really putting these ideas into practice at the BBC. His key point, which he attributes to Michael Smethurst, is that:

Some people will tell you that the whole non-information resource thing isn’t necessary – we have a web of documents and we just don’t need to worry about URIs for non-information resources; others will claim that everything is a thing and so every URL is, in effect, a non-information resource.

Michael, however, recently made a very good point (as usual): all the interesting assertions are about real world things not documents. The only metadata, the only assertions people talk about when it comes to documents are relatively boring: author, publication date, copyright details etc.

If this is the case then perhaps we should focus on using RDF to describe real world things, and not the documents about those things.

I think this is an important observation, but I don’t really agree with the conclusion. I would conclude instead that the distinction between real world and document URIs is a non-issue. We should be able to tell if the thing being described is a document or a real world thing based on the vocabulary terms that are being used.

For example, if I assert:

<http://en.wikipedia.org/wiki/William_Shakespeare> a foaf:Person ; foaf:name "William Shakespeare" .

Isn’t it reasonable to assume http://en.wikipedia.org/wiki/William_Shakespeare identifies a person whose name is William Shakespeare? I don’t have to try to resolve the URL and see if I get a 303 or 200 response code do I?

And if I also assert,

<http://en.wikipedia.org/wiki/William_Shakespeare> dcterms:modified "2010-06-28T17:02:41-04:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .

can’t I assume that http://en.wikipedia.org/wiki/William_Shakespeare identifies a document that was modified on 2010-06-28T17:02:41? Does it really make sense to think that the person William Shakespeare was modified then? Not really…

Similarly if I said,

<http://en.wikipedia.org/wiki/William_Shakespeare> cc:license <http://creativecommons.org/licenses/by-sa/3.0/> .

Isn’t it reasonable to assume that http://en.wikipedia.org/wiki/William_Shakespeare identifies a document that is licensed with the Attribution-ShareAlike 3.0 Unported license? It doesn’t really make sense to say that the person William Shakespeare is licensed with Attribution-ShareAlike 3.0 Unported does it? Not really…
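
As a rough sketch of that reading, the snippet below sorts the three assertions by what their vocabulary terms imply. The triples are plain tuples so no RDF library is needed, and the two hint tables are hand-written assumptions rather than anything the FOAF, Dublin Core or Creative Commons vocabularies formally declare.

```python
# Triples are plain (subject, predicate, object) tuples.
SHAKESPEARE = "http://en.wikipedia.org/wiki/William_Shakespeare"

TRIPLES = [
    (SHAKESPEARE, "rdf:type", "foaf:Person"),
    (SHAKESPEARE, "foaf:name", "William Shakespeare"),
    (SHAKESPEARE, "dcterms:modified", "2010-06-28T17:02:41-04:00"),
    (SHAKESPEARE, "cc:license", "http://creativecommons.org/licenses/by-sa/3.0/"),
]

# Assumed, hand-written hints about what each term is usually used to describe.
PREDICATE_HINTS = {
    "foaf:name": "real world thing",
    "dcterms:modified": "document",
    "cc:license": "document",
}
CLASS_HINTS = {"foaf:Person": "real world thing"}


def readings(subject, triples):
    """Group the predicates used with `subject` by the kind of thing they imply."""
    grouped = {}
    for s, p, o in triples:
        if s != subject:
            continue
        # For rdf:type the class carries the hint; otherwise the predicate does.
        kind = CLASS_HINTS.get(o) if p == "rdf:type" else PREDICATE_HINTS.get(p)
        grouped.setdefault(kind or "unknown", []).append(p)
    return grouped


print(readings(SHAKESPEARE, TRIPLES))
```

The same URL ends up with both a "real world thing" reading and a "document" reading, and each statement lands in the right one from the vocabulary alone, without resolving the URL.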

Why does the Linked Data community lean on using identifiers to do this common sense work? Well, largely because people argued about it for three years and this is the resolution the W3C came to. In general I like the REST approach of saying a URL identifies a Resource, and that when you resolve one you get back a Representation (a document of some kind, html, rdf/xml, whatever). Why does it have to be more complicated than that?

If it’s not clear if an assertion is about a document or a thing, why isn’t that a problem with the vocabulary in use being underspecified and vague? I believe this is essentially the point that Xiaoshu Wang made three years ago in his paper URI Identity and Web Architecture Revisited.

To get back to Tom’s point, I agree that the really interesting assertions in Linked Data are about things, and their relations, or as Richard Rorty said a bit more expansively:

There is nothing to be known about anything except an initially large, and forever expandable, web of relations to other things. Everything that can serve as a term of relation can be dissolved into another set of relations, and so on for ever. There are, so to speak, relations all the way down, all the way up, and all the way out in every direction: you never reach something which is not just one more nexus of relations.

Philosophy and Social Hope, pp 53-54.

But assertions about a document, albeit being a bit more on the dry side, are also useful and important, such as: who created the web document, when they created it, a license associated with the document, its relation to previous versions, etc. As a software developer working in a library I’m actually really interested in this sort of administrivia. In fact the Open Archives Initiative Object Reuse and Exchange vocabulary, and the Memento efforts are largely about relating web documents together in meaningful and useful ways: to be able to harvest compound objects out of the web, and to navigate between versions of web documents. Heck, the Dublin Core vocabulary started out as an effort to describe networked resources (essentially documents), and the gist of the Dublin Core Metadata Terms retain much of this flavor. So I think RDF is also important for describing documents on the web, or (more accurately) representations.

So, in short:

  1. URLs identify resources.
  2. A resource can be anything.
  3. When you resolve a URL you get a representation of that resource.
  4. If a representation is some sort of flavor of RDF, the semantics of an RDF vocabulary should make it clear what is being described.
  5. If it’s not clear, maybe the vocabulary sucks.

I think this is basically the point that Harry Halpin and Pat Hayes were making in their paper In Defense of Ambiguity. A URL has a dual role: it identifies resources, and it allows us to access representations of resources. This ambiguity is the source of its great utility, expressiveness and power. It’s why we see URLs on the sides of buses and buildings. It’s why a QR Code slapped on some real world thing has a URL embedded in it.

In an ideal world (where people agreed with Xiaoshu, Harry and Pat) I don’t think this would mean we would have to redo all the Linked Data that we have already. I think it just means that publishers who want the granularity of distinguishing between real world things and documents at the identifier level can have it. It would also mean that the Linked Data space can accommodate RESTafarians, and other mere mortals who don’t want to ponder whether their resources are information resources or not. And, of course, it would mean we could use a URL like http://en.wikipedia.org/wiki/William_Shakespeare to identify William Shakespeare in our RDF data …

Wouldn’t that be nice?

scoping intertwingularity

Dan Brickley’s recent post to the public-lod discussion list about the future of RDF is one of the best articulations of why I appreciate the practice of linking data:

And why would anyone care to get all this semi-related, messy Web data? Because problems don’t come nicely scoped and packaged into cleanly distinct domains. Whenever you try to solve one problem, it borders on a dozen others that are a higher priority for people elsewhere. You think you’re working with ‘events‘ data but find yourself with information describing musicians; you think you’re describing musicians, but find yourself describing digital images; you think you’re describing digital images, but find yourself describing geographic locations; you think you’re building a database of geographic locations, and find yourself modeling the opening hours of the businesses based at those locations. To a poet or idealist, these interconnections might be beautiful or inspiring; to a project manager or product manager, they are as likely to be terrifying.

Any practical project at some point needs to be able to say “Enough with all this intertwingularity! this is our bit of the problem space, and forget the rest for now”. In those terms, a linked Web of RDF data provides a kind of safety valve. By dropping in identifiers that link to a big pile of other people’s data, we can hopefully make it easier to keep projects nicely scoped without needlessly restricting future functionality. An events database can remain an events database, but use identifiers for artists and performers, making it possible to filter events by properties of those participants. A database of places can be only a link or two away from records describing the opening hours or business offerings of the things at those places. Linked Data (and for that matter FOAF…) is fundamentally a story about information sharing, rather than about triples. Some information is in RDF triples; but lots more is in documents, videos, spreadsheets, custom formats, or [hence FOAF] in people’s heads.

Dan’s description is also a nice illustration of how the web can help us avoid Yak Shaving, by leveraging the work of others:

Any seemingly pointless activity which is actually necessary to solve a problem which solves a problem which, several levels of recursion later, solves the real problem you’re working on.

I’m just stashing that away here so I can find it again when I need it. Thanks danbri!

Confessions of a Graph Addict

Today I’m going to be at the annual conference of the American Library Association for a pre-conference about Libraries and Linked Data. I’m going to try talking about how Linked Data, and particularly the graph data structure, fits the way catalogers have typically thought about bibliographic information. Along the way I’ll include some specific examples of Linked Data projects I’ve worked on at the Library of Congress–and gesture at work that remains to be done.

Tomorrow there’s an unconference style event at ALA to explore what Linked Data means for Libraries. The pre-conference today is booked up, but the event tomorrow is open to the public, so please consider dropping by if you are interested and in the DC area.

bibliographic records on the web

There are a couple of interesting threads (disclaimer: I inadvertently started one) going on over on the Open Library technical discussion list about making Linked Data views available for authors. Since the topic was largely how to model people, part of the discussion spilled over to foaf-dev (also my fault).

When making library Linked Data available my preference has been to follow the lead of Martin Malmsten, Anders Söderbäck and the Royal Library of Sweden by modeling authors as People using the FOAF vocabulary:

<http://libris.kb.se/resource/auth/317488>
    libris:key "Berners-Lee, Tim" ;
    a foaf:Person ;
    rdfs:isDefinedBy <http://libris.kb.se/data/auth/317488> ;
    skos:exactMatch <http://viaf.org/viaf/23002995> ;
    foaf:name "Berners-Lee, Tim", "Lee, Tim Berners-", "Tim Berners- Lee", "Tim Berners-Lee" .

It seems sensible enough right? But there is some desire in the library community to model an author as a Bibliographic Resource and then relate this resource to a Person resource. While I can understand wanting to have this level of indirection to assert a bit more control, and to possibly use some emerging vocabularies for RDA, I think (for now) using something like FOAF for modeling authors as people is a good place to start.

It will engage folks from the FOAF community who understand RDF and Linked Data, and get them involved in the Open Library Project. It will make library data fit in with other Linked Data out on the web. Plus, it just kind of fits my brain better to think of authors as people…isn’t that what libraries were trying to do all along with their authority data? I’m not saying that FOAF will have everything the library world needs (it won’t), but it’s an open world and we can add stuff that we need, collaborate, and make it a better place.

Anyway, that’s not really what I wanted to talk about here. Over the course of this discussion Erik Hetzner raised what I thought was an important question:

Are you saying that there is a usable distinction between:

1. a bibliographic record, and
2. the data contained in that bibliographic record?

From above, my first notion would be to model things as, in
pseudo-Turtle::

<Victor Hugo> a frbr:Person .
<Victor Hugo> rdfs:isDefinedBy <bib record> .
<bib record> dc:modified “…”^^xsd:date .

But it seems to me that you are adding a further distinction::

<Victor Hugo> a frbr:Person .
<Victor Hugo> rdfs:isDefinedBy <bib record> .
<bib record> rdfs:isDefinedBy <bib record data>
<bib record data> dc:modified “…”^^xsd:date .

Is this a usable or useful distinction? Are there times when we want to distinguish between the abstract bibliographic record and the representation of a bibliographic record? In linked data-speak, is a bibliographic record a non-information resource? My thinking has been that a bibliographic record is an information resource, and that one does not need to distinguish between (1) and (2) above.

I think it’s an important question because I don’t think it’s been really discussed much before, and has a direct impact on what sort of URL you can use to identify a Bibliographic Record, and what sort of HTTP response a client gets when it is resolved. This is the httpRange-14 issue, which is covered in Cool URIs for the Semantic Web. If a Bibliographic Record is an Information Resource then it’s OK to identify the record with any old URL, and for the server to say 200 OK like normal. If it’s not an Information Resource then the URL should either have a hash fragment in it, or the server should respond 303 See Other, and redirect to another location.
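
The rule of thumb in that paragraph can be written down as a small function. This is only a sketch of how a client might interpret what it sees; the function name and the `example.org` URIs are made up for illustration.

```python
from urllib.parse import urldefrag


def httprange14_reading(uri: str, status: int) -> str:
    """Return what the httpRange-14 convention lets a client conclude."""
    # The fragment is never sent to the server, so a hash URI is free to
    # name something other than the document that gets retrieved.
    if urldefrag(uri).fragment:
        return "hash URI: may identify a real world thing"
    if 200 <= status < 300:
        return "information resource"
    if status == 303:
        return "could be any resource: follow the redirect for a description"
    return "nothing can be concluded"


print(httprange14_reading("http://example.org/record/123", 200))
print(httprange14_reading("http://example.org/id/victor-hugo", 303))
print(httprange14_reading("http://example.org/record/123#person", 200))
```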

In my view if a Bibliographic Record is on the web with a URL, it is useful to think of it as an Information Resource…or (as Richard Cyganiak dubs it) a Web Document. I don’t think it’s worthwhile philosophizing about this, but instead to think about it pragmatically. I think it’s useful to consider

as being an identifier for a bibliographic record that happens to be in HTML. Likewise

are all identifiers for Bibliographic Records in MODS, Dublin Core and MARCXML respectively. It might be useful to link them together as they are with <link> elements in the HTML, or in some RDF serialization. It also could be useful to treat one as canonical, and content negotiate from one of the URLs (e.g. curl --header "Accept: application/marc+xml" http://lccn.loc.gov/99027665). But I think it simplifies deployment of library Linked Data to think of bibliographic records as things that can be put on the web as documents, without worrying too much about httpRange-14. A nice side effect of this is that it would grandfather in all the OPAC record views out there. Maybe it’ll be useful to distinguish between an abstract notion of a bibliographic record, and the actual document that is the bibliographic record, but I’m not seeing it right now…and I think it would introduce a lot of unnecessary complexity in this fragile formative period for library Linked Data.
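
For anyone who would rather script the content negotiation than use curl, a minimal Python equivalent looks something like the sketch below. It only builds the request; whether the server actually honors the Accept header is an assumption, and the helper name is made up.

```python
from urllib.request import Request


def marcxml_request(url: str) -> Request:
    """Build a GET request that asks for a MARCXML representation."""
    # Same media type as in the curl example above.
    return Request(url, headers={"Accept": "application/marc+xml"})


req = marcxml_request("http://lccn.loc.gov/99027665")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# To actually send it (requires network access):
#   from urllib.request import urlopen
#   with urlopen(req) as response:
#       print(response.headers.get("Content-Type"))
```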

the 5 stars of open linked data

While perusing the minutes of today’s w3c egov telecon I noticed mention of Tim Berners-Lee’s Bag of Chips talk at the gov2.0 expo last week in Washington, DC. I actually enjoyed the talk not so much for the bag-of-chips example (which is good), but for the examination of Linked Data as part of a continuum of web publishing activities associated with gold stars, like the ones you got in school. Here they are:

★ make your stuff available on the web (whatever format)
★★ make it available as structured data (e.g. excel instead of image scan of a table)
★★★ non-proprietary format (e.g. csv instead of excel)
★★★★ use URLs to identify things, so that people can point at your stuff
★★★★★ link your data to other people’s data to provide context

I think it’s helpful to think of Linked Data in this context, and not to minimize (or trivialize) the effort and the importance of getting the first 3 stars.

It was interesting that he didn’t mention RDF once (unless I missed it) and talked instead about Linked Data Format. Correction: he did mention it, thanks Anders. The inclusiveness and ambiguity appeals to me.

wee bit

As is my custom, this morning I asked Zoia (the bot in #code4lib) for this day in history from the Computer History Museum. Lately I’ve been filtering it through the Pirate plugin, which transforms arbitrary text into something a pirate might say. Anyhow, today’s was pretty humorous.

11:32 < edsu> @pirate [tdih]
11:32 < zoia> edsu: Claude Shannon be born in Gaylord, Michigan.  Known as th'
              inventor 'o information theory, Shannon be th' first to use th'
              word "wee bit."  Shannon, a contemporary 'o Johny-boy von
              Neumann, Howard Aiken, 'n Alan Turin', sets th' stage fer th'
              recognition 'o th' basic theory 'o information that could be
              processed by th' machines th' other pioneers developed.  He
              investigates information distortion, redundancy 'n noise, 'n (1
              more message)
11:33 < edsu> @more
11:33 < zoia> edsu: provides a means fer information measurement.  He
              identifies th' wee bit as th' fundamental unit 'o both data 'n
              computation.

Happy Birthday Cap’n Shannon.

Dear Footnote Bot

Thanks for taking an interest in the historic content on a website I help run. We want to see the NDNP newspaper content get crawled, indexed and re-purposed in as many places as possible. So we appreciate the time and effort you are spending on getting the OCR XML and JPEG2000 files into Footnote. I am a big fan of Footnote and what you are doing to help historical/genealogical researchers who subscribe to your product.

But since I have your ear, it would be nice if you identified yourself as a bot. Right now you are pretending to be Internet Explorer:

38.101.149.14 - - [22/Apr/2010:18:38:39 -0400] "GET /lccn/sn86069496/1909-09-08/ed-1/seq-8.jp2 HTTP/1.1" 200 3170304 "-" "Internet Explorer 6 (MSIE 6; Windows XP)" "*/*" "-" "No-Cache"

Oh, and could you stop sending the Pragma: No-Cache header with every HTTP request? We have a reverse-proxy in front of our dynamic content so that we don’t waste CPU cycles regenerating pages that haven’t changed. It’s what allows us to make our content available to well behaved web crawlers. But every request you send bypasses our cache, and makes our site do extra work.

It’s true, we can ignore your request to bypass our cache. In fact, that’s what we’re doing now. This means we can’t shift-reload in our browser to force the content to refresh–but we’ll manage. Maybe you could be a good citizen of the Web and send an If-Modified-Since header–or perhaps just don’t send Pragma: No-Cache?

Identifying yourself with a User-Agent string like “footbot/0.1 +(http://footnote.com/footbot)” would be neighborly too :-)
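
In case it helps, here is roughly what a neighborly request could look like in Python. It is a sketch, not a full crawler: the `example.org` host stands in for the real site, the function name is made up, and the User-Agent value is just the suggestion above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# The bot name and info URL suggested above; both are hypothetical.
USER_AGENT = "footbot/0.1 +(http://footnote.com/footbot)"


def polite_request(url: str, last_fetched: datetime) -> Request:
    """Build a GET that identifies the bot and allows a 304 Not Modified reply."""
    headers = {
        "User-Agent": USER_AGENT,
        # HTTP dates are always expressed in GMT.
        "If-Modified-Since": format_datetime(
            last_fetched.astimezone(timezone.utc), usegmt=True
        ),
    }
    # Deliberately no "Pragma: No-Cache", so a reverse proxy can answer from cache.
    return Request(url, headers=headers)


req = polite_request(
    "http://example.org/lccn/sn86069496/1909-09-08/ed-1/seq-8.jp2",
    datetime(2010, 4, 22, 22, 38, 39, tzinfo=timezone.utc),
)
print(req.header_items())
```

If the file hasn't changed since the last fetch, the server (or the cache in front of it) can reply 304 Not Modified with no body, which saves both sides from shipping a 3 MB JPEG2000 again.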

Yours Sincerely,
Ed

PS

ed@curry:~$ whois 38.101.149.14
...
%rwhois V-1.5:0010b0:00 rwhois.cogentco.com
38.101.149.14
network:ID:NET4-2665950018
network:Network-Name:NET4-2665950018
network:IP-Network:38.101.149.0/24
network:Postal-Code:84042
network:State:UT
network:City:Linden
network:Street-Address:355 South 520 West
network:Org-Name:iArchives Inc dba Footnote
network:Tech-Contact:ZC108-ARIN
network:Updated:2008-05-21 13:05:26
network:Updated-by:Gus Reese

research ideas for library linked data

The past few weeks have seen some pretty big news for Library Linked Data. On April 7th the Hungarian National Library announced that its entire library catalog, digital library holdings, and name/subject authority data are now available as Linked Data. Then just a bit more than a week later, on April 16th the German National Library announced that it was making its name and subject authority files available as Linked Data.

This adds to the pioneering work that the Royal Library of Sweden has already done in making all of its catalog and authority data available, which they announced almost two years ago now. Add to this that OCLC is also publishing the Virtual International Authority File as Linked Data, and that the Library of Congress also makes its subject authority data available as Linked Data and things are starting to get interesting.

About 16 months ago at the Dublin Core Conference in Berlin Alistair Miles predicted that we’d see several implementations of Linked Data at major libraries within the year. I must admit, while I was sympathetic to the cause, I was also pretty skeptical that this would come to pass. But here we are, just a bit past a year and two national libraries and a major library data distributor have decided to publish some of their data assets as Linked Data.

Hey Al, crow never tasted so good…

So now it’s starting to feel like there’s enough extant library Linked Data to start looking at patterns of usage, to see if there are any emerging best practices we could work towards. In particular I think it would be interesting to take a look at:

  • What vocabularies are being used, and is there emerging consensus about which to use?
  • What licenses (if any) are associated with the data?
  • How much linking and interlinking is going on?
  • What sorts of mechanisms does the publisher offer for getting the data: sitemap, feeds, SPARQL, bulk download?
  • What is the quality of the data: granularity, link integrity, vocabulary usage?
  • What approaches to identifiers for “real world things” have publishers taken: hash, slash, 303, PURLs, reuse of traditional identifiers, etc.
  • What are the relative sizes of the pools of library linked data?
  • How are updates being managed?

Tomorrow I’m meeting with some folks at the Metadata Research Center at the School of Information and Library Science at the University of North Carolina to talk about their HIVE project. Barbara Tillett and Libby Dechman of LC are also here to talk about the use of LCSH, VIAF and RDA. I’m hoping to convince some of the folks at the MRC that answering some of these questions about the use of Linked Data in libraries could be valuable to the library research community. The rumored W3C Incubator Group for Cultural Heritage Institutions and the Semantic Web couldn’t come at a better time.

history and genealogy at semwebdc

Last week’s Washington DC Semantic Web Meetup focused on History and Genealogy Semantics. It was a pretty small, friendly crowd (about 15-20) that met for the first time at the Library of Congress. The group included folks from PBS, the National Archives, the Library of Congress, and the Center for History and New Media–as well as some regulars from the Washington DC SGML/XML Users Group.

Brian Eubanks gave a presentation on what the Semantic Web, Linked Data and specifically RDF and Named Graphs have to offer genealogical research. He took us on a tour through a variety of websites, such as the Land Records Database at the Bureau of Land Management, Ancestry.com, Footnote and Google Books and made a strong case for using RDF to link these sorts of documents with a family tree.

As more and more historic records make their way online as Web Documents with URIs, RDF becomes an increasingly useful data model for providing provenance and source information for a family tree. On sites like Ancestry.com it is important to understand the provenance of genealogical assertions, since Ancestry.com allows you to merge other people’s family trees into your own, based on likely common ancestors. In situations like this researchers need to be able to evaluate the credibility or truthfulness of other people’s trees–and being able to source the family tree links to the documents that support them is an essential part of the equation.

Along the way Brian let people know about a variety of vocabularies that are available for making assertions that are of value to genealogical research:

  • rdfcal : for Events
  • BIO : for biographical information
  • Relationship : for describing the links between people
  • FOAF : for describing people
  • TriG : for identifying the assertions that a researcher makes and linking them to a given document
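
To illustrate the named graph idea without any RDF tooling, the sketch below stores each assertion as a quad whose fourth element names the source document it came from. Every URI here is made up, `rel:childOf` is used only as an illustrative term, and this is not code from Brian's talk.

```python
# Quads: (subject, predicate, object, graph). The graph name stands in for
# the document that supports the assertion.
ANN = "http://example.org/people/ann"
BOB = "http://example.org/people/bob"
CENSUS = "http://example.org/sources/census-1900-page-12"
LAND = "http://example.org/sources/land-patent-4471"

QUADS = [
    (ANN, "rel:childOf", BOB, CENSUS),
    (ANN, "rel:childOf", BOB, LAND),
    (BOB, "foaf:name", "Bob Example", CENSUS),
]


def sources_for(subject, predicate, obj, quads):
    """Return the sorted source documents that back a single assertion."""
    return sorted(
        g for s, p, o, g in quads if (s, p, o) == (subject, predicate, obj)
    )


print(sources_for(ANN, "rel:childOf", BOB, QUADS))
```

When someone else's tree is merged in, an assertion that comes back with no sources is one that still needs checking.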

The beautiful thing about RDF for me, is that it’s possible to find and use these vocabularies in concert, and I’m not tempted to create the-greatest-genealogy-vocabulary that does it all. In addition, Brian pointed out that sites like dbpedia and geonames are great sources of names (URIs) for people, places and events that can be used in building descriptions. Brian has started the History and Genealogy Semantics Working Group which has an open membership, and encourages anyone with interest in this area to join. While writing this post I happened to run across a Wikipedia page about Family Tree Mapping, which indicated that some genealogical software already supports geocoding family trees. As usual it seems like the geo community is leading the way in making semantics on the web down to earth and practical.

I followed Brian by giving a brief talk about Chronicling America, which is the web front-end for data collected by the National Digital Newspaper Program, which in turn is a joint project of the Library of Congress and the National Endowment for the Humanities. After giving a brief overview of the program, I described how we were naturally led to using Linked Data and embracing a generally RESTful approach by a few factors:

One thing that I learned during Brian’s presentation is that sites like Footnote are not only going around digitizing historic collections for inclusion in their service, but they also give their subscribers a rich editing environment to search and annotate document text. These annotations are exactly the sort of stuff that would be perfect to represent as an RDF graph, if you wanted to serialize the data. In fact the NSF funded Open Annotation Collaboration project is exploring patterns and emerging best practices in this area. I’ve had it in the back of my mind that allowing users to annotate page content in Chronicling America would be a really nice feature to have. If not at chroniclingamerica.loc.gov proper, then perhaps showing how it could be done by a 3rd party using the API. To some extent we’re already seeing annotation happening in Wikipedia, where people are creating links to newspaper pages and titles in their entries, which we can see in the referrer information in our web server logs. Update: and I just learned that wikipedia themselves provide a service that allows you to discover entries that have outbound links to a particular site, like chroniclingamerica.loc.gov.

Speaking of the API (which really is just REST) if you are interested in learning more about it check out the API Document that Dan Chudnov prepared. I also made my slides available, hopefully the speaker notes provide a bit more context for what I talked about when showing images of various things.

Afterwards a bunch of us headed across the street to have a drink. I was really interested to hear from Sam Deng that (like the group I work in at LC) PBS are a big Python and Django shop. We’re going to try to get a little brown bag lunch going on between PBS and LC to talk about their use of Django on Amazon EC2, as well as software like Celery for managing asynchronous task queues.

Also, after chatting with Glenn Clatworthy of PBS, I learned that he has been experimenting with making Linked Data views available for their programs. It was great to hear Glenn describe how assigning each program a URI, and leveraging the nature of the web would make a perfect fit for distributing data in the PBS enterprise. It makes me think that perhaps having a session on what the BBC are doing with Linked Data would be timely?