alien vs predator: www-style

I finally got around to reading Web Services for Recovery.gov by Erik Wilde, Eric Kansa and Raymond Yee. The authors wrote the report with funding from the Sunlight Foundation, which is deeply engaged in improving the way the US Federal Government provides transparent access to its data assets.

I highly recommend giving it a read if you are interested in web services, REST, Linked Data, and simple things you can do to open up access to data. The practicality of the advice is clearly gleaned from the experience of an actual implementation over at recovery.berkeley.edu where they kick the tires on their ideas.

Erik’s blog has a succinct summary of the paper’s findings, which for me boils down to:

any data source that is frequently updated must have simple ways for synchronizing data

Web syndication is a widely deployed mechanism for presenting a list of updated web resources. The authors make a pretty strong case for Atom because of its pervasive use of identifiers for content, extensibility, rich linking semantics, paging, the potential for write-enabled services, install base, and generally just good old Resource Oriented Architecture a.k.a. REST.
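To make the synchronization idea concrete, here is a minimal sketch of how a downstream client might use an Atom feed to find what changed since its last visit. The feed content, the example.org URLs and the function names are all made up for illustration; they are not taken from the report or from recovery.berkeley.edu.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed: every URL and title here is invented for illustration.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example award reports</title>
  <id>http://example.org/reports</id>
  <updated>2009-11-02T12:00:00Z</updated>
  <link rel="self" href="http://example.org/reports?page=1"/>
  <link rel="next" href="http://example.org/reports?page=2"/>
  <entry>
    <id>http://example.org/reports/2</id>
    <title>Report 2</title>
    <updated>2009-11-02T12:00:00Z</updated>
    <link rel="alternate" href="http://example.org/reports/2.xml"/>
  </entry>
  <entry>
    <id>http://example.org/reports/1</id>
    <title>Report 1</title>
    <updated>2009-10-30T09:30:00Z</updated>
    <link rel="alternate" href="http://example.org/reports/1.xml"/>
  </entry>
</feed>"""


def parse_time(text):
    """Parse an Atom timestamp such as 2009-11-02T12:00:00Z into an aware datetime."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def changed_entries(feed_xml, last_sync):
    """Return (id, updated) pairs newer than last_sync, plus the rel="next" page URL if any."""
    root = ET.fromstring(feed_xml)
    changed = []
    for entry in root.findall(ATOM + "entry"):
        updated = parse_time(entry.findtext(ATOM + "updated"))
        if updated > last_sync:
            changed.append((entry.findtext(ATOM + "id"), updated))
    next_href = None
    for link in root.findall(ATOM + "link"):
        if link.get("rel") == "next":
            next_href = link.get("href")
    return changed, next_href


last_sync = parse_time("2009-11-01T00:00:00Z")
changed, next_page = changed_entries(SAMPLE_FEED, last_sync)
for entry_id, updated in changed:
    print(entry_id, updated.isoformat())
print("next page:", next_page)
```

A real client would fetch each page over HTTP and keep following the rel="next" link until it reaches entries it has already seen; the sketch only shows the bookkeeping on a single page.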

Because of my interest in Linked Data the paragraph that discusses why RDF/XML wasn’t chosen as a data format is particularly interesting:

The approach described in this report, driven by a desire for openness and accessibility, uses the most widely established technologies and data formats to ensure that access to reporting data is as easy as possible. Recently, the idea of openly accessible data has been promoted under the term of “linked data”, with recent recommendations being centered around a very specific choice of technologies and data models (all centered around Semantic Web approaches focusing on RDF for data representation and centralized data storage). While it is possible to use these approaches for building Web applications, our recommendation is to use better established and more widely supported technologies, thereby lowering the barrier-to-entry and choosing a simpler toolset for achieving the same goals as with the more sophisticated technologies envisioned for the Semantic Web.

It could be argued that the growing amount of RDF/XML in the Linked Data web makes it a contender for Atom’s install base, especially when you consider RSS 1.0. However, I think the main point the authors are making is that the tools for working with XML documents far outnumber the tools available for processing RDF/XML graphs. Furthermore, most programmers I know are more familiar with the processing model and standards associated with XML documents (DOM, XSLT, XPath, XQuery) than with those for RDF graphs (triples, directed graphs, GRDDL, SPARQL). Maybe this says more about the people I know … and if I were to jump into the biomedical field I’d feel differently. But perhaps the most subtle point is that, whether or not developers know it, Atom expresses a graph model just like RDF/XML … but it does so in a much more straightforward, familiar, document-centric way.
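As a rough illustration of the graph hiding inside an Atom document, the sketch below reads a single entry with ordinary XML tooling and lists it as subject, predicate, object statements. The entry and its example.org URLs are invented, and the predicate names are just informal labels borrowed from Atom; this is not GRDDL or any official Atom-to-RDF mapping.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical entry: the identifiers and links are invented for illustration.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.org/reports/1</id>
  <title>Report 1</title>
  <link rel="alternate" href="http://example.org/reports/1.html"/>
  <link rel="related" href="http://example.org/agencies/doe"/>
</entry>"""


def entry_triples(entry_xml):
    """List an Atom entry as (subject, predicate, object) tuples."""
    entry = ET.fromstring(entry_xml)
    subject = entry.findtext(ATOM + "id")  # atom:id names the resource
    triples = [(subject, "title", entry.findtext(ATOM + "title"))]
    for link in entry.findall(ATOM + "link"):
        # An atom:link with no rel attribute is treated as rel="alternate".
        triples.append((subject, link.get("rel", "alternate"), link.get("href")))
    return triples


triples = entry_triples(ENTRY)
for triple in triples:
    print(triple)
```

Each atom:link becomes an edge from the entry to another web resource, which is the sense in which a feed is already a small directed graph.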

Of course, the debate over whether RDF needed to be a part of Linked Data rippled through the semantic web community a few months ago, and there’s little chance of resolving any of those issues here. In the pure-land of RDF model theory the choice between Atom and RDF/XML is a bit of a false dilemma, since RDF/XML is minimally processable with, well, XML tools … and idioms like GRDDL allow Atom to be explicitly represented as an RDF graph. In fact, REST and content negotiation would allow both serializations to coexist nicely in the context of a single web application. However, I’d argue that this point isn’t particularly easy to explain, and it certainly isn’t terrain that you would want to navigate in documentation on the recovery.gov website. The choice of whether RDF belongs in Linked Data has technical and business considerations, but I’m increasingly seeing it as a cultural issue, one that perhaps doesn’t really even need resolving.
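For readers who haven’t run into content negotiation, here is a simplified sketch of how one URL could serve either serialization depending on the client’s Accept header. The function names and the list of supported types are assumptions for illustration, and the parsing skips parts of the HTTP rules (for example, a missing Accept header should really be treated as accepting anything).

```python
# Serializations this hypothetical application can produce, in order of preference.
SUPPORTED = ["application/atom+xml", "application/rdf+xml"]


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, ignoring other parameters."""
    prefs = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client does not give one
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    return prefs


def quality(offered, prefs):
    """Return the q value for an offered type, using the most specific matching range."""
    main_type = offered.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_type, q in prefs:
        if media_type == offered:
            specificity = 2
        elif media_type == main_type + "/*":
            specificity = 1
        elif media_type == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_type(accept_header, supported=SUPPORTED):
    """Pick the supported type with the highest q; ties go to the earlier entry."""
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for offered in supported:
        q = quality(offered, prefs)
        if q > best_q:
            best, best_q = offered, q
    return best


print(choose_type("application/atom+xml;q=0.5, application/rdf+xml"))
print(choose_type("*/*"))
```

A feed reader that asks for anything gets Atom, while an RDF-aware client that prefers application/rdf+xml gets the graph serialization of the same resource.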

Even Tim Berners-Lee recognizes that there are quite large hurdles to modeling all government data on the Linked Data web in RDF and querying it with SPARQL. It’s a bit unrealistic to expect the Federal Government to start modeling and storing its enterprise data in a fundamentally new and somewhat experimental way in order to support what amounts to arbitrary database queries from anyone on the web. If that’s what the Linked Data brand is, I’m not buying it. That being said, I see a great deal of value in the RDF data model (the giant global graph), especially as a tool for seeing how your data fits the contours of the web.

The important message that Erik, Eric and Raymond’s paper communicates is that the Federal Government should be focused on putting data out on the web in familiar ways, using sound web architecture practices that have allowed the web to grow and evolve into the wonderful environment it is today. Atom is a flexible, simple, commonly supported, well understood XML format for letting downstream applications know about newly published web resources. If the Federal Government is serious about the long-term sustainability of efforts like recovery.gov and data.gov, it should focus on enabling an ecosystem of visualization applications created by third parties, rather than trying to produce those applications itself. I hope the data.gov folks also run across this important work. Thanks to the Sunlight Foundation for funding the folks at Berkeley.

6 Comments

  1. johnwcowan wrote:

    So invent an RDF serialization that is Atom-compliant. It’ll be a huge win.

    Tuesday, November 3, 2009 at 3:51 pm | Permalink
  2. ed wrote:

    @johncowan “I will call him … mini-RDF” :-) Seriously though Atom already seems to be a suitably good graph serialization for a graph of web resources doesn’t it? It lets someone say here’s a set of resources with URIs (atom:entry, atom:id), and here are the relationships between those resources and other resources (atom:link), and here’s some core metadata about the resources (atom:title, atom:summary, etc), and here’s some space to say whatever else you want about the collection of resources or the resources themselves (use of extensionElement in atom:feed and atom:entry)

    There are a few proposals on the table for bringing the full expressive power of the RDF model to Atom: AtomTriples from Mark Nottingham and Dave Beckett, and the work the Open Archives Initiative Object Reuse and Exchange group did on their Atom serialization for graphs of web resources. I don’t see people clamoring over themselves to use them however.

    But (if you are being serious, heheheh I have my doubts) perhaps you see something simpler that can be done?

    Wednesday, November 4, 2009 at 2:02 am | Permalink
  3. Yes. This feels like one of those situations where worse (Atom) is better.

    Wednesday, November 4, 2009 at 2:09 am | Permalink
  4. I think you’re generally right about Atom being about as good of a compromise as you can get; it’s the same basic reason I’ve been pushing for it for library data.

    The danger is more what appears between the tags. Giving a 3GB blob of lots of discrete data in some arcane format a URI and saying, “here, we’ve made it available! Just check out the Atom feed!” isn’t really a huge improvement over just throwing them into a web-accessible directory. It is also the sort of thing that really undermines the perception of Atom as a powerful and flexible format for making things available on the web.

    Friday, November 6, 2009 at 9:01 am | Permalink
  5. danbri.org/ wrote:

    See also DataRSS from Yahoo SearchMonkey folk –

    http://developer.yahoo.com/searchmonkey/smguide/datarss.html
    http://developer.yahoo.com/searchmonkey/smguide/understand_datarss.html

    This uses RDFa. Are you an RDFa optimist?

    Wednesday, December 23, 2009 at 1:33 am | Permalink
  6. ed wrote:

    @danbri thanks for the pointer to DataRSS from Yahoo. I had seen that before but failed to make the connection between it and some things I had been thinking about at the time. Yes, I guess I’m an RDFa optimist — but ultimately I’m an optimist about the RDF data model, and the Web :-)

    Wednesday, December 23, 2009 at 8:44 am | Permalink
