I’ve struggled in the past with what constitutes an Information Resource in the context of Web Architecture, Linked Data and practical digital library applications such as the National Digital Newspaper Project I work on at the Library of Congress. So it was reassuring to see the issue come up a few months ago during a review of the effort to revise the HTTP specification (RFC 2616). It would be a major effort to summarize the entire conversation here. However, an interesting sub-discussion circled around the idea of normalizing the language in the Architecture of the World Wide Web and RFC 2616 with respect to Resources.

Well into the multi-month thread, Tim Berners-Lee offered up a very helpful historical recap of the “what is a resource” issue, in which he said:

I would like to see what the documents [AWWW and RFC 2616] all look like if edited to use the words Document and Thing, and eliminate Resource.

A Short History of “Resource”

Which, somewhat predictably, started a discussion of what a Document is. However, this conversation seemed more tangible and earthy, and culminated in Larry Masinter recommending David M. Levy’s book Scrolling Forward:

… since much of the thought behind it informs a lot of my own thinking about the nature of “Document”, “representation”, “Resource” and the like.

www-tag email message

Now Larry is a scientist at Adobe, a company that knows a thing or two about electronic documents. He also works closely with the W3C and IETF on web architectural issues. So when he suggested reading a book to learn what he means by Document, my ears perked up. The interjection of a book reference into this rapid-fire email exchange was like a magic spell that made me pause and consider that a working definition of Document was nuanced enough to be the subject matter of an entire book.

I’ve come to expect references to Michael Buckland’s classic What is a Document? in discussions of documents. I hadn’t run across David Levy’s name before, so Larry’s recommendation was enough for me to request the book from the stacks and give it a read. I wasn’t disappointed. Scrolling Forward is an ode to documents of all shapes and sizes, from all time periods. It’s a joyful, mind-expanding work that explores the entire landscape of our documents: cash register receipts, the multi-editioned Leaves of Grass, email messages, letters, books, photographs, papyrus scrolls, greeting cards and web pages. Since this takes place in 212 pages, it is not surprising that the analysis is synthetic rather than exhaustive. Levy received a doctorate in computer science from Stanford, obtained a diploma in calligraphy and bookbinding from the Roehampton Institute, and then worked at Xerox PARC studying the nature of documents for 15 years, so his own professional career is marked by a bringing together of scientific and humanistic disciplines.

One of the key messages of the book is a working definition of the Document. Levy draws out his definition largely in contrast to a statement made by David Weinberger in his 1996 Wired piece What’s a Document?, where he says:

The fact that we can’t even say what a document is anymore indicates the profundity of the change we are undergoing in how we interact with information and, ultimately, our world.

What’s a Document?

To which Levy responds:

We can say what a document is. Doing this, however, requires a somewhat different approach from that which dictionaries take. It requires going beyond word usage. It does require looking at the relevant technologies, but in such a way that we aren’t fixated on them, that we don’t fetishize them. Most of all, it requires immersing ourselves in the social roles these technologies play.

Scrolling Forward p. 23

So Scrolling Forward is a survey of sorts: a survey of document types that are inextricably linked to the social contexts in which they were created. This approach of describing documents rather than positing a theory of them dovetailed nicely with some reading of Wittgenstein I’ve been doing recently. In his later period, Wittgenstein eschewed positing philosophical theories and instead attempted to resolve philosophical problems by exploring the richness of language and its use in social settings, or language games, to lay bare the problem in a therapeutic way. Levy takes a similar approach in simply laying out the complex, sometimes contradictory history of documents before us, instead of carving out a logical argument and selecting facts to support it.

Some parts of the book that were of particular interest to me (as a software developer working in the area of digital preservation) were the sections discussing document fixity:

… paper documents, and indeed all documents are static and changing, fixed and fluid. There is a reason why text and graphics editors have a Save button, after all.

Scrolling Forward p. 36

Also of interest was Levy’s analysis of why the idea of “digital libraries” is such a lightning rod of opinion (which perhaps applies to its sister concept “repositories”).

[The] ambiguity between institution and collection is carried through in the phrase “digital library”. For some groups, most notably librarians, the phrase refers most directly to institutions that oversee digital collections, while for other professionals, primarily computer and information scientists, it refers to digital collections, without regard to the institutional settings (if any) in which they might be managed … Digital library, it seems to me, draws much of its power from this ambiguity: it provides a name for collections of digital materials that invokes the aura of the modern library and its social mission (library as social institution). But it does so without actually making any commitments to the public good (library as collection).

Scrolling Forward p. 135

And finally, Levy doesn’t shy away from the big questions of how our psychological and religious impulses influence our notions of what documents are.

The human search for and construction of order […] is our response to the profound mystery, and accompanying anxiety, of existence. Emerging into an unfathomable universe and fearing we are nothing within it, we strive to create a meaningful and ultimately immortal place for ourselves […] Culture creates the conditions for a meaningful existence, for us to play out our games of physical and symbolic survival. But it is an ongoing performance, a play we can never stop performing, lest we see the back-stage gears and levers and be reminded of the mysterious and terrifying backdrop against which we are performing it. [Documents] are death-transcending, lack-filling artifacts of major proportions. Perhaps they can’t literally prevent our physical demise or fill our deepest sense of lack. But they are the central participants in our attempts to do so. Every one of them – each cash register receipt, each greeting card, each Post-it note – makes a contribution to the collaborative edifice we call human culture. Although few carry the weight of the Bible or the Constitution, all of them inform us of “what is and what we should do”. And in concert they help us create and sustain an orderly, and meaningful human lifeworld.

Scrolling Forward pp. 187-188

Heady stuff to be sure. And now I feel like I’ve traveled far from the beginning of this blog post, and the definition of information resources and the semantic web. Scrolling Forward has given me a very personal perspective on what documents are and have been, and as a result I’m a bit more hopeful about the future of electronic documents. Working in digital preservation, it’s sometimes pretty easy to give in to despair. I’m not sure how this perspective applies to the normalization of language in the Architecture of the World Wide Web and RFC 2616. But it seems certain that part of the answer lies in not taking our information technologies too seriously, and trying to stay focused on the roles that they play in our individual and collective lives:

We make a mistake, I believe, when we fixate on particular forms and technologies, taking them, in and of themselves, to be the carriers of what we want either to embrace or resist. Not only do we fail to see the forms and technologies in their full complexity, but we use them, in their symbolic simplicity, as blunt instruments with which to beat one another over the head.

Scrolling Forward p. 198

PS. The bibliography is a great source of new material to read too.
PPS. This blog post was also a not-so-secret experiment in using RDFa and the Bibliographic Ontology to mark up quotations. Check out the RDF assertions you can extract from it using the RDFa Distiller: http://www.w3.org/2007/08/pyRdfa/extract?format=turtle&uri=http://inkdroid.org/journal/2009/09/10/documents/