archival sliver

Just a quick note for my future self: Verne Harris’ notion of the “archival sliver” seems like a great sanity-inducing antidote to the notion of total archives.

The archival record is best understood as a sliver of a sliver of a sliver of a window into process. It is a fragile thing, an enchanted thing, defined not by its connection to “reality”, but by its open-ended layerings of construction and reconstruction.

The Archival Sliver: Power, Memory and Archives in South Africa.

Suzanne Briet on Ada Lovelace Day

Today is Ada Lovelace Day and I wanted to join libtechwomen in celebrating the contribution of Suzanne Briet. Briet’s thinking helped found the field of Information Science, or Documentation Science as it was known then. Documentation was a field of study started by Paul Otlet and Henri La Fontaine which focused on fixed forms of documents (e.g. books, newspapers, etc.). Briet’s contribution expanded the purview of the study of documents to include the social context in which documents are situated. Or as Ronald Day says:

Briet’s writings stressed the importance of cultural forms and social situations and networks in creating and responding to information needs, rather than seeing information needs as inner psychological events. She challenges our common assumptions about the role and activities of information professionals and about the form and nature of documents. She speaks to our age of digital libraries, with their multi-documentary forms, but she also challenges the very conceptual assumptions about the form and the organization of knowledge in such digital libraries. Readers of What Is Documentation? will find themselves returning to Briet’s book, again and again, coming upon ever new insights into current problems and ever new challenges to still current assumptions about documents and libraries and about the origins, designs and uses of information management and its systems.

As you may know from previous blog posts here, I’m kind of fascinated with the idea of how the Document is presented in Web Architecture, and how it influences technologies like Linked Data. I spent some time trying to organize my thoughts about this intersection of Libraries, Archives, Information and the Web in a paper Linking Things on the Web: A Pragmatic Examination of Linked Data for Libraries, Archives and Museums. I was lucky to have Dorothea Salo read an early draft of the paper. Among her many useful comments was one which encouraged me to be a bit more precise in my attribution of the term document in information science. I wasn’t even mentioning Briet’s contribution and instead just named Otlet and La Fontaine, with a citation to Michael Buckland. I cited Buckland’s What Is A Document, which funnily enough is partly responsible for raising awareness about Briet’s contribution. Dorothea rightly encouraged me to dig a bit deeper, and to change this paragraph:

The terminology of documents situates Linked Data amidst an even older discourse concerning the nature of documents (Buckland, 1997), or documentation science more generally. Documentation science is a field of inquiry established by Paul Otlet and Henri La Fontaine in the 1930s, which was renamed as information science in the 1960s.

to this:

The terminology of documents situates Linked Data amidst an even older discourse concerning the nature of documents (Buckland, 1997), or documentation science more generally. Documentation science, or the study of documents is an entire field of study established by Otlet (1934), continued by Briet (1951), and more recently Levy (2001). As the use of computing technology spread in the 1960s documentation science was largely subsumed by the field of information science. In particular, Briet’s contributions expanded the notion of what is commonly understood to be a document, by reorienting the discussion to be in terms of objects that function as organized physical evidence (e.g. an antelope in the zoo, as opposed to an antelope grazing on the African savanna). The evidentiary nature of documents is a theme that is particularly important in archival studies.

So thanks Dorothea, and thank you Suzanne Briet for grounding what I was finding confounding in Web Architecture. Previously my only exposure to Briet’s thinking was revival literature about her, so I decided to take this opportunity to buy a copy of What Is Documentation to have for my bookshelf. It’s also available online on the Web, which seems fitting, right?

Shutdown, Startup


At the moment, I don’t have a job. The government has been shut down, and with it my job at the Library of Congress. I’ve had the good fortune to be able to pick up some part time work here and there with a few friends, to help make ends meet. I know I shouldn’t say it, but it has actually been kind of rejuvenating to scramble and brainstorm outside of the “permanent” job mentality that I supposedly have. It’s sounding pretty unlikely that I will be paid when the Federal Government re-opens, and it’s not really even clear at this point when it will re-open. Meanwhile there is a mortgage to pay, mouths to feed, and not a whole lot of wiggle room in our budget, or savings to speak of. But we’ll scrape by, like everyone else in the same boat.

But this post isn’t about the shutdown, and it’s not really about me. It’s about a startup, and it’s about my wife Kesa.

Kesa and I met at a startup in New York City in 2000. It was a magical time. We were helping start a business from the ground up, living in a truly amazing city, in a tiny one room apartment that barely fit us and our bookcase…and we were falling in love. We lived in Brooklyn, but our office was in downtown Manhattan, just off Wall Street, and a few blocks from the World Trade Center.

9/11 was an explosive, searing light that annihilated and destroyed…but somehow it also briefly illuminated delicate, evanescent, and commonplace things, making them easier to see. Most of all, the events of 9/11 made me acutely conscious of how important every day I had with Kesa was. One evening later that year I made Italian Wedding Soup for dinner, and asked her if she wanted to get married. She said yes. I think she liked the soup too.

Around that same time Kesa also decided to return to teaching. She had applied for a job in the Brooklyn Public School system and heard back the morning of September the 11th. I guess the day crystallized some things for her too. She remembered her experience teaching K-3rd grade kids how to read in New Orleans. She remembered what it felt like to help make the world a better place, one student at a time, instead of working her butt off to make some software better, that would (maybe) give some big corporations a competitive edge over some other big corporations, so they could sell more widgets. She inspired me in a way that I needed to be inspired, as our country slipped into pointless retaliation, and war.

Over the last 13 years, Kesa has largely been doing just that: teaching 5th grade in Brooklyn, Chicago and here in Washington DC. She took some time off to be with our kids when they were born, but she went back each time. Her philosophy as a teacher has always been to understand each student for their uniqueness. Don’t get me wrong, she is big into the academics; but at the end of the day, it was about connecting with the kids, and seeing them happy and thriving together. The times I went to her class I got to see the evidence of that first hand.

When Maeve (#3!) was born Kesa decided to give something else a try. She started tutoring kids in the neighborhood to see if she could help make ends meet that way. Somewhere along the way the math and reading transitioned to sewing and other crafts. She had caught the makers bug like millions of other people around the country, who are trying to reconnect our culture. She got talking to people like David and Lina Brunton who are trying to bootstrap a farm outside Annapolis, MD. The kids she taught got a real kick out of learning to make their own pajamas, bags and pillows…unwittingly they encouraged her to do more, and to think a bit bigger. She felt like she was onto something.

So in May of this year Kesa went to Baltimore to register the business Freehands Craft Studio. She found an inexpensive space to rent on the 2nd floor of a strip mall near our house in Silver Spring. The photo to the right was taken when she was painting the walls in the new space. I like it because it captures how earnest she was (and still is) about getting Freehands off the ground.

I watched as she networked on neighborhood discussion lists, talked to friends, and friends of friends, and somehow pulled together a small group of teachers with specialties in knitting, paper making, quilting, sewing, jewelry making and collage. Freehands had a few exploratory classes over the summer to figure out logistics, and this fall the classes started in full swing. Last weekend they were at the Silver Spring Mini Maker’s Faire where they demonstrated how to quickly make reusable lunch sacks, and answered questions for 5 hours from tons of people who were interested in what Freehands was doing.

So Kesa is working at a startup again. But this time it’s her startup. As the politicians fight in Congress about how to do their job, it means so much to me to see her building Freehands Craft Studio with her friends. It is a lot of work. I’m having to look after the kids a lot more when she is off teaching a class, or doing outreach of some kind. The startup expenses have set us back a bit more than we expected, and at an awkward time. There’s still a lot more to do to get the business rolling, to build momentum, and let folks outside of our little corner of Silver Spring know about it. But I can tell it’s what Kesa loves doing, because she is smiling when she’s doing it, she gets energy from doing it, the work illuminates her life, and our little family.

So I wrote this post for two people.

It’s for you Kesa. To let you know that even when I grumble about having to rush home to look after the kids, and scrape together a meal and clean up our house so that it doesn’t look like a tornado hit it– in my heart of hearts I’m so proud of you. Your Freehands experiment gives me hope and purpose. You make me happy, just like back when I made that Italian Wedding Soup.

And this post is also for you. There’s no better time to start things up than when other people are shutting things down, right? Take some time to consider or remember what you want to start up. It can just be a side project for now. Who knows what it will grow into?

Oh, and if you want to help Kesa and Freehands Craft Studio please consider donating to their Indiegogo campaign, or sharing information about it with others using your social-media-platform-of-choice. There’s only about a day left, and they could really use your help. You can get a little mug or a reusable lunch sack or handmade card as a thank you … and you will become part of this little dream too.

preserving linked data

Earlier this morning Martin Malmsten of the National Library of Sweden asked an interesting question on Twitter:

Martin was asking about the Linked Open Data that the Library of Congress publishes, and how the potential shutdown of the US Federal Government could result in this data being unavailable. If you are interested, click through to the tweet and take a minute to scan the entire discussion.

Truth be told, I’m sure that many could live without the Library of Congress Subject Headings or Name Authority File for a day or two…or honestly even a month or two. It’s not like this data’s currency is essential to the functioning of society, like financial, weather or space data, etc. But Martin’s point is that it raises an interesting general question about the longevity of Linked Open Data, and how it could be made more persistent.

In case you are new to it, a key feature of Linked Data is that it uses the URL to allow a distributed database to grow organically on the Web. So, in practice, if you are building a database about books, and you need to describe the novel Moby Dick, your description doesn’t need to include everything about Herman Melville. Instead it can assert that the book was authored by an entity identified by the URL

When you resolve that URL you can get back data about Herman Melville. For pragmatic reasons you may want to store some of that data locally in your database. But you don’t need to store all of it. If you suspect it has been updated, or need to pull down more data you simply fetch the URL again. But what if the website that minted that URL is no longer available? Or what if the website is still available but the original DNS registration expired, and someone is cybersquatting on it?
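To make the "simply fetch the URL again" step a bit more concrete, here is a minimal sketch in Python. The URL is a placeholder I made up, and the `Accept` value is just one plausible choice; which formats a given server offers is something you would have to check.

```python
import urllib.request

# Placeholder URL for illustration only, not a real identifier.
AUTHOR_URL = "http://example.org/authors/herman-melville"


def build_request(url, media_type="application/rdf+xml"):
    """Build a GET request that asks the server for a data representation."""
    return urllib.request.Request(url, headers={"Accept": media_type})


def fetch(url, media_type="application/rdf+xml", timeout=10):
    """Resolve the URL and return (status, body); raises on network errors."""
    request = build_request(url, media_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()


# Building the request does not touch the network; calling fetch() does.
request = build_request(AUTHOR_URL)
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

If the site behind the URL has gone away, `fetch` raises an exception from `urllib.error`, which is exactly the failure the questions above are about.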

Admittedly some work has happened at the Dublin Core Metadata Initiative around the preservation of Linked Data vocabularies. The DCMI is taking a largely social approach to this problem, where vocabulary owners and memory institutions interact within the context of a particular trust framework centered on DNS. But the preservation of vocabularies (which are also identified with URLs) is really just a subset of the much larger problem of Web preservation more generally. Does web preservation have anything to offer for the preservation of Linked Data?

When reading the conversation Martin started I was reminded of a demo my colleague Chris Adams gave that showed how the World Digital Library item metadata can be retrieved from the Internet Archive. WDL embeds item metadata as microdata in its HTML, and since the Internet Archive archives that HTML, you can get the metadata back from the Internet Archive.

So take this page from WDL:

Chola Woman

It turns out this particular page has been archived by the Internet Archive 27 times. So with a few lines of Python you can use Internet Archive as a metadata service:

import urllib
import microdata
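Here is a rough, standard-library-only sketch of the same idea. It is not the original listing: `html.parser` stands in for the `microdata` module, whose API I don't want to guess at here. The parser only collects `itemprop` values and ignores nesting, and the sample markup and `example.org` URL are made up for illustration. The one real-world assumption is the Wayback Machine URL pattern of `/web/<timestamp>/<original URL>`, with timestamps written as YYYYMMDDhhmmss.

```python
from html.parser import HTMLParser


def wayback_url(url, timestamp):
    """Build a Wayback Machine URL for a capture near the given timestamp."""
    return "http://web.archive.org/web/{}/{}".format(timestamp, url)


class ItemPropParser(HTMLParser):
    """Collect itemprop names and values (a simplification of microdata)."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        # Some elements carry their value in an attribute rather than in text.
        value = attrs.get("content") or attrs.get("href") or attrs.get("src")
        if value is not None:
            self.props.setdefault(name, []).append(value)
        else:
            self._current = name
            self.props.setdefault(name, []).append("")

    def handle_data(self, data):
        if self._current is not None:
            self.props[self._current][-1] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current = None


def extract_props(html):
    """Return a dict mapping each itemprop name to a list of its values."""
    parser = ItemPropParser()
    parser.feed(html)
    parser.close()
    return {name: [v.strip() for v in values]
            for name, values in parser.props.items()}


# Made-up markup for illustration; this is not WDL's actual HTML.
SAMPLE = (
    '<div itemscope>'
    '<h1 itemprop="name">Chola Woman</h1>'
    '<meta itemprop="inLanguage" content="en">'
    '</div>'
)

print(wayback_url("http://example.org/item/1/", "20130101000000"))
print(extract_props(SAMPLE))
```

To point this at a real capture you would fetch `wayback_url(...)` with `urllib.request.urlopen` and hand the decoded body to `extract_props`.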


In case you missed it, an interesting study by Jonathan Zittrain and Kendra Albert was written up in the New York Times with the provocative title In Supreme Court Opinions, Web Links to Nowhere. In addition to the article, the study itself is worth reading for its compact review of the study of link rot on the Web, and its stunning finding that 49% of the links in US Supreme Court Opinions are broken.

This 49% is in contrast with a similar, recent study by Raizel Liebler and June Liebert of the same links, which found a much lower rate of 29%. The primary reason for this discrepancy was that Zittrain and Albert looked at reference rot in addition to link rot.

The term reference rot was coined by Rob Sanderson, Mark Phillips and Herbert Van de Sompel in their paper Analyzing the Persistence of Referenced Web Resources with Memento. The distinction is subtle but important. Link rot typically refers to when a URL returns an HTTP error of some kind that prevents a browser from rendering the referenced content. This error can be the result of the page disappearing, or the webserver being offline. Reference rot refers to when the URL itself seems to work (returning either a 200 OK or redirect of some kind), but the content that comes back is no longer the content that was being referenced.
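The distinction can be sketched as a small classifier. This is only an illustration of the two definitions, not the method used in either study: comparing an exact hash of the content is an assumption I made for brevity, and in practice deciding whether a page still says what was cited usually takes human judgment.

```python
import hashlib


def fingerprint(content):
    """Return a hex digest identifying the exact bytes of a page."""
    return hashlib.sha256(content).hexdigest()


def classify(status, content, cited_fingerprint):
    """Classify a cited URL given what came back when it was fetched.

    status is the final HTTP status code after following redirects,
    or None if the server could not be reached at all.
    """
    if status is None or status >= 400:
        return "link rot"
    if fingerprint(content) != cited_fingerprint:
        return "reference rot"
    return "ok"


cited = fingerprint(b"the page as the author saw it")
print(classify(404, b"", cited))
print(classify(200, b"a different page at the same URL", cited))
print(classify(200, b"the page as the author saw it", cited))
```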

The New York Times article includes a great example of reference rot: a website that was referenced in a Supreme Court Opinion by Justice Alito.


The DNS registration expired, and the domain was picked up by someone who knew its significance and turned it into an opportunity to educate people about links in legal documents. The NY Times article calls this nameless person a “prankster” but it is a wonderful hack.

One thing the NY Times article didn’t mention is that the website has been captured 140 times by the Internet Archive and the original as referenced by Justice Alito is available still. It seemed like a missed opportunity to highlight the incredibly important work that Brewster Kahle and his merry band of Web archivists are doing. It would be interesting to see how many of the 555 extracted links are available in the Internet Archive. But I couldn’t seem to find the list in or linked to from the article.


Zittrain and Albert, on the other hand, do mention the Internet Archive’s work in the context of their proposed solution to the problem of broken links.

… the Internet Archive is dedicated to comprehensively archiving web content, and thus only passes through a given corner of the Internet occasionally, meaning there is no guarantee that a given page or set of content would be archived to reflect what an author or editor saw at the moment of citation. Moreover, the IA is only one organization, and there are long-term concerns around placing all of the Internet archiving eggs into one basket. A system of distributed, redundant storage and ownership might be a better long-term solution.

This seems like a legitimate concern, that there should be some ability to archive a website at a particular point in time. Their proposed service has 27 founding members. There is a strong legal flavor to some of the participants, but it doesn’t appear to be only for legal authors. The website states that it helps authors and journals create permanent archived citations in their published work, and that it will be free and open to all soon.

It’s good to see Internet Archive as one of the founding members. It remains to be seen what the service’s approach to distributed, redundant storage will be. For the system to actually be distributed there has to be more to it than listing 27 organizations that agree that it’s a good idea. It’s not like Internet Archive operates on its own, since they work closely with the International Internet Preservation Consortium which has 44 organizational members, many of whom are national libraries. I didn’t see the IIPC on the list of founding members.

If the service were to take off I wonder what it would mean for publishers’ web analytics. If lots of publishers start putting its archived URLs in their publications what would this mean for the publishers of the referenced content, and their web analytics? Would it be possible for publishers to see how often their content is being used through the archive, and a rough approximation of who the readers are, what browsers they are using, etc?

Nit-picking aside, it’s awesome to see another player in the Web archiving space, especially from Web veterans who understand how it works, and its significance for society.

Update: Leigh Dodds has an excellent post about the service’s terms of service.

where the heart beats

Where the Heart Beats: John Cage, Zen Buddhism, and the Inner Life of Artists by Kay Larson. My rating: 4 of 5 stars

I’m no expert on John Cage or Zen Buddhism, so I’m not a good person to speak to the accuracy of the material in this book. But Kay Larson provides a very accessible and inspired look at the life of an artist, who found peace and inspiration in the teachings of DT Suzuki, and how he went on to be a formative influence on postmodern art. The story of Cage’s relationship with Merce Cunningham and their inner circle of friends and artists was lovingly told. One of my favorite parts of the book was Larson’s discovery of a set of cards that were typed up for each meeting of “The Club”, which was a gathering of artists and thinkers in Greenwich Village. She used these postcards to piece together the chronology of Cage’s development around the time of his Lecture on Something and Lecture on Nothing. There are so many great Cage quotes scattered throughout the book too. I wish I had read the book on my Kindle so I could have highlighted more, and included some of them here. I’ve had a copy of Silence for years, and I think I’m going to reread some of it, now that I know so much more about the context of John Cage’s life. If you’ve ever spent some time living in New York City, this book is bound to make you miss it just a little bit.

Passport Photos

He would see faces in movies, on TV, in magazines, and in books. He thought that some of these faces might be right for him. And through the years, by keeping an ideal facial structure fixed in his mind, or somewhere in the back of his mind, that he might, by force of will, cause his face to approach those of his ideal.

The change would be very subtle. It might take ten years or so. Gradually his face would change its shape. A more hooked nose. Wider, thinner lips. Beady eyes. A larger forehead. He imagined that this was an ability he shared with most other people. They had also molded their faces according to some ideal. Maybe they imagined that their new face would better suit their personality. Or maybe they imagined that their personality would be forced to change to fit the new appearance. This is why first impressions are often correct.

Although some people might have made mistakes. They may have arrived at an appearance that bears no relationship to them. They may have picked an ideal appearance based on some childish whim, or momentary impulse. Some may have gotten halfway there, and then changed their minds. He wonders if he too might have made a similar mistake.

Seen and Not Seen by Talking Heads

metadata from Getty’s Open Content Program (part 2)

A few weeks ago I wrote a brief post about the embedded metadata found in images from the (awesome) Getty Open Content Program. This led to a useful exchange with Brenda Podemski on Twitter, which she gathered together with Storify. I promised her I would write another blog post that showed how the metadata could be expressed a little bit better.

It’s hard to read RDF as XML and Turtle isn’t for everyone, so here’s a picture of part of the XMP RDF metadata that is included in the highres download for a photo by Eugène Atget of a sculpture Bosquet de l’Arc de Triomphe by Jean-Baptiste Tuby. I haven’t portrayed everything in the file since it would clutter up the point I’m trying to make.

Original Description

Depicted here are two resources described in the RDF, the JPEG file itself and what the IPTC vocabulary calls an Artwork or Object. Now, it is good that the description distinguishes between the file and the photograph. The Dublin Core somewhat famously (in metadata circles) calls this the One-To-One Principle. But notice how there is a dc:description attached to the file resource with lots of useful information concatenated together as a string? My question to Brenda was whether that string was actually available as structured data, and could it be expressed differently? Her response seemed to indicate that it was.

My suggestion is to unpack and move that concatenated string to describe the photograph, like so:

Unpacked description

Notice how the dimensions, format and type were broken out into separate assertions about the photograph? I also quickly modified the description to use the Dublin Core vocabulary since it was more familiar to me. I wasn’t able to quickly find good properties for height and width, but I imagine they are out there somewhere, and if not there could be.

Of course, one could go further, and say there are really three resources: the file, the photograph, and the sculpture.

Added Sculpture

But this could be extra work for the Getty, if they don’t have this level of description yet. The half-step of enriching the description by indicating that it is a photograph of particular dimensions in a particular format seems like a useful thing to do for this example though, especially if they have that structured data already. My particular vocabulary choices (dc, foaf, etc) aren’t important compared to hanging the descriptions off of the right resources.
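For readers who prefer code to pictures, here is a minimal sketch of what "hanging the descriptions off of the right resources" might look like as plain triples. The `ex:` identifiers are made up, and the property choices (`dc:source` to tie the file to the photograph, `foaf:depicts` to tie the photograph to the sculpture) are my assumptions for illustration, not what the Getty metadata actually uses.

```python
from collections import defaultdict

# Made-up identifiers for the three resources, for illustration only.
FILE = "ex:file"
PHOTOGRAPH = "ex:photograph"
SCULPTURE = "ex:sculpture"

# (subject, property, value) assertions; property choices are assumptions.
TRIPLES = [
    (FILE, "dc:format", "image/jpeg"),
    (FILE, "dc:source", PHOTOGRAPH),
    (PHOTOGRAPH, "dc:type", "photograph"),
    (PHOTOGRAPH, "dc:creator", "Eugène Atget"),
    (PHOTOGRAPH, "foaf:depicts", SCULPTURE),
    (SCULPTURE, "dc:title", "Bosquet de l’Arc de Triomphe"),
    (SCULPTURE, "dc:creator", "Jean-Baptiste Tuby"),
]


def describe(triples, subject):
    """Gather every assertion made about one resource."""
    description = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            description[p].append(o)
    return dict(description)


for resource in (FILE, PHOTOGRAPH, SCULPTURE):
    print(resource, describe(TRIPLES, resource))
```

The point is that each creator ends up attached to the thing they actually made, and nothing about the photograph or the sculpture is stuffed into a string on the file.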

But, and this is a doozy of a but, it looks like from other metadata in the RDF that the metadata is being input with Photoshop. So while it is technically possible to embed this metadata in XMP as RDF, it is quite likely that Photoshop doesn’t give you the ability to enter it. In fact, it is fairly common for some image processing applications to strip parts or all of the embedded metadata. So to embed these richer descriptions into the files one might need to write a small program to do it.

There is another place where the metadata could be embedded though. What if the webpage for the item had embedded RDFa or Microdata in it that expressed this information? If they could commit to a stable URL for the item it would be a perfect place for both the human and machine readable description. All they would have to do would be to link the XMP metadata to it somehow, and adjust the templates they are using that drive the HTML display.

watermark woodcut indigo octavo

You know how sometimes you can get ideas for a subject you are interested in by studying a different but related subject? So, strangely enough, I’ve found myself reading about paper conservation. Specifically, at the moment, a book called Books and Documents: dating, permanence and preservation by Julius Grant. It was printed in 1937, so I guess a lot of the material is dated now (haha)…but somehow it’s only making it that much more interesting to read.

There are long sections detailing experiments on paper and ink to determine their composition, in order to roughly estimate when a document was likely to have been created. On pages 41-44 he provides a list of supplementary evidence that can be used.

There are of course many other minor sources of evidence which may prove helpful in establishing the date of a book or document, but to discuss them all in full detail would bring this volume outside its professed scope. It has, however, been thought desirable to summarize the more important of them in the form of a chronological table, which may be used in conjunction with the information on paper and ink already provided.

The list was so delightful, and oddly thought provoking, that I took the time to transcribe it below. I randomly linked some of the terms and names to Wikipedia to ensure you get completely lost.

Seventh century. The first bound books and introduction of quill pens.

863. The oldest printed book known (printed from blocks by Wang Chieh of Kansau, China).

1020. Beginning of the gradual transition from carbon to iron-gall writing inks.

1282. The earliest known watermark.

1307. Names of paper-makers first incorporated into watermarks.

1341. Invention of printing from movable type (by Pi Sheng) in China.

1400 (circ.). Introduction of alum-tanned white pig-skin bindings.

1440 (circ.). Invention of printing from movable type in Europe (Johann Gutenberg, Mainz).

1445-1500. Alternate light and dark striations in the look-through of paper due to construction of the mould.

1454. The first dated publication produced with movable type.

1457. The first book bearing the name of the printer.

1461. The first illustrated book (crude woodcuts).

1463. The first book with a title-page.

1465. The earliest blotting paper (vide infra, 1800) ; this is sometimes found in old books and manuscripts and its presence may help to date them, although of course, the blotting paper may have been inserted subsequently to the date of origin.

1470 (circ.). Great increase in the number of bound books produced, following the advent of printing ; vellum and leather used principally.

1470. The first book with pagination and headlines.

1472. The first book bearing printed signatures to serve as a guide to the binder.

1474. The first book published in English (by William Caxton, in Bruges).

1476. The first work printed in England (by William Caxton).

1483. The first double watermark.

1500. Introduction of the small octavo.

1500. Introduction of Italics.

1536. The first book printed in America.

1545 (circ.). Introduction of custom of using italics only for emphasis. Mineral oil and rosin first used in printing inks.

1560. Introduction of the sexto decimo.

1570. Introduction of the 12mo.

1570. Introduction of thin papers.

1575 (circ.). The first gold-tooling.

1580. Introduction of the modern forms of “i,” “j,” “u,” and “v.”

1580 (circ.). The first pasteboards.

1600 (circ.). Copper-plate illustration sufficiently perfected to replace crude woodcuts. Introduction of red morocco bindings.

1650. Wood covers (covered with silk, plush or tapestry) used for binding.

1670. Introduction of the hollander.

1720. Perfection of the vignette illustration.

1734. Caslon type introduced.

1750 (circ.). The first cloth-backed paper (used only for maps).

1750 (circ.). Gradual disappearance of vellum for binding and introduction of millboard covered with calf; or half-covered with leather and half with marbled paper, etc. The first wove paper (Baskerville).

1763. Logwood inks probably first introduced.

1770. Indigo first used in inks (Eisler).

1780. Steel pens invented.

1796. The first lithographic machine.

1796. The first embossed binding.

1800. Blotting-paper in general use (vide supra, 1465) in England, following an accidental rediscovery at Hagbourne, Berkshire.

1803 (circ.). Metal pens first placed on the market.

1816 (circ.). Coloured inks first manufactured in England using pigments.

1820 (circ.). Linen-canvas first used instead of parchment to hold the back of the book into the cover. Introduction of straight-grained red morocco bindings (see 1600).

1820. The invention of modern type of metal nib.

1825. The first permanent photographic image (Niepce).

1830 (circ.). The first linen cover. Beginning of the era of poor leather bindings which have since deteriorated.

1830. Title printed on paper labels which were stuck on the cloth for the first time.

1835 (circ.). Decoration by machinery introduced.

1836. Introduction of iron-gall inks containing indigo (Stephens).

1839. Invention of photography (Daguerre).

1840. Titles first stamped on cloth.

1845. Linen board cover in common use. At about this time it became usual to trim the edges of books, and the practice of binding in quarter-leather declined.

1852. Invention of photogravure, leading to the development of lithographic etchings, colour prints, line engravings, etc. (Fox-Talbot).

1855. Cotton first used as a cover for binding-boards.

1856. Discovery of the first coal-tar dyestuff (Perkin’s mauve), leading to the use of such dyestuffs in coloured inks.

1860. Beginning of the custom of paring calf binding leathers to the thickness of paper.

1861. Introduction of synthetic indigo for inks.

1878. Invention of the stylographic pen.

1885. Invention of the half-tone process (F. E. Ives).

1905. The first offset litho press.
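A table like this works as a simple lookup for the earliest date a document could have been made: it can’t be older than the newest feature found in it. Here is a minimal sketch using a handful of entries from the list above. The short feature labels are my own, and this is only my reading of how the table is meant to be used, not a procedure Grant spells out.

```python
# A few entries from the table above: feature -> year it was introduced.
INTRODUCED = {
    "watermark": 1282,
    "title-page": 1463,
    "steel pen": 1780,
    "coal-tar dyestuff ink": 1856,
    "synthetic indigo ink": 1861,
    "half-tone illustration": 1885,
}


def earliest_possible_year(features):
    """Return the year of the newest known feature, or None if none are known."""
    known = [INTRODUCED[f] for f in features if f in INTRODUCED]
    return max(known) if known else None


print(earliest_possible_year(["watermark", "title-page", "synthetic indigo ink"]))
```

Note that this only gives a lower bound; a feature’s absence says much less about the date than its presence.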

More about the other subject later …