on “good” repositories

Chris Rusbridge kicked off an interesting thread on JISC-REPOSITORIES with the tag line “What makes a good repository?” Implicit in this question, and perhaps in the discussion list itself, is that he is asking about “digital” repositories, and not the brick ’n’ mortar libraries, archives, etc. that are arguably also repositories.

The question of what a repository is, is pretty much verboten in the group I work in. This is kind of amusing since I work in a group whose latest name (in a string of names, and no-names) is the Repository Development Center. Well, maybe saying it’s “verboten” is putting it a bit too strongly. It’s not as if the “repository” is equivalent to He Who Shall Not Be Named or anything. It’s just that the word means so many things, to so many different people, and encompasses so much of what we do, that it’s hardly worth talking about. At our best (IMHO) we focus on what staff and researchers want to do with digital materials, and building out services that help them do that. Getting wrapped around the axle about what set of technologies we are using, and whether they model data in a particular way, is putting the cart before the horse.

As Dan penned one April 1st: “if you seek a pleasant repository, look about you”. I guess this largely depends on where you are sitting. But seriously, if there’s one thing that the Trustworthy Repositories Audit & Certification: Criteria and Checklist, the Ten Principles for Digital Preservation Repositories, and the Blue Ribbon Task Force on Sustainable Digital Preservation and Access make abundantly clear (after you’ve clawed out your eyes) it’s that the fiscal and social dimensions of repositories are a whole lot more important in the long run than the technical bits of how a repository is assembled in the now. I’m a software developer, and by nature I reach for technical solutions to problems, but in my heart of hearts I know it’s true.

Back to Chris’ question. Perhaps the “digital” is a red herring. What if we consider his question in light of traditional libraries? This got me thinking: could Ranganathan and his Five Laws of Library Science serve as a touchstone? Granted, bringing Ranganathan into library discussions is a bit of a cliché. But asking ethical questions like the “goodness” of something is a great excuse to dip into the canon. So put on your repository colored glasses, which magically substitute Repository Object for Book, and …

Repository Objects Are For Use

We can build repositories that function as dark archives. But it kind of rots your soul to do it. It rots your soul because no matter what awesome technologies you are using to enable digital preservation in the repository, the repository needs to be used by people. If it isn’t used, the stuff rots. And the digital stuff rots a whole lot faster than the physical materials. Your repository should have a raison d’être. It should be anchored in a community of people that want to use the materials that it houses. If it isn’t, the repository is likely to not be good.

Every Reader His/Her Repository Object

Depending on their raison d’être (see above) repositories are used by a wide variety of people: researchers, administrators, systems developers, curators, etc. It does a disservice to these people if the repository doesn’t support their use cases. A researcher probably doesn’t care when fixity checks were last performed, and an administrator generating a report on fixity checks doesn’t care about how a repository object was linked to and tagged in Twitter. Does your repository allow these different views, for different users, to co-exist for the same object? Does it allow new classes of users to evolve?

Every Repository Object Its Reader

Are the objects in your repository discoverable? Are there multiple access pathways to them? For example, can someone do a search in Google and wind up looking at an item in your repository? Can someone link to it from a Wikipedia article? Can someone do a search within your repository to find an object of interest? Can they browse a controlled vocabulary or subject guide to find it? Are repository objects easily identified and found by automated agents like web crawlers and software components that need to audit them? Is it easy to extend, enhance and refine your description of what the repository object is as new users are discovered?

Save the Time of the Reader

Is your repository collection meaningfully on the Web? If it isn’t, it should be, because that’s where a lot of people are doing research today … in their web browser. If it can’t be open access on the web, that’s OK … but the collection and its contents should be discoverable so that someone can arrange an onsite visit. For example, can a genealogist do a search for a person’s name in a search engine and end up in your repository? Or do they have to know to come to your application to type in a search there? Once they are in your repository can they easily limit their search along familiar dimensions such as who, what, why, when, and where? Is it easy for someone to bookmark a search, or an item, for later use? Do you allow your repository objects to be reused in other contexts like Facebook, Twitter, Flickr, etc., which put the content where people are, instead of expecting them to come to you?

The Repository is a Growing Organism

This is my favorite. Can you keep adding numbers and types of objects, and scale your architecture linearly? Or are you constrained in how large the repository can grow? Is this constraint technical, social and/or financial? Can your repository change as new types or numbers of users (both human and machine) come into existence? When the limits of a particular software stack are reached, is it possible to throw it away and build another without losing the repository objects you have? How well does your repository fit into the web ecosystem? As the web changes do you anticipate your repository will change along with it? How can you retire functionality and objects, letting them die naturally, with respect, to make way for the new?

So …

I guess there are more questions here than answers. I hadn’t thought of framing repository questions in terms of Ranganathan’s laws before, but I imagine it has occurred to other people. They still seem to be quite good principles to riff on, even in the digital repository realm, at least for a blog post. If you happen to run across similar treatments elsewhere, I would appreciate hearing about them.