Tag Archives: archives

Inside Out Libraries

Peter Brantley tells a sad tale about where public library leadership is at, as we plunge headlong into the ebook future, that has been talked about for what seems like forever, and which is now upon us. It’s not pretty.

The general consensus among participants was that public libraries have two, maybe three years to establish their relevance in the digital realm, or risk fading from the central place they have long occupied in the world’s literary culture.

The fact that a bunch of big-wigs invited by IFLA were seemingly unable to find inspiration and reason to hope that public libraries will continue to exist is not surprising in the least I guess. I’m not sure that libraries were ever the center of the world’s literary culture. But for the sake of argument lets assume they were, and that now they’re increasingly not. Let us also assume that the economic landscape around ebooks is in incredible turmoil, and that there will continue to be sea changes in technologies, and people’s use of them in this area for the foreseeable future.

What can libraries do to stay relevant? I think part of the answer is: stop being libraries…well, sorta.

The HyperLocal

The most serious threat facing libraries does not come from publishers, we argued, but from e-book and digital media retailers like Amazon, Apple, and Google. While some IFLA staff protested that libraries are not in the business of competing with such companies, the library representatives stressed that they are. If public libraries can’t be better than Google or Amazon at something, then libraries will lose their relevance.

In my mind the thing that libraries have to offer, which these big corporations cannot, is authentic, local context for information about a community’s past, present and future. But in the past century or so libraries have focused on collecting mass produced objects, and sharing data about said objects. The mission of collecting hyper-local information has typically been a side task, that has fallen to special collections and archives. If I were invited to that IFLA meeting I would’ve said that libraries need to shift their orientation to caring more about the practices of archives and manuscript collections, by collecting unique, valued, at risk local materials, and adapting collection development and descriptive practices to the realities of more and more of this information being available as data.

As Mark Matienzo indicated (somewhat indirectly in Twitter) after I published this blog post, a lot of this work involves focusing less on hoarding items like books, and focusing more on the functions, services, and actions that public libraries want to document and engage with in their communities. Traditionally this orientation has been a strength area for archivists in their practice and theory of appraisal where:

… considerations … include how to meet the record-granting body’s organizational needs, how to uphold requirements of organizational accountability (be they legal, institutional, or determined by archival ethics), and how to meet the expectations of the record-using community. Wikipedia

I think this represents a pretty significant cognitive shift for library professionals, and would in fact take some doing. But perhaps that’s just because my exposure to archival theory in “library school” was pretty pathetic. Be that as it may here are some practical examples of growth areas for public libraries that I wish came up at the IFLA meeting.

Web Archiving

The Internet Archive and national libraries that are part of the International Internet Preservation Consortium don’t have the time, resources and often mandate to collect web content that are of interest at the local level. What if the tooling and expertise existed for public libraries to perform some of this work, and to have the results fed into larger aggregations of web archives?

Municipality Reports and Data

Increasing amounts of data are being collected as part of the daily working of our local governments. What if your public library had the resources to be a repository for this data? Yeah, I said the R word. But I’m not suggesting that public libraries get the expertise to set up Fedora instances with Hydra heads, or something. I’m thinking about approaches to allowing data to easily flow into an organization, where it is backed up, and made available in a clearinghouse manner similar to public.resource.org on the Web, for search engines to pick up. Perhaps even services like LibraryBox offer another lens to look at the opportunities that lie in this area.

Born Digital Manuscript Collections

Public libraries should be aggressively collecting the “papers” of local people who have had significant contributions to their communities. Increasingly, these aren’t paper at all, but are born digital content. For example: email correspondence, document archives, digital photograph collections. I think that librarians and archivists know, in theory, that this born digital content is out there, but the reality is it’s not flowing into the public library/archive. How can we change this? Efforts such as Personal Digital Archiving are important for two reasons: they help set up the right conditions for born digital collections to be donated, and they also make professionals think about how they would like to receive materials so that they are easier to process. Think more things like AIMS, training and tooling for both professionals and citizens.


It’s not unusual for archives and special collections to have all sorts of donor gift agreements that place restrictions on how their donated materials can be used. To some extent needing to visit the collection, request it, and not being able to leave the room with it, has mitigated some of this special-snowflakism. But when things are online things change a bit. We need to normalize these agreements so that content can flow online, and be used online in clearer ways. What if we got donors to think about Creative Commons licenses when they donated materials? How can we make sure donated material can become a usable part of the Web


We all know that things come and go on the Web. But it doesn’t need to be that way for everything on the Web. Libraries and archives have an opportunity to show how focusing on being a clearninghouse for data assets can allow for things to live persistently on the Web. Thinking about our URLs as identifiers for things we are taking care of is important. Practical strategies for achieving that are possible, and repeatable. What if public libraries were safe harbors for local content on the World Wide Web? This might sound hard to do, but I think it’s not as hard as people think.


As libraries/archives make more local content available publicly on the Web it becomes important to track how this content is accessed and used online. Quick wins like Web analytics tools (Google Analytics) for seeing what is being accessed and from where. Seeing how content is cited in social media applications like Facebook, Twitter, Pinterest and Wikipedia is important for reporting on the value of online collections. But encouraging professionals to use this information to become part of the conversations is equally important. Good metrics are also essential for collection development purposes, seeing what content is of interest, and what is not.

Inside Out Libraries

So, no I don’t think public libraries need a new open source Overdrive. The ebook market will likely continue to take care of itself. I also am not really convinced we need some overarching organization like the Digital Public Library of America to serve as a single point of failure when the funding runs dry. We need distributed strategies for documenting our local communities, so that this information can take its rightful place on the Web, and be picked up by Google so that people can find it when they are on the other side of the world. Things will definitely keep changing, but I think libraries and archives need to invest in the Web as an enduring delivery platform for information.

I’ve never been before but I was so excited to read the call for the European Library Automation Group (ELAG) this year.

The theme of this year’s conference is ‘The INSIDE-OUT Library’. This theme was chosen at last year’s conference, because we concluded:

  • Libraries have been focusing on bringing the world to their users. Now information is globally available.
  • Libraries have been producing metadata for the same publications in parallel. Now they are faced with deduplicating redundancy.
  • Libraries have been selecting things for their users. Now the users select things themselves.
  • Libraries have been supporting users by indexing things locally. Now everything is being indexed in global, shared indexes.

Instead of being an OUTSIDE-IN library, libraries should try and stay relevant by shifting their paradigm 180 degrees. Instead of only helping users to find what is available globally, they should also focus on making local collections and production available to the world. Instead of doing the same thing everywhere, libraries should focus on making unique information accessible. Instead of focusing on information trapped in publications, libraries should try and give the world new views on knowledge.

This blog post is really just a somewhat shabby rephrasing of that call. Maybe IFLA could use some of the folks on the ELAG program commmittee at their next meeting about the future of public libraries? Hopefully 2013 will be a year I can make it to ELAG.

I expect public libraries will continue to exist, but there isn’t going to be some magical technical solution to their problems. Their future will be forged by each local relationship they make, which leads to them better documenting their place on the Web. We may not call these places public libraries at first, but that’s what they will be.

Wikimania Revisited

I recently attended the Wikimania conference here in Washington, DC. I really can’t express how amazing it was to be a Metro ride away from more than 1,400 people from 87 countries who were passionate about creating a world in which every single human being can freely share in the sum of all knowledge. It was my first Wikimania, and I had pretty high expectations, but I was blown away by the level of enthusiasm and creativity of the attendees. Since my employer supported me by allowing me to spend the week there, I thought I would jot down some notes about the things that I took from the conference, from the perspective of someone working in the cultural heritage sector.


Of course the big news from Wikimania for folks like me who work in libraries and archives was the plenary speech by the Archivist of the United States, David Ferriero. Ferriero did an excellent job of connecting NARA’s mission to that of the Wikipedia community. In particular he stressed that NARA cannot solve difficult problems like the preservation of electronic records without the help of open government, transparency and citizen engagement to shape its policies and activities. As a library software developer I’m as interested as the next person in technical innovations in the digital preservation space: be they new repository software, flavors of metadata and digital object packaging, web services and protocols, etc. But over the past few years I’ve been increasingly convinced that access to the content that is being preserved is an absolutely vital ingredient to its preservation. If open access (as in the case of NARA) isn’t possible due to licensing concerns, then it is still essential to let access by some user community drive and ground efforts to collect and preserve digital content. Seeing high level leadership in the cultural heritage space (and from the federal government no less) address this issue was really inspiring.

At the Archives our concepts of openness and access are embedded in our mission. The work we do every day is rooted in the belief that citizens have the right to see, examine, and learn from the records that guarantee citizens rights, document government actions, and tell the story of our nation.

My biggest challenge is visibility: not everyone knows who we are, what we do, or more importantly, the amazing resources we collect and house. The lesson I learned in my time in New York is that it isn’t good enough to create great digital collections, and sit back and expect people to find you. You need to be where the people are.

The astounding thing is that it’s not just talk–Ferriero went on to describe several efforts of how the Archives is executing on collaboration with the Wikipedia community, which is also documented at a high level in NARA’s Open Government Plan. One example that stood out for me was NARA’s Today’s Document website which highlights documents from its collections. On June 1st, 2011 they featured a photograph of Harry P. Perry who was the first African American to enlist in the US Marine Corps after it was desegregated on June 1st, 1942. NARA’s Wikipedian in Residence Dominic McDevitt-Parks’ efforts to bring archival content to the attention of Wikipedians resulted in a new article Desegregation in the United States Marine Corps being created that same day…and the photograph on NARA’s website was viewed more than 4 million times in 8 hours. What proportion of the web traffic was driven by Wikipedia specifically rather than other social networking sites wasn’t exactly clear, but the point is that this is what happens when you get your content where the users are. If my blog post is venturing into tl;dr territory, please be sure to at least watch his speech, it’ll just take 20 minutes.

Resident Wikipedians

In a similar vein Sara Snyder made a strong case for the use of archival materials on Wikipedia in her talk 5 Reasons Why Archives are an Untapped Goldmine for Wikimedians. She talked about the work that Sarah Stierch did as the Wikipedia in Residence at the Smithsonian Archives of American Art. The partnership resulted in ~300 WPA images being uploaded to Wikimedia Commons, 37 new Wikipedia articles, and new connections with a community of volunteers who participated in edit-a-thons to improve Wikipedia and learn more about the AAA collections. She also pointed out that since 2010 Wikipedia has driven more traffic to the Archives of American Art website than all other social media combined.

In the same session Dominic McDevitt-Parks spoke about his activities as the Wikipedian in Residence at the US National Archives. Dominic focused much of his presentation on NARA’s digitization work, largely done by volunteers, the use of Wikimedia Commons as a content platform for the images, and ultimately WikiSource as a platform for transcribing the documents. The finished documents are then linked to from NARA’s Online Catalog, as in this example: Appeal for a Sixteenth Amendment from the National Woman Suffrage Association. NARA also prominently links out to the content waiting to be transcribed at WikiSource on its Citizens Archivist Dashboard. If you are interested in learning more, Dominic has written a bit about the work with WikiSource on the NARA blog. Both Dominic and Sara will be speaking next month at the Society of American Archivists Annual Meeting making the case for Wikipedia to the archival community. Their talk is called 80,000 Volunteers Can’t Be Wrong: The Case for Greater Collaboration with Wikipedia, and I encourage you attend if you will be at SAA.

The arrival of Wikipedians in Residence is a welcome seachange in the Wikipedia community, where historically there had been some uncertainty about the best way for cultural heritage organizations to highlight their original content in Wikipedia articles. As Sara pointed out in her talk, it helps both sides (the institutional side, and the Wikipedia side) to have an actual, experienced Wikipedian on site to help the organization understand how they want to engage the community. Having direct contact with archivists, curators and librarians that know their collections backwards and forwards also helps the resident in knowing how to direct their work, and the work of other Wikipedians. The Library of Congress made an announcement at the Wikimania reception that the World Digital Library are seeking a Wikipedia in Residence. I don’t work directly on the project anymore, but I know people who do, so let me know if you are interested and I can try to connect the dots.

I think in a lot of ways the residency program is an excellent start, but really it’s just that–a start. The task at hand of connecting the Wikipedia community and article content with the collections of galleries, libraries, archives and museums is a huge one. One person, especially a temporary volunteer, can only do so much. As you probably know, Wikipedia editors can often be found embedded in cultural heritage organizations. It’s one of the reasons why we started having informal Wikipedia lunches at the Library of Congress: to see what can be done at the grass roots level by staff to integrate Wikipedia into our work. When we started to meet I learned about an earlier, 4 year old effort to create a policy that provides guidance to staff about how to interact with the Wikipedia community as editors. Establishing a residency program is an excellent way to signal a change in institutional culture, and to bootstrap and focus the work. But I think the residencies also highlight the need to empower staff throughout the organization to participate as well, so that after the resident leaves the work goes on. In addition to establishing a WDL Wikipedian in Residence I would love to see the Library of Congress put the finishing touches on its Wikipedia policy that would empower staff to use and contribute to Wikipedia as part of their work, without lingering doubt about whether it was correct or not. It would probably be helpful for other organizations to publish theirs as examples for other organizations wanting the same.

Wikipedia as a Platform

Getting back to Wikimania, I wanted to highlight a few other GLAM related projects that use Wikipedia as a platform.

Daniel Mietchen spoke about work he was doing around the Open Access Media Importer (OAMI). The OAMI is a tool that harvests media files (images, movies, etc) from open access materials and uploads them to Wikimedia Commons for use in article content. Efforts to date have focused primarily on PubMed from the National Institutes of Health. As someone working in the digital preservation field one of the interesting outcomes of the work so far was a table that illustrated the media formats present in PubMed:

Since Daniel and other OAMI collaborators are scientists they have been focused primarily on science related media…so they naturally are interested in working with arXiv. arXiv is a heavily trafficked, volunteer supported, pre-print server, that is normally a poster child for open repositories. But one odd thing about arXiv that Daniel pointed out is that while arXiv collects licensing information from authors as part of deposit, they do not indicate in the user interface which license has been used. This makes it particularly difficult for the OAMI to determine which content can be uploaded to the Wikimedia Commons. I learned from Simeon Warner shortly afterwards that while the licensing information doesn’t show up in the UI currently, and isn’t present in all the metadata formats that their OAI-PMH service provides, it can be found squirreled away in the arXivRaw format. So it should be theoretically possible to modify the OAMI to use arXivRaw.

Another challenges the OAMI faces is extraction of metadata. For example media files often don’t share all the subject keywords that are appropriate for the entire article. So knowing which ones to apply can be difficult. In addition, metadata extraction from Wikimedia Commons was reported to not be optimal, since it involves parsing mediawiki templates, which limits the downstream use of the content added to the Commons. I don’t know if the Public Library of Science is on the radar for harvesting, but if it isn’t it should be. The OAMI work also seems loosely related to the issue of research data storage and citation which seems to be on the front burner for those interested in digital repositories. Jimmy Wales has reportedly been advising the UK government on how to making funded research available to the public. I’m not sure if datasets fit the purview of the Wikimedia Commons, but since Excel is #3 in the graph above perhaps it is. It might be interesting to think more about Wikimedia Commons as a platform for publishing (and citing) datasets.

I learned about another interesting use of the Wikimedia Commons from Maarten Dammers and Dan Entous during their talk about the GLAMwiki Toolset. The project is a partnership between Wikipedia Netherlands and Europeana. If you aren’t already familiar with Europeana it is an EU funded effort to enhance access to European cultural heritage material on the Web. The project is just getting kicked off now, and is aiming to:

…develop a scalable, maintainable, ease to use system for mass uploading open content from galleries, libraries, archives and museums to Wikimedia Commons and to create GLAM-specific requirements for usage statistics.

Wikimedia Commons can be difficult to work with in an automated, batch oriented way for a variety of reasons. One that was mentioned above is metadata. The GLAMwiki Toolset will provide some mappings from commonly held metadata formats (starting with Dublin Core) to Commons templates, and will provide a framework for adapting the tool to custom formats. Also there is a perceived need for tools to manage batch imports as well as exports from the Commons. The other big need are usable analytics tools that let you see how content is used and referenced on the Commons once it has been uploaded. Maarten indicated that they are seeking participation in the project from other GLAM organizations. I imagine that there are other organizations that would like to use the Wikimedia Commons as a content platform, to enable collaboration across institutional boundaries. Wikipedia is one of the most popular destinations on the Web, so they have been forced to scale their technical platform to support this demand. Even the largest cultural heritage organizations can often find themselves bound to somewhat archaic legacy systems, that can make it difficult to similarly scale their infrastructure. I think services like Wikimedia Commons and WikiSource have a lot to offer cash strapped organizations that want to do more to provide access to their organizations unique materials on the Web, but are not in a position to make the technical investments to make it happen. I’m hoping that efforts like the GLAMWiki toolset will make this easier to achieve, and is something I personally would like to get involved in.

Incidentally, one of the more interesting technical track talks I attended was a talk by Ben Hartshorne from the Wikimedia Foundation Operations Team, about their transition from NFS to Openstack Swift for media storage. I had some detailed notes about this talk, but proceeded to lose them. I seem to remember that in total, the various Wikimedia properties amount to 40T of media storage (images, videos, etc), and they want to be able to grow this to 200T this year. Ben included lots of juicy details about the hardware and deployment of Swift in their infrastructure, so I’ve got an email out to him to see if he can share his slides (update: he just shared them, thanks Ben!). The placement of various caches (Swift is an HTTP REST API), as well as the hooks into MediaWiki were really interesting to me. The importance of URL addressable object storage for bitstreams in an enterprise that is made up of many different web applications can’t be overstated. It was also fun to hear about the impact that digitization projects like Wikipedia Loves Monuments and the NARA work mentioned above, are having on the backend infrastructure. It’s great to hear that Wikipedia is planning for growth in the area of media storage, and can scale horizontally to meet it, without paying large sums of money for expensive, proprietary, vendor supplied NAS solutions. What wasn’t entirely clear from the presentation is whether there is a generic tipping point where investing in staff and infrastructure to support something like Swift becomes more cost-effective than using a storage solution like Amazon S3. Ben did indicate that there use of Swift and the abstractions they built into Mediawiki would allow for using storage APIs like S3.

Before I finish this post, there were a couple other Wikipedia related topics that I didn’t happen to see discussed at Wikimania (it’s a multi-track event so I may have just missed it). One is the topic of image citation on Wikipedia. Helena Zinkham (Chief of the Prints and Photographs Division at the Library of Congress) recently floated a project proposal at LC’s Wikipedia Lunch to more prominently place the source of an image in Wikipedia articles. For an example of what Helena is talking about take a look at the article for Walt Whitman: notice how the caption doesn’t include information about where the image came from? If you click on the image you get a detail page that does indicate that the photograph is from LC’s Prints & Photographs collection, with a link back to the Prints & Photographs Online Catalog. I agree with Helena that more prominent information about the source of photographs and other media in Wikipedia could encourage more participation from the GLAM community. What the best way to proceed with the idea is still in question. I’m new to the way projects get started and RFCs work there. Hopefully we will continue to work on this in the context of the grassroots Wikipedia work at LC. If you are interested please drop me an email

Another Wikipedia project directly related to my $work is the Digital Preservation WikiProject that the National Digital Stewardship Alliance is trying to kickstart. One of the challenges of digital preservation is the identification of file formats, and their preservation characteristics. English Wikipedia currently has 325 articles about Computer File Formats, and one of the goals of the Digital Preservation project is to enhance these with predictable infoboxes that usefully describe the format. External data sources such as PRONOM and UDFR also contain information about data formats. It’s possible that some of them could be used to improve Wikipedia articles, to more widely disseminate digital preservation information. Also, as Ferriero noted, it’s important for cultural heritage organizations to get their information out to where the people are. Jason Scott of ArchiveTeam has been talking about a similar project to aggregate information about file formats to build better tools for format identification. While I can understand the desire to build a new wiki to support this work, and there are challenges to working with the Wikipedia community, I think Linus’ Law points the way to using Wikipedia.


So, I could keep going, but in the interests of time (yours and mine) I have to wrap this Wikimania post up (for now). Thanks for reading this far through my library colored glasses. Oddly I didn’t even get to mention the most exciting and high profile Wikidata and Visual Editor projects that are under development, and are poised to change what it means to use and contribute to Wikipedia for everyone, not just GLAM organizations. Wikidata is of particular interest to me because if successful it will bring many of the ideas of the Linked Data to solve an eminently practical problem that Wikipedia faces. In some ways the WikiData project is following in the footsteps of the successful dbpedia and Google Freebase projects. But there is a reason why Freebase and Dbpedia have spent time engineering their Wikipedia updates–because it’s where the users are creating content. Hopefully I’ll be able to attend Wikimania next year to see how they are doing. And I hope that my first Wikimania marks the beginning of a more active engagement in what Wikipedia is doing to transform the Web and the World.

meeting notes and a manifesto

A few weeks ago I was in sunny (mostly) Palo Alto, California at a Linked Data Workshop hosted by Stanford University, and funded by Council on Library and Information Resources. It was an invite-only event, with very little footprint on the public Web, and an attendee list that was distributed via email with “Confidential PLEASE do not re-distribute” across the top. So it feels more than a little bit foolhardy to be writing about it here, even in a limited way. But I was going to the meeting as an employee of the Library of Congress, so I feel a bit responsible for making some kind of public notes/comments about what happened. I suspect this might impact future invitations to similar events, but I can live with that :-)

A week is a long time to talk about anything…and Linked Data is certainly no exception. The conversation was buoyed by the fact that it was a pretty small group of around 30 folks from a wide variety of institutions including: Stanford University, Bibliotheca Alexandrina, Bibliothèque nationale de France, Aalto University, Emory University, Google, National Institute of Informatics, University of Virginia, University of Michigan, Deutschen Nationalbibliothek, British Library, Det Kongelige Bibliotek, California Digital Library and Seme4. So the focus was definitely on cultural heritage institutions, and specifically what Linked Data can offer them. There was a good mix of people in attendance: some who were relatively new to Linked Data and RDF, to others who had been involved in the space before the term Linked Data, RDF or the Web existed…and there were people like me who were somewhere in between.

A few of us took collaborative notes in PiratePad, which sort of tapered off as the week progressed and more discussion happened. After some initial lighting-talk-style presentations from attendees on Linked Data projects they were involved in, we spent most of the rest of the week breaking up into 4 groups, to discuss various issues, and then joining back up with the larger group for discussion. If things go as planned you can expect to see a report from Stanford that covers the opportunities and challenges that Linked Data offers the cultural heritage sector, which were raised during these sessions. I think it’ll be a nice compliment to the report that the W3C Library Linked Data Incubator Group is preparing, which recently became available as a draft.

One thing that has stuck with me a few weeks later, is the continued need in the cultural heritage Linked Data sector for reconciliation services, that help people connect up their resources with appropriate resources that other folks have published. If you work for a large organization, there is often even a need for reconciliation services within the enterprise. For example the British Library reported that it has some 300 distinct data systems within the organization, that sometimes need to be connected together. Linking is the essential ingredient, the sine qua non of Linked Data. Linking is what makes Linked Data and the RDF data model different. It helps you express the work you may have done in joining up your data with other people’s data. It’s the 4th design pattern in Tim Berners-Lee’s Linked Data Design Issues:

Include links to other URIs. so that they can discover more things.

But, expressing the links is the easy part…creating them where they do not currently exist is harder.

Fortunately, Hugh Glaser was on hand to talk about the role that sameas.org plays in the Linked Data space, and how the RKBexplorer managed to reconcile authors names across institutional repositories. He also has described some work with the British Museuem linking up two different data silos about museum objects, to provide a unified web views for those objects. Hugh, if you are reading this, can you comment with a link to this work you did, and how it surfaces in British Museum website?

Similarly Stefano Mazzocchi talked about how Google’s tools like Freebase Suggest and their WebID app can make it easy to integrate Freebase’s identity system into your applications. If you are building a cataloging tool, take a serious look at what using something like Freebase Suggest (a jquery plugin) can offer your application. In addition, as part of the Google Refine data cleanup tool, Google has created an API for data reconciliation services, which other service providers could supply. Stefano indicated that Google was considering releasing the code behind this reconciliation service, and stressed that it is useful for the community to make more of these reconciliation services available, to help others link their data with other peoples data. It seems obvious I guess, but I was interested to hear that Google themselves are encouraging the use of Freebase IDs to join up data within their enterprise.

Almost a year ago Leigh Dodds created a similar API layer for data that is stored in the Talis Platform. Now that the British National Bibliography is being made available in a Talis Store, it might be possible to use Leigh’s code to put a reconciliation service on top of that data. Caveat being that not all the BNB is currently available there. By the way, hats off to the British Library for iteratively making that data available, and getting feedback early, instead of waiting for it all to be “done”…which of course they never will be, if they are successful at integrating Linked Data into their existing data work flows.

If you squint right, I think it’s also possible to look at the VIAF AutoSuggest service as a type reconciliation service. It would be useful to have a similar service over the Library of Congress Subject Headings at id.loc.gov. Having similar APIs for these services could be a handy thing as we begin to build new descriptive cataloging tools that take advantage of these pools of data. But I don’t think it’s necessarily essential, as the APIs could be orchestrated in a more ad hoc, web2.0 mashup style. I imagine I’m not alone in thinking we’re now at the stage when we can start building new cataloging tools that take advantage of these data services. Along those lines Rachel Frick had an excellent idea to try to enhance collection building applications like Omeka and Archives Space to take advantage of reconciliation services under the covers. Adding a bit of suggest-like functionality to these tools could smooth out the description of resources that libraries, museums and archives are putting online. I think the Omeka-LCSH plugin is a good example of steps in this direction.

One other thing that stuck with me from the workshop is that the new (dare I say buzzwordy) focus on Library Linked Data is somewhat frightening to library data professionals. There is a lot of new terminology, and issues to work out (as the Stanford report will hopefully highlight). Some of this scariness is bound up with the Resource Description and Access sea change that is underway. But crufty as they are, data systems built around MARC have served the library community well over the past 30 years. Some of the motivations for Linked Data are specifically for Linked Open Data, where the linking isn’t as important as the openness. The LOD-LAM summit captured some of this spirit in the 4 Star Classification Scheme for Linked Open Cultural Metadata, which focuses on licensing issues. There was a strong undercurrent at the Stanford meeting about licensing issues. The group recognized that explicit licensing is important, but it was intentionally kept out of scope of most of the discussion. Still I think you can expect to see some of the heavy hitters from this group exert some influence in this arena to help bring clarity to licensing issues around our data. I think that some of the ideas of opening up the data, and disrupting existing business workflows around the data can seem quite scary to those who have (against a lot of odds) gotten them working. I’m thinking of the various cooperative cataloging efforts that allow work to get done in libraries today.

Truth be told, I may have inspired some of the “fear” around Linked Data by suggesting that the Stanford group work on a manifesto to rally around, much like what the Agile Manifesto did for the Agile software development movement. I don’t think we had come to enough consensus to really get a manifesto together, but on the last day the sub-group I was in came up with a straw man (near the bottom of the piratepad notes) to throw darts at. Later on (on my own) I kind of wordsmithed them into a briefer list. I’ll conclude this blog post by including the “manifesto” here not as some official manifesto of the workshop (it certainly is not), but more as a personal manifesto, that I’d like to think has been guiding some of the work I have been involved in at the Library of Congress over the past few years:

Manifesto for Linked Libraries

We are uncovering better ways of publishing, sharing and using information by doing it and helping others do it. Through this work we have come to value:

  • Publishing data on the Web for discovery over preserving it in dark archives.
  • Continuous improvement of data over waiting to publish perfect data.
  • Semantically structured data over flat unstructured data.
  • Collaboration over working alone.
  • Web standards over domain-specific standards.
  • Use of open, commonly understood licenses over closed, local licenses.

That is, while there is value in the items on the right, we value the items on the left more.

The manifesto is also on a publicly editable Google Doc; so if you feel the need to add or comment please have a go. I was looking for an alternative to “Linked Libraries” since it was not inclusive of archives and museums … but I didn’t spend much time obsessing on it. One of the selfish motivations for publishing the manifesto here was to capture it a particular point in time where I was happy with it :-)