Category Archives: libraries

aaronsw

Aaron Swartz left us all a week ago. It’s strange, I only met Aaron once at the Internet Archive, and had a handful of conversations with him via email/irc … but not a day has passed since last Saturday that I haven’t thought about him, and his principled life.

I’ve been asked a few times why Aaron has been on my mind so much, and I’ve struggled to put it into words. Meanwhile, so many thoughtful things have been written about him. The arc of his life, his ideals, and abilities, charisma, and chutzpah, seem larger than life. And yet, he was just a person, a son, a friend, with people who loved him. It’s just heartbreaking.


I work as a software developer in libraryland, trying to bridge the world of information we’ve had with the world we are building on the Web. So for me, Aaron was a role model, a teacher whose lessons weren’t in textbooks or scholarly journals, but in his blog, in his code, in his talks, in his experiments with real world results. He was only 26 when he died, but he was, and remains, as Tim Berners-Lee paradoxically called him, a “wise elder”.

I wanted to write something here, but more than that I wanted to do something.


I noticed that Internet Archive created a collection devoted to online material related to Aaron, and thought I would try to collect together all the Twitter conversations that mention him. Twitter’s search is limited to the last week, so I quickly wrote a command line utility that pages through search results using their API, and writes out the complete data as line-oriented JSON. I also pulled in the tweets that mention #pdftribute since they were largely inspired by Aaron’s efforts in the open access space. I packaged up the data using BagIt and put it up at Internet Archive. Here’s the description from the bag-info.txt

On January 11, 2013 the Internet activist Aaron Swartz took his own life, and a great deal of grief, anger, and constructive thinking erupted on the Web and in Twitter. In particular the #pdftribute Twitter tag was born, in an attempt to raise awareness about Open Access issues, that Aaron did so much to futher during his life.

This package contains Twitter JSON data for two Twitter search queries that were collected in the week following Aaron’s death:

  • “Aaron Swartz” OR aaronsw
  • #pdftribute

aaronsw.json.gz contains 630,397 tweets, for the period starting with 2013-01-11 16:50:22 and ending 2013-01-18 13:50:02.

pdftribute.json.gz contains 42,277 tweets, for the period starting with Jan 13 02:42:26 and ending Jan 17 03:33:46.

In addition the URLs mentioned in the tweets found in aaronsw.tar.gz were extracted, unshortened, and then aggregated to provide a report of what people linked to. These URLs are available in aaronsw-urls.txt.gz.

It is hoped that this data will help document the Web community’s response to Aaron’s death, and life.

Below is a list of the top 50 links shared in tweets about Aaron. There were 36,506 in all.

Page Shares
RIP, Aaron Swartz – Boing Boing 11763
The Truth about Aaron Swartz’s “Crime” « Unhandled Exception 6641
Aaron Swartz commits suicide – The Tech 5539
Remove United States District Attorney Carmen Ortiz from office for overreach in the case of Aaron Swartz. 6478
Prosecutor as bully – Lessig Blog 3738
The inspiring heroism of Aaron Swartz | Glenn Greenwald | Comment is free | guardian.co.uk 2522
Aaron Swartz Faced A More Severe Prison Term Than Killers, Slave Dealers And Bank Robbers | ThinkProgress 2367
Farewell to Aaron Swartz, an Extraordinary Hacker and Activist – EFF 2042
Internet Activist, a Creator of RSS, Is Dead at 26, Apparently a Suicide – New York Times 1927
Aaron Swartz muere por suicidio a sus 26 años 1572
Technology’s Greatest Minds Say Goodbye to Aaron Swartz 1558
Aaron Swartz a través de 5 grandes contribuciones a la red 1495
Aaron Swartz, American hero 1397
Internet Activist Aaron Swartz Commits Suicide 1330
Anonymous hacks MIT after Aaron Swartz’s suicide | Internet & Media – CNET News 1327
danah boyd | apophenia » processing the loss of Aaron Swartz 1280
Official Statement from the family and partner of Aaron Swartz – Remember Aaron Swartz 1199
depression lies | WIL WHEATON dot NET: 2.0 1164
BBC News – Aaron Swartz, internet freedom activist, dies aged 26 1143
In the Wake of Aaron Swartz’s Death, Let’s Fix Draconian Computer Crime Law – EFF 1088
Westboro Baptist Church Drops Aaron Swartz Funeral Protest After Anonymous Vows Action (VIDEO) 1079
Soup • Official Statement from the Family and Partner of… 1067
‘Aaron was killed by the government’ – Robert Swartz on his son’s death — RT 1066
#PDFTribute list of documents 1044
Internet prodigy, activist Aaron Swartz commits suicide – CNN.com 1009
Remembering Aaron Swartz | The Nation 1003
If I get hit by a truck… 991
Suicide d’Aaron Swartz, activiste à l’origine du format RSS et de Creative Commons 938
Hacker, Activist Aaron Swartz Commits Suicide | ZDNet 896
Activism “How We Stopped SOPA” by Aaron Swartz (1986-2013) 896
Muere a los 26 años el ciberactivista Aaron Swartz | Tecnología | EL PAÍS 887
10 Awful Crimes That Get You Less Prison Time Than What Aaron Swartz Faced | Alternet 868
Aaron Swartz, Coder and Activist, Dead at 26 | Threat Level | Wired.com 856
How the Legal System Failed Aaron Swartz–and Us : The New Yorker 849
https://aaronsw.jottit.com/howtoget 811
How Anonymous Got Westboro to Back Off Aaron Swartz’s Funeral – National – The Atlantic Wire 804
Muerte de Aaron Swartz: la necesidad del Open Data en el I+D 779
US court drops charges on Aaron Swartz days after his suicide — RT 772
Researchers begin posting article PDFs to twitter in #pdftribute to Aaron Swartz « Neuroconscience 745
My Aaron Swartz, whom I loved. | Quinn Said 742
The inspiring heroism of Aaron Swartz | Glenn Greenwald | Comment is free | guardian.co.uk 713
Government formally drops charges against Aaron Swartz | Ars Technica 708
Aaron Swartz’s Politics « naked capitalism 704
CNN.com – Breaking News, U.S., World, Weather, Entertainment & Video News 690
After Aaron Swartz: The Tech World Must Talk About Depression 670
JSTOR liberator 663
Internet Activist Aaron Swartz Commits Suicide 661
Anonymous tumba las webs del MIT y DOJ como tributo a Aaron Swartz 652
Anonymous Hacks MIT, Leaves Farewell Message for Aaron Swartz 647

There were 209,839 Twitter users that mentioned Aaron on Twitter in the last week. I was one of them. I wish I could’ve done more to help.

Inside Out Libraries

Peter Brantley tells a sad tale about where public library leadership is at, as we plunge headlong into the ebook future, that has been talked about for what seems like forever, and which is now upon us. It’s not pretty.

The general consensus among participants was that public libraries have two, maybe three years to establish their relevance in the digital realm, or risk fading from the central place they have long occupied in the world’s literary culture.

The fact that a bunch of big-wigs invited by IFLA were seemingly unable to find inspiration and reason to hope that public libraries will continue to exist is not surprising in the least I guess. I’m not sure that libraries were ever the center of the world’s literary culture. But for the sake of argument lets assume they were, and that now they’re increasingly not. Let us also assume that the economic landscape around ebooks is in incredible turmoil, and that there will continue to be sea changes in technologies, and people’s use of them in this area for the foreseeable future.

What can libraries do to stay relevant? I think part of the answer is: stop being libraries…well, sorta.

The HyperLocal

The most serious threat facing libraries does not come from publishers, we argued, but from e-book and digital media retailers like Amazon, Apple, and Google. While some IFLA staff protested that libraries are not in the business of competing with such companies, the library representatives stressed that they are. If public libraries can’t be better than Google or Amazon at something, then libraries will lose their relevance.

In my mind the thing that libraries have to offer, which these big corporations cannot, is authentic, local context for information about a community’s past, present and future. But in the past century or so libraries have focused on collecting mass produced objects, and sharing data about said objects. The mission of collecting hyper-local information has typically been a side task, that has fallen to special collections and archives. If I were invited to that IFLA meeting I would’ve said that libraries need to shift their orientation to caring more about the practices of archives and manuscript collections, by collecting unique, valued, at risk local materials, and adapting collection development and descriptive practices to the realities of more and more of this information being available as data.

As Mark Matienzo indicated (somewhat indirectly in Twitter) after I published this blog post, a lot of this work involves focusing less on hoarding items like books, and focusing more on the functions, services, and actions that public libraries want to document and engage with in their communities. Traditionally this orientation has been a strength area for archivists in their practice and theory of appraisal where:

… considerations … include how to meet the record-granting body’s organizational needs, how to uphold requirements of organizational accountability (be they legal, institutional, or determined by archival ethics), and how to meet the expectations of the record-using community. Wikipedia

I think this represents a pretty significant cognitive shift for library professionals, and would in fact take some doing. But perhaps that’s just because my exposure to archival theory in “library school” was pretty pathetic. Be that as it may here are some practical examples of growth areas for public libraries that I wish came up at the IFLA meeting.

Web Archiving

The Internet Archive and national libraries that are part of the International Internet Preservation Consortium don’t have the time, resources and often mandate to collect web content that are of interest at the local level. What if the tooling and expertise existed for public libraries to perform some of this work, and to have the results fed into larger aggregations of web archives?

Municipality Reports and Data

Increasing amounts of data are being collected as part of the daily working of our local governments. What if your public library had the resources to be a repository for this data? Yeah, I said the R word. But I’m not suggesting that public libraries get the expertise to set up Fedora instances with Hydra heads, or something. I’m thinking about approaches to allowing data to easily flow into an organization, where it is backed up, and made available in a clearinghouse manner similar to public.resource.org on the Web, for search engines to pick up. Perhaps even services like LibraryBox offer another lens to look at the opportunities that lie in this area.

Born Digital Manuscript Collections

Public libraries should be aggressively collecting the “papers” of local people who have had significant contributions to their communities. Increasingly, these aren’t paper at all, but are born digital content. For example: email correspondence, document archives, digital photograph collections. I think that librarians and archivists know, in theory, that this born digital content is out there, but the reality is it’s not flowing into the public library/archive. How can we change this? Efforts such as Personal Digital Archiving are important for two reasons: they help set up the right conditions for born digital collections to be donated, and they also make professionals think about how they would like to receive materials so that they are easier to process. Think more things like AIMS, training and tooling for both professionals and citizens.

Licensing

It’s not unusual for archives and special collections to have all sorts of donor gift agreements that place restrictions on how their donated materials can be used. To some extent needing to visit the collection, request it, and not being able to leave the room with it, has mitigated some of this special-snowflakism. But when things are online things change a bit. We need to normalize these agreements so that content can flow online, and be used online in clearer ways. What if we got donors to think about Creative Commons licenses when they donated materials? How can we make sure donated material can become a usable part of the Web

Persistence

We all know that things come and go on the Web. But it doesn’t need to be that way for everything on the Web. Libraries and archives have an opportunity to show how focusing on being a clearninghouse for data assets can allow for things to live persistently on the Web. Thinking about our URLs as identifiers for things we are taking care of is important. Practical strategies for achieving that are possible, and repeatable. What if public libraries were safe harbors for local content on the World Wide Web? This might sound hard to do, but I think it’s not as hard as people think.

Metrics

As libraries/archives make more local content available publicly on the Web it becomes important to track how this content is accessed and used online. Quick wins like Web analytics tools (Google Analytics) for seeing what is being accessed and from where. Seeing how content is cited in social media applications like Facebook, Twitter, Pinterest and Wikipedia is important for reporting on the value of online collections. But encouraging professionals to use this information to become part of the conversations is equally important. Good metrics are also essential for collection development purposes, seeing what content is of interest, and what is not.

Inside Out Libraries

So, no I don’t think public libraries need a new open source Overdrive. The ebook market will likely continue to take care of itself. I also am not really convinced we need some overarching organization like the Digital Public Library of America to serve as a single point of failure when the funding runs dry. We need distributed strategies for documenting our local communities, so that this information can take its rightful place on the Web, and be picked up by Google so that people can find it when they are on the other side of the world. Things will definitely keep changing, but I think libraries and archives need to invest in the Web as an enduring delivery platform for information.

I’ve never been before but I was so excited to read the call for the European Library Automation Group (ELAG) this year.

The theme of this year’s conference is ‘The INSIDE-OUT Library’. This theme was chosen at last year’s conference, because we concluded:

  • Libraries have been focusing on bringing the world to their users. Now information is globally available.
  • Libraries have been producing metadata for the same publications in parallel. Now they are faced with deduplicating redundancy.
  • Libraries have been selecting things for their users. Now the users select things themselves.
  • Libraries have been supporting users by indexing things locally. Now everything is being indexed in global, shared indexes.

Instead of being an OUTSIDE-IN library, libraries should try and stay relevant by shifting their paradigm 180 degrees. Instead of only helping users to find what is available globally, they should also focus on making local collections and production available to the world. Instead of doing the same thing everywhere, libraries should focus on making unique information accessible. Instead of focusing on information trapped in publications, libraries should try and give the world new views on knowledge.

This blog post is really just a somewhat shabby rephrasing of that call. Maybe IFLA could use some of the folks on the ELAG program commmittee at their next meeting about the future of public libraries? Hopefully 2013 will be a year I can make it to ELAG.

I expect public libraries will continue to exist, but there isn’t going to be some magical technical solution to their problems. Their future will be forged by each local relationship they make, which leads to them better documenting their place on the Web. We may not call these places public libraries at first, but that’s what they will be.

linkrot: use your illusion

Mike Giarlo wrote a bit last week about the issues of citing datasets on the Web with Digital Object Identifiers (DOI). It’s a really nice, concise characterization of why libraries and publishers have promoted and used the DOI, and indirect identifiers more generally. Mike defines indirect identifiers as

… identifiers that point at and resolve to other identifiers.

I might be reading between the lines a bit, but I think Mike is specifically talking about any identifier that has some documented or ad-hoc mechanism for turning it into a Web identifier, or URL. A quick look at the Wikipedia identifier category yields lots of these, many of which (but not all) can be expressed as a URI.

The reason why I liked Mike’s post so much is that he was able to neatly summarize the psychology that drives the use of indirect identifier technologies:

… cultural heritage organizations and publishers have done a pretty poor job of persisting their identifiers so far, partly because they didn’t grok the commitment they were undertaking, or because they weren’t deliberate about crafting sustainable URIs from the outset, or because they selected software with brittle URIs, or because they fell flat on some area of sustainability planning (financial, technical, or otherwise), and so because you can’t trust these organizations or their software with your identifiers, you should use this other infrastructure for minting and managing quote persistent unquote identifiers

Mike goes on to get to the heart of the problem, which is that indirect identifier technologies don’t solve the problem of broken links on the Web, they just push it elsewhere. The real problem of maintaining the indirect identifier when the actual URL changes becomes someone else’s problem. Out of sight, out of mind … except it’s not really out of sight right? Unless you don’t really care about the content you are putting online.

We all know that linkrot on the Web is a real thing. I would be putting my head in the sand if I were to say it wasn’t. But I would also be putting my head in the sand if I said that things don’t go missing from our brick and mortar libraries. But still, we should be able to do better than 1/2 the URLs in arXiv going dead right? I make a living as a web developer, I’m an occasional advocate for linked data, and I’m a big fan of the work Henry Thompson and David Orchard did for the W3C analyzing the use of alternate identifier schemes on the Web…so, admittedly, I’m a bit of a zealot when it comes to promoting URLs as identifiers, and taking the Web seriously as an information space.

Mike’s post actually kicked off what I thought was a useful Twitter conversation (yes they can happen), which left me contemplating the future of libraries and archives on (or in) the Web. Specifically, it got me thinking that perhaps libraries and archives of the not too distant future will be places that take special care in how they put content on the Web, so that it can be accessed over time, just like a traditional physical library or archive. The places where links and the content they reference are less likely to go dead will be the new libraries and archives. These may not be the same institutions we call libraries today. Just like today’s libraries, these new libraries may not necessarily be free to access. You may need to be part of some community to access them, or to pay some sort of subscription fee. But some of them, and I hope most, will be public assets.

So how to make this happen? What will it look like? Rather than advocating a particular identifier technology I think these new libraries need to think seriously about providing Terms of Service documents for their content services. I think these library ToS documents will do a few things.

  • They will require the library to think seriously about the service they are providing. This will involve meetings, more meetings, power lunches, and likely lawyers. The outcome will be an organizational understanding of what the library is putting on the Web, and the commitment they are entering into with their users. It won’t simply be a matter of a web development team deciding to put up some new website…or take one down. This will likely be hard, but I think it’s getting easier all the time, as the importance of the Web as a publishing platform becomes more and more accepted, even in conservative organizations like libraries and archives.
  • The ToS will address the institutions commitment for continued access to the content. This will involve a clear understanding of the URL namespaces that the library manages, and a statement about how they will be maintained over time. The Web has built in mechanisms for content moving from place to place (HTTP 301), and for when resources are removed (HTTP 410), so URLs don’t need to be written in stone. But the library needs to commit to how resources will redirect permanently to new locations, and for how long–and how they will be removed.
  • The ToS will explicitly state the licensing associated with the content, preferably with Creative Commons licenses (hey I’m daydreaming here) so that it can be confidently used.
  • Libraries and archives will develop a shared palette of ToS documents. Each institution won’t have it’s own special snowflake ToS that nobody reads. There will be some normative patterns for different types of libraries. They will be shared across consortia, and among peer institutions. Maybe they will be incorporated into, or reflect shared principles found in documents like ALA’s Library Bill of Rights or SAA’s Code of Ethics.

I guess some of this might be a bit reminiscent of the work that has gone into what makes a trusted repository. But I think a Terms of Service between a library/archive and its researcher is something a bit different. It’s more outward looking, less interested in certification and compliance and more interested in entering into and upholding a contract with the user of a collection.

As I was writing this post, Dan Brickley tweeted about a recent talk Tony Ageh (head of the archive development team at the BBC) gave at the recent Economies of the Commons conference. He spoke about his ideas for a future Digital Public Space, and the role that archives and organizations like the BBC play in helping create it.

Things no longer ‘need’ to disappear after a certain period of time. Material that once would have flourished only briefly before languishing under lock and key or even being thrown away — can now be made available forever. And our Licence Fee Payers increasingly expect this to be the way of things. We will soon need to have a very, *very* good reason for why anything at all disappears from view or is not permanently accessible in some way or other.

That is why the Digital Public Space has placed the continuing and permanent availability of all publicly-funded media, and its associated information, as the default and founding principle.

I think Tony and Mike are right. Cultural heritage organizations need to think more seriously, and more long term about the content they are putting on the Web. They need to put this thought into clear, and succinct contracts with their users. The organizations that do will be what we call libraries and archives tomorrow. I guess I need to start by getting my own house in order eh?