Category Archives: preservation

Where Brooklyn At?

As a follow up to my last post I added a script to my fork of Aaron’s py-flarchive that will load up a Redis instance with comments, notes, tags and sets for Flickr images that were uploaded by Brooklyn Museum. The script assumes you’ve got a snapshot of the archived metadata, which I downloaded as a tarball. It took several hours to unpack the tarball on a medium ec2 instance; so if you want to play around and just want the redis database let me know and I’ll get it to you.

Once I loaded up Redis I was able to generate some high level stats:

  • images: 5,697
  • authors: 4,617
  • tags: 6,132
  • machine tags: 933
  • comments: 7,353
  • notes: 963
  • sets: 141

Given how many images there were there it represents an astonishing number of authors: unique people who added tags, comments or notes. If you are curious I generated a list of the tags and saved them as a Google Doc. The machine tags were particularly interesting to me. The majority (849) of them look like Brooklyn Museum IDs of some kind, for example:

bm:unique=S10_08_Thebes/9928

But there were also 51 geotags, and what looks like 23 links to items in Pleiades, for example:

tag:pleiades:depicts=721417202

If I had to guess I’d say this particular machine tag indicated that the Brooklyn Museum image depicted Abu Simbel. Now there weren’t tons of these machine tags but it’s important to remember that other people use Flickr as a scratch space for annotating images this way.

If you aren’t familiar with them, Flickr notes are annotations of an image, where the user has attached a textual note to a region in the image. Just eyeballing the list, it appears that there is quite a bit of diversity in them, ranging from the whimsical:

  • cool! they look soo surreal
  • teehee somebody wrote some graffiti in greek
  • Lol are these painted?
  • Steaks are ready!

to the seemingly useful:

  • Hunter’s Island
  • Ramesses III Temple
  • Lapland Village
  • Lake Michigan
  • Montuemhat Crypt
  • Napoleon’s troops are often accused of destroying the nose, but they are not the culprits. The nose was already gone during the 18th century.

Similarly the general comments run the gamut from:

  • very nostalgic…
  • always wanted to visit Egypt

to:

  • Just a few points. This is not ‘East Jordan’ it is in the Hauran region of southern Syria. Second it is not Qarawat (I guess you meant Qanawat) but Suweida. Third there is no mention that the house is enveloped by the colonnade of a Roman peripteral temple.
  • The fire that destroyed the buildings was almost certainly arson. it occurred at the height of the Pullman strike and at the time, rightly or wrongly, the strikers were blamed.
  • You can see in the background, the TROCADERO with two towers .. This “medieval city” was built on the right bank where are now buildings in modern art style erected for the exposition of 1937.

Brooklyn Museum pulled over 48 tags from Flickr before they deleted the account. That’s just 0.7% of the tags that were there. None of the comments or notes were moved over.

In the data that Aaron archived there was one indicator of user engagement: the datetime included with comments. Combined with the upload time for the images it was possible to create a spreadsheet that correlates the number of comments with the number of uploads per month:

Brooklyn Museum Flickr Activity

I’m guessing the drop off in December of 2013 is due to that being the last time Aaron archived Brooklyn Museum’s metadata. You can see that there was a decline in user engagement: the peak in late 2008 / early 2009 was never matched again. I was half expecting to see that user engagement fell off when Brooklyn Museum’s interest in the platform (uploads) fell off. But you can see that they continued to push content to Flickr, without seeing much of a reward, at least in the shape of comments. It’s impossible now to tell if tagging, notes or sets trended differently.

Since Flickr includes the number of times each image was viewed it’s possible to look at all the images and see how many times images were viewed, the answer?

9,193,331

Not a bad run for 5,697 images. I don’t know if Brooklyn Museum downloaded their metadata prior to removing their account. But luckily Aaron did.

Glass Houses

You may have noticed Brooklyn Museum’s recent announcement that they have pulled out of Flickr Commons. Apparently they’ve seen a “steady decline in engagement level” on Flickr, and decided to remove their content from that platform, so they can focus on their own website as well as Wikimedia Commons.

Brooklyn Museum announced three years ago that they would be cross-posting their content to Internet Archive and Wikimedia Commons. Perhaps I’m not seeing their current bot, but they appear to have two, neither of which have done an upload since March of 2011, based on their user activity. It’s kind of ironic that content like this was uploaded to Wikimedia Commons by Flickr Uploader Bot and not by one of their own bots.

The announcement stirred up a fair bit of discussion about how an institution devoted to the preservation and curation of cultural heritage material could delete all the curation that has happened at Flickr. The theory being that all the comments, tagging and annotation that has happened on Flickr has not been migrated to Wikimedia Commons. I’m not even sure if there’s a place where this structured data could live at Wikimedia Commons. Perhaps some sort of template could be created, or it could live in Wikidata?

Fortunately, Aaron Straup-Cope has a backup copy of Flickr Commons metadata, which includes a snapshot of the Brooklyn Museum’s content. He’s been harvesting this metadata out of concern for Flickr’s future, but surprise, surprise — it was an organization devoted to preservation of cultural heritage material that removed it. It would be interesting to see how many comments there were. I’m currently unpacking a tarball of Aaron’s metadata on an ec2 instance just to see if it’s easy to summarize.

But:

I’m pretty sure I’m living in one of those.

I agree with Ben:

It would help if we had a bit more method to the madness of our own Web presence. Too often the Web is treated as a marketing platform instead of our culture’s predominant content delivery mechanism. Brooklyn Museum deserves a lot of credit for talking about this issue openly. Most organizations just sweep it under the carpet and hope nobody notices.

What do you think? Is it acceptable that Brooklyn Museum discarded the user contributions that happened on Flickr, and that all the people who happened to be pointing at said content from elsewhere now have broken links? Could Brooklyn Museum instead decided to leave the content there, with a banner of some kind indicating that it is no longer actively maintained? Don’t lots of copies keep stuff safe?

Or perhaps having too many copies detracts from the perceived value of the currently endorsed places of finding the content? Curators have too many places to look, which aren’t synchronized, which add confusion and duplication. Maybe it’s better to have one place where people can focus their attention?

Perhaps these two positions aren’t at odds, and what’s actually at issue is a framework for thinking about how to migrate Web content between platforms. And different expectations about content that is self hosted, and content that is hosted elsewhere?

To Mr. Hazard

Lots of Copies Keeps Stuff Safe, 1791-Style.

Time and accident are committing daily havoc on the originals deposited in our public offices. The late war has done the work of centuries in this business. The last cannot be recovered, but let us save what remains; not by vaults and locks, which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies as shall place them beyond the reach of accident.

– Thomas Jefferson

GoogleBooks