Makian As It Appears from the Side of Ngofakiaha

While I worked at the Library of Congress more than a decade ago I had a small hand in an initial prototype website called the World Digital Library. There were lots of hands and minds involved in the project over time, but the technical lead when I was working on it was Dan Chudnov…Dan really just asked me to help out with some Solr things here and there. The technical work was later picked up by Chris Adams, who really took WDL to another level. WDL provided a rich, high resolution user experience for the time, as well as multi-lingual access to a set of curated cultural heritage objects from around the world.

I’m using the past tense here because I’ve just noticed that the project ended its 10 years of development in 2021, and the items have now been migrated into the Library of Congress general web collection. Here’s a nice article by Julie Stoner that explores some of the amazing maps that WDL put on the web.

I’m really only mentioning it here because of the extra effort that the project seems to have taken late last year to migrate and preserve the content prior to turning off the web application. The old URLs were mapped to new ones and made available by the LC’s primary web publishing platform. Quite a bit of data and metadata needed to move around behind the scenes for this to happen.

So the old WDL item URLs now redirect permanently to new item URLs on the loc.gov website.

Because WDL’s item ids were a straightforward integer sequence, with a little bit of code you can see that it wasn’t simply a mechanical redirect, but involved an understanding of the object’s type and where it lives in the LoC website:

import requests

# The old WDL item URL pattern is assumed here from the original
# integer-sequenced /item/ pages.
for i in range(1, 100):
    url = f'https://www.wdl.org/en/item/{i}/'
    resp = requests.get(url)
    # resp.url is the final URL after any redirects have been followed
    print(f'{url} -> {resp.url}')

Although some of those Handle URLs appear to do nothing? Maybe they are still in the process of being registered? Perhaps the intention is for them to eventually point at institutions other than the Library of Congress, e.g. where the item originally came from?

These redirects are a good thing because there are lots of links from other places on the web to WDL at its old location, such as Wikipedia. Sure, Wikipedia and the Internet Archive will routinely look for dead links and rewrite them to point at the Wayback Machine. But there are many more places on the web that continue to point to the old URLs. More importantly, if the resource is continuing to be cared for on the web, I think it’s better for the redirect to point to that location rather than some stale version in a “web archive” on another organization’s website?

Just as an aside, I wonder if Wikipedia crawls its links looking for permanent redirects to update? If there were a practice for this it would probably require some level of human curation, because permanent redirects can also suffer from reference rot, where the link now points at a page that has little to do with the original (Zittrain, Albert, & Lessig, 2014).
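To make that idea concrete, here is a minimal sketch (the function and variable names are hypothetical, not any real bot's API) of how a link-maintenance bot might decide which stored links are even candidates for rewriting: only permanent redirects (HTTP 301 and 308) signal that clients may update their stored links, and even those would still warrant human review for reference rot.

```python
# Only these status codes signal that a stored link may be updated;
# temporary redirects (302, 307) make no such promise.
PERMANENT_REDIRECTS = {301, 308}

def links_to_update(crawl_results):
    """crawl_results maps url -> (status_code, location_header).

    Returns {old_url: new_url} for links that were permanently
    redirected and so are candidates for rewriting.
    """
    updates = {}
    for url, (status, location) in crawl_results.items():
        if status in PERMANENT_REDIRECTS and location:
            updates[url] = location
    return updates
```

A curator would then review each proposed update before it lands, since a 301 can point somewhere that no longer resembles the original page.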

One interesting thing that LC also chose to do was create a web archive of what the WDL website looked like before it was shut down. I ran into the occasional Cloudflare error when trying to look at some pages, and the ones that worked were a bit sluggish, but I think I’ve come to expect this of web archives. It appears to be running a version of OpenWayback?

This web archive preserves something that is lost in the presentation on the LC website: the multilingual interface. As of this writing you can still see the old pages in the search results at DuckDuckGo, but I imagine these will start to disappear (by design), since permanent redirects signal to the search platform that the old URLs can be purged from its index.

For some time I’ve had a small research project in the back of my mind to do a field study (or two) with organizations that are planning or executing a content migration from one CMS to another. I think it would be interesting to examine the types of conversations, planning and tradeoffs that go on. If you are planning a migration like this, and wouldn’t mind having a weird social scientist / software developer interloper hanging around, please let me know.

Finally, this WDL migration also reminded me of an actual research project that I’m in the middle of: helping with some technical writing on the WACZ specification for the Webrecorder project.

Part of the hope for WACZ and its associated tools is that it can be easier to mount web archives directly on the web. So instead of crawling with Heritrix and ingesting the collected WARC data into OpenWayback, you could package up the collected WARC data into a WACZ, or perhaps a set of language-specific WACZ files, and simply place the WACZ file in a web-accessible location (e.g. an S3 bucket) with an HTML file of your choosing that uses a replay Web Component (some JavaScript).
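Part of what makes this attractive is that a WACZ file is just a ZIP with a predictable layout. Here is a rough sketch of that layout (descriptor fields and version numbers are approximated from the spec; real tooling like Webrecorder's py-wacz also builds CDXJ indexes and computes content hashes, which are omitted here):

```python
import json
import os
import zipfile

def make_minimal_wacz(warc_path, wacz_path):
    """Package a WARC file into a minimal WACZ-style ZIP.

    This only sketches the layout the WACZ spec describes:
    a datapackage.json descriptor, WARC data under archive/,
    and a page list under pages/.
    """
    name = os.path.basename(warc_path)
    datapackage = {
        "profile": "data-package",
        "resources": [{"name": name, "path": f"archive/{name}"}],
    }
    with zipfile.ZipFile(wacz_path, "w", zipfile.ZIP_STORED) as z:
        z.write(warc_path, f"archive/{name}")
        z.writestr("pages/pages.jsonl",
                   json.dumps({"format": "json-pages-1.0"}) + "\n")
        z.writestr("datapackage.json", json.dumps(datapackage, indent=2))
    return wacz_path
```

Because it’s a single static file, serving it is just a matter of putting it somewhere HTTP-accessible; the replay JavaScript does the rest in the browser.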

Or perhaps you instead (or in addition) take those WACZ files and give them to the organizations that donated the materials in the first place, for them to place on the web? Here are some examples of what that can look like, which Webrecorder put together for Stanford University Press. Making web archives more portable, while retaining their integrity and accessibility, is important work that the Webrecorder project is engaged with, largely through the efforts of Ilya Kreymer, who deserves a lot of credit for advancing and innovating web archiving practice.


Zittrain, J., Albert, K., & Lessig, L. (2014). Perma: Scoping and addressing the problem of link and reference rot in legal citations. Legal Information Management, 14(2), 88–99.