t.co

People are discovering that Twitter’s archive download includes shortened t.co URLs instead of the original URLs that you tweeted. If (when) Twitter goes away, the server at t.co won’t be available to respond to requests. This means your archived tweets will potentially lack some pretty significant context. Even if they don’t work (404) these original URLs are important, because you can at least try to look them up in a web archive like the Internet Archive.

It’s kind of strange that Twitter’s archive download doesn’t display the original URL because some (not all) of the t.co URLs, and their “expanded” form are present in the archived data. The client side app just doesn’t use them in the display. On the other hand maybe it isn’t so strange, since Twitter is in the surveillance capitalism business, and clicks are what corporate social media companies crave most.

Tim Hutton has written a handy tool that will rewrite your archive as a set of Markdown files, and which also uses the longer URLs that are available. I think this is super useful, but I actually really quite like the little client-side application that Twitter provided. So I wanted something that would rewrite the t.co URLs to show the original URL instead.

twitter-archive-unshorten is a small Python program that will examine all the JavaScript files in the archive download and rewrite the t.co short URLs to their original full URL form.

If you look closely one thing you may notice in the screenshots above is that other short URLs (e.g. bit.ly) are not unshortened. I didn’t want this program to completely unravel the short URLs since all kinds of things can go wrong when trying to resolve these. I figure if I included a bit.ly URL in my tweet it seems like that’s what the archive should show?

If you want to try it out make sure you have Python3 installed then:

Make a backup of your original Twitter archive!
pip3 install twitter-archive-unshorten
Unzip your Twitter archive zip file
twitter-archive-unshorten /path/to/your/archive/directory/

It might take a while, depending on how many tweets with URLs you have. I had 20,000 or so short URLs so it took a 2-3 hours. Once it’s finished you should be able open your Archive and interact with it without seeing the t.co URLs. The mapping of short URLs to long URLs that was discovered is saved in your archive directory as data/shorturls.json in case you need it. There also should be a twitter-archive-unshorten.log file in the root of the archive, with a record of what was done.

Random aside. One interesting thing I discovered when creating this program is that there are some very short t.co URLs, for example: https://t.co/L. I also learned from Hank that somehow https://t.co/elon was (recently?) created.