The Internet Archive gave the Wayback Machine a facelift back in January. It actually looks really nice, but I noticed something kinda odd. I was looking for old archived versions of the lcsh.info site. Things work fine for the latest archived copies:
But during part of lcsh.info’s brief lifetime the site was serving up XHTML with the application/xhtml+xml media type. Now Wayback rightly (I think) remembers the media type, and serves it up that way:
ed@curry:~$ curl -I http://replay.waybackmachine.org/20081216020433/http://lcsh.info/
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
X-Archive-Guessed-Charset: UTF-8
X-Archive-Orig-Connection: close
X-Archive-Orig-Content-Length: 6497
X-Archive-Orig-Content-Type: application/xhtml+xml; charset=UTF-8
X-Archive-Orig-Server: Apache/2.2.8 (Ubuntu) DAV/2 SVN/1.4.6 PHP/5.2.4-2ubuntu5.4 with Suhosin-Patch mod_wsgi/1.3 Python/2.5.2
X-Archive-Orig-Date: Tue, 16 Dec 2008 02:04:31 GMT
Content-Type: application/xhtml+xml;charset=utf-8
X-Varnish: 1458812435 1458503935
Via: 1.1 varnish
Date: Wed, 09 Mar 2011 23:09:47 GMT
X-Varnish: 903390921
Age: 0
Via: 1.1 varnish
Connection: keep-alive
X-Cache: MISS
But to add navigation controls and branding, Wayback also splices its own HTML into the page, and unfortunately that HTML is not well-formed XML. And since the application/xhtml+xml media type makes browsers use a strict XML parser instead of the forgiving HTML one, the pages render in Firefox like this:
And in Chrome like this:
Now I don’t quite know what the solution should be here. Perhaps the HTML that is spliced in should be valid XML. Or maybe Wayback should just serve up the HTML as text/html. Or maybe this is a good use case for frames (gasp). But I imagine it will similarly afflict any other XHTML that was served up as application/xhtml+xml when Heritrix crawled it.
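To make the first option a bit more concrete, here is a rough Python sketch of why the splice breaks things. The page and both banner snippets are made up for illustration (I haven't copied Wayback's actual toolbar markup), and the splice function is just my guess at the general shape of the operation, not how Wayback really does it. It uses the standard library's XML parser as a stand-in for the strict parsing a browser does for application/xhtml+xml.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for an archived XHTML page (not the real lcsh.info markup).
ARCHIVED_XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>lcsh.info</title></head>"
    "<body><p>archived content</p></body>"
    "</html>"
)

# Hypothetical banners: the first is ordinary tag-soup HTML with unclosed
# <img> and <br> elements, the second is the same thing written as XML.
TAG_SOUP_BANNER = '<div id="banner"><img src="logo.png"><br>Wayback</div>'
WELL_FORMED_BANNER = '<div id="banner"><img src="logo.png" alt="logo"/><br/>Wayback</div>'


def splice(page: str, banner: str) -> str:
    """Insert the banner right after the opening <body> tag (assumed approach)."""
    return page.replace("<body>", "<body>" + banner, 1)


def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed(ARCHIVED_XHTML))
    print(is_well_formed(splice(ARCHIVED_XHTML, TAG_SOUP_BANNER)))
    print(is_well_formed(splice(ARCHIVED_XHTML, WELL_FORMED_BANNER)))
```

The original page parses, the page with the tag-soup banner does not, and the page with the XML-friendly banner parses again. The other option, serving everything as text/html, sidesteps the problem because the browser's HTML parser tolerates the unclosed tags.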
Sigh. I sure am glad that HTML5 is arriving on the scene and XHTML is riding off into the sunset. Although it’s kind of the Long Goodbye given Internet Archive has archived it.
Update: just a couple hours later I got an email that a fix for this was deployed. And sure enough now it works. I quickly eyeballed the response and didn’t see what the change was. Thanks very much Internet Archive!