Technical writing is something I’ve gotten more enjoyment out of the longer I’ve been working professionally as a software developer. Although, maybe it has been an interest all along, and I’ve only recently begun to recognize it. I think Robin Sloan put it well in his Specifying Spring ’83 that writing specifications provides its own set of challenges and joys:

I recommend this kind of project, this flavor of puzzle, to anyone who feels tangled up by the present state of the internet. Protocol design is a form of investigation and critique. Even if what I describe below goes nowhere, I’ll be very glad to have done this thinking and writing. I found it challenging and energizing.

Over the past couple of years I’ve had the opportunity to work with the Webrecorder project to help document some of the data formats they are developing to encourage interoperability between web archiving tools. This hasn’t been, as Robin described, using the specification as a canvas for imagining new sociotechnical ways of being, so much as it has been helping shape existing code and documentation into a form where it’s (hopefully) easier to digest by others.

Nevertheless, this has been rewarding work, because Webrecorder have been doing so much to advance web archiving practice. When you think about it, it’s hard not to see web archives as increasingly important, as the WWW continues to be such a central technology for global publishing and information sharing, despite (or in spite of) prominent examples of greedy consolidation and market failure.

Chief among the Webrecorder specifications is the Web Archive Collection Zipped or WACZ (pronounced waxy or wack-zed), which is a packaging standard for WARC (ISO 28500:2017) data that lets archived web content created with one set of tools, be readable or “playable” with another set of tools.

The hope/gamble here is that these specifications will be part of an ecosystem of tools where web archives are portable, verifiable and useful. It is early days, but I think we are already starting to see this happen a bit. For example the Harvard Innovation Lab’s recent work on Scoop is one of a set of tools for assembling evidentiary archives of web content in their Perma project. In addition the Internet Archive’s Save Page Now service also recently added the ability to export collected data as a WACZ file.

Sure, these tools might create web archives by crawling the web in different ways that are suitable to the content. Or they might make the once live web content viewable and interactive once more. But these tools might also help visualize web archives in other ways: inspecting the file formats present in the archive, crawling behaviours used, the websites and URL patterns in the archive, viewing the media files they contain as a gallery, chart how language usage changes over time, or publish them in new spaces like IPFS.

There’s no reason why we should rely on one tool, from one organization, for all of this. Once you can depend on web archive data and metadata being laid out in a particular way as files, packaged up in a zip file, and (optionally) published on the web, these sorts of tools become feasible in a way that was previous centralized web archive architectures make more difficult.

These specifications are published at specs.webrecorder.net using Github Pages which is a popular static site publishing tool that sits on top of data you have in version control at Github. Being able to commit changes to the specifications and gather issue tickets around them is really critical to this type of documentation.

After having written them initially as Markdown we decided to try using the W3C’s ReSpec JavaScript library to publish the documents. ReSpec has lots of nice features for formatting specifications in an accessible and recognizable way. One simple example of a nicety that ReSpec offers is its system for generating References. You can easily embed citations to existing specifications so that they are formatted correctly in a References section. If the spec you want to cite hasn’t been cited before you can add it to the Specref corpus.

You can write ReSpec documents as straight up HTML, or as sections of Markdown text interspersed in the HTML. We started out doing the latter, but found over time that it was easiest to be able to edit the specifications as stand-alone Markdown documents, and then generate the ReSpec HTML as needed. I helped by writing markdown-to-respec, which is a Github Action (written in Python) for automatically generating a ReSpec document from Markdown, when a commit is pushed to Github.

As an example you can see the Markdown for the IPFS Chunking specification gets transformed into this HTML when changes to the specification are pushed to Github. ReSpec does depend on some structured metadata (authors, editors, etc) which serves as a configuration for the specification. These can be included as YAML frontmatter in the Markdown file, or if you prefer as a JSON file alongside the Markdown file.

In addition markdown-to-respec can be installed and used from the command line, outside of the Github action, if you want to see what changes look like without pushing to Github, or if you just want to experiment a bit.

If you get a chance to given it a try please let me know. And happy spec writing!