simplicity and digital preservation, sorta

Over on the Digital Curation discussion list, Erik Hetzner of the California Digital Library raised the topic of simplicity as it relates to digital preservation, and specifically to CDL’s notion of Curation Microservices. He referenced a recent essay by Martin Odersky (the creator of Scala) titled Simple or Complicated. In one of the responses, Brian Tingle (also of CDL) suggested that simplicity for an end user and simplicity for the programmer are often inversely related. My friend Kevin Clarke prodded me in #code4lib into turning my response to the discussion list into a blog post, so here it is (slightly edited).

For me, the Odersky piece is a really nice essay on why simplicity is often in the eye of the beholder. Often the key to simplicity is working with people who see things in roughly the same way: people who have similar needs that are met by particular approaches and tools. Basically, a shared and healthy culture that makes emergent complexity palatable.

Brian made the point that simplicity for programmers is inversely proportional to simplicity for end users, or in his own words:

I think that the simpler we make it for the programmers, usually the more complicated it becomes for the end users, and vice versa.

I think the only thing to keep in mind is that the distinction between programmers and end users isn’t always clear.

As a software developer I’m constantly using or inheriting someone else’s code: be it a third-party library I depend on, or a piece of software written once upon a time by somebody who has since moved on. In both cases I’m effectively an end user of a program that somebody else designed and implemented. The interfaces and abstractions that developer chose are the things I (as an end user) need to understand and work with. Ultimately, I think it’s easier to keep software usable for end users (of whatever flavor) by keeping the software design itself simple.

Simplicity makes software easier to refactor over time when the inevitable happens and someone wants new or altered behavior. Simplicity should also make it clear when a suggested change to a piece of software doesn’t fit its design, and is best done elsewhere. One of the best rules of thumb I’ve encountered over the years for getting to this place is the Unix Philosophy:

Write programs that do one thing and do it well. Write programs to work together.
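To make this concrete in a preservation setting, here is a minimal Python sketch built around a fixity check. The functions `checksum`, `manifest` and `verify` are hypothetical names made up for this illustration; they are not part of CDL’s Curation Microservices or Merritt. Each one does one small thing, and the larger job comes from composing them:

```python
import hashlib
from typing import Dict, List

# Hypothetical single-purpose functions; not an actual CDL or Merritt API.

def checksum(data: bytes) -> str:
    """Do one thing: return the SHA-256 hex digest of some bytes."""
    return hashlib.sha256(data).hexdigest()

def manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Do one thing: record a checksum for each named file."""
    return {name: checksum(data) for name, data in files.items()}

def verify(files: Dict[str, bytes], expected: Dict[str, str]) -> List[str]:
    """Do one thing: list files whose current checksum no longer matches."""
    return [name for name, data in files.items()
            if checksum(data) != expected.get(name)]

# Work together: build a manifest now, verify against it later.
files = {"a.txt": b"hello", "b.txt": b"world"}
recorded = manifest(files)
files["b.txt"] = b"w0rld"  # simulate bit rot in one file
print(verify(files, recorded))  # ['b.txt']
```

Because each piece is small, one of them (say, the digest algorithm inside `checksum`) can change without the others needing to know.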

As has been noted elsewhere, composability is one of the guiding principles of the Microservices approach, and it’s why I’m a big fan (in principle). Another aspect of the Unix philosophy that Microservices seems to embody is:

Data dominates.

The software can (and will) come and go, but we are left with the data. That’s the reality of digital preservation. It could be argued that the programs themselves are data, which gets us into sci-fi virtualization scenarios. Maybe someday, but I personally don’t think we’re there yet.

Another approach I’ve found that helps keep code simple is unit testing. Admittedly it’s a bit of a religion, but at the end of the day, writing tests for your code encourages you to use the APIs, interfaces and abstractions you are creating, so you notice sooner when things don’t make sense. And of course, tests let you refactor with a safety net when the inevitable changes rear their heads.
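As an illustration, here is a minimal sketch using Python’s standard `unittest` module. The `has_fixity` helper and `FixityTest` class are hypothetical names invented for this example. The tests call the function exactly as a caller would, which is where an awkward interface tends to show itself:

```python
import hashlib
import unittest

def has_fixity(data: bytes, expected_sha256: str) -> bool:
    """Hypothetical helper: True if data still matches its recorded SHA-256 digest."""
    return hashlib.sha256(data).hexdigest() == expected_sha256

class FixityTest(unittest.TestCase):
    # Writing these tests means using has_fixity the way an end user
    # (another programmer) would, before anyone else has to.
    def setUp(self):
        self.data = b"some bytes"
        self.digest = hashlib.sha256(self.data).hexdigest()

    def test_unchanged_data_passes(self):
        self.assertTrue(has_fixity(self.data, self.digest))

    def test_altered_data_fails(self):
        self.assertFalse(has_fixity(b"some bytez", self.digest))
```

Saved as, say, `test_fixity.py`, it can be run with `python -m unittest test_fixity`. If `has_fixity` is later refactored, the same tests confirm its behavior hasn’t changed.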

And, another slightly more humorous way to help ensure simplicity:

Always code as if the person who ends up maintaining your code is a violent psychopath who knows where you live.

Which leads me to a Jedi mind trick my former colleague ~~Keyser Söze~~ Andy Boyko tried to teach me (I think): it’s useful to know when you don’t have to write any code at all. Sometimes existing code can be used in a new context. And sometimes the perceived problem can be recast, or examined from a new perspective that makes the problem go away.

I’m not sure what all this has to do with digital preservation. The great thing about what CDL is doing with microservices is that they are trying to focus on the what, and not the how, of digital preservation. Whatever ends up happening with the implementation of Merritt itself, I think they are discovering the useful patterns of digital preservation, trying them out, and documenting them... and it’s incredibly important work that I don’t really see happening much elsewhere.