2 thoughts on “cc0 and git for data”

  1. I used GitHub to simplify the management of the linked MARC codes RDF:


    since it seemed like an easier way for people to contribute data than some homegrown editor, etc. Admittedly, the format is unorthodox (they’re Turtle fragments with shared prefix ‘headers’), but I think storing data in git is both new (and thus everybody’s experimenting) and subject to compromises to bridge the gaps between the SCM mental model and the native data formats.

    That said, I haven’t gotten a single pull request or fork of the data, although I never really did much advertising (I always meant to blog about it, but…). I’m not sure if that’s a critique of my approach, the value of the data, or my marketing.

    Still, for smaller datasets, this seems an invaluable channel for both publishing the data and providing a mechanism for contributing edits, although I definitely would question its scalability (I probably wouldn’t, for example, publish the OpenLibrary’s data this way).

