
Author Archives: ed

edu, gov and tlds in en.wikipedia external links

Some folks over at Wikipedia Signpost asked if they could use some of the bar charts I’ve been posting here recently. They needed the graphs to be released with a free license, which was a good excuse to slap a Creative Commons Attribution 3.0 license on all the content here at inkdroid. I’m kinda ashamed I [...]

lots of copies keeps epubs safe

Over the weekend you probably saw the announcements going around about Google Books releasing more than 1 million public domain ebooks on the web as epubs. This is great news: epub is a web-friendly, open format, and having all this content available as epub is important. Now I might be greedy, but when I saw [...]

simplicity and digital preservation, sorta

Over on the Digital Curation discussion list Erik Hetzner of the California Digital Library raised the topic of simplicity as it relates to digital preservation, and specifically to CDL’s notion of Curation Microservices. He referenced a recent bit of writing by Martin Odersky (the creator of Scala) with the title Simple or Complicated. In one [...]

bad xml smells

I’m used to refactoring code smells, but sometimes you can catch a bad whiff in XML too. Before: <?xml version="1.0" encoding="UTF-8"?> <mets TYPE="urn:library-of-congress:ndnp:mets:encyclopedia:encyclopediaEntry" PROFILE="urn:library-of-congress:mets:profiles:ndnp:encyclopediaEntry:v1.1" LABEL="The National Forum Scope Note" xmlns:mods="http://www.loc.gov/mods/v3" xmlns="http://www.loc.gov/METS/" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <!--METS HEADER--> <metshdr CREATEDATE="2007-01-10T09:00:00" ><!--CREATEDATE should be populated with creation date of the record. RECORDSTATUS should only be set [...]

top hosts referenced in wikipedia (part 2)

Jodi Schneider pointed out to me in an email that my previous post about the top 100 hosts referenced in wikipedia may have been slightly off balance since it counted *all* pages on wikipedia (talk pages, files, etc), and was not limited to only links in articles. The indicator for her was the high ranking [...]

notes on retooling libraries

If you work in the digital preservation field and haven’t seen Dorothea Salo’s Retooling Libraries for the Data Challenge in the latest issue of Ariadne, definitely give it a read. Dorothea takes an unflinching look at the scope and characteristics of data assets currently being generated by scholarly research, and how equipped traditional [...]

top hosts referenced in english wikipedia

I’ve recently been experimenting a bit to provide some tools to allow libraries, archives and museums to see how Wikipedians are using their content as primary source material. I didn’t actually anticipate the interest in having a specialized tool like linkypedia to monitor who is using your institution’s content on Wikipedia. So, the demo site [...]

version control and digital curation

For some time now I have been meaning to write about some of the issues related to version control in repositories as they relate to some projects going on at $work. Most repository systems have a requirement to maintain original data as submitted. But as we all know this content often changes over time–sometimes immediately. [...]

federal register embraces the web and opensource

Tom Lee of the Sunlight Foundation blogged yesterday about the new Federal Register website. The facelift was also announced a few days earlier by the Archivist of the United States, David Ferriero. If you aren’t familiar with it already, the Federal Register is basically the daily newspaper of the United States Federal Government, which details [...]