<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>inkdroid &#187; lc</title>
	<atom:link href="http://inkdroid.org/journal/tag/lc/feed/" rel="self" type="application/rss+xml" />
	<link>http://inkdroid.org/journal</link>
	<description>$pithy_personal_mission_statement</description>
	<lastBuildDate>Wed, 28 Jul 2010 13:48:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>q &amp; a</title>
		<link>http://inkdroid.org/journal/2009/01/07/q-a/</link>
		<comments>http://inkdroid.org/journal/2009/01/07/q-a/#comments</comments>
		<pubDate>Wed, 07 Jan 2009 21:12:57 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[life]]></category>
		<category><![CDATA[computers]]></category>
		<category><![CDATA[knitting]]></category>
		<category><![CDATA[lc]]></category>
		<category><![CDATA[libraries]]></category>
		<category><![CDATA[love]]></category>
		<category><![CDATA[networks]]></category>
		<category><![CDATA[newspapers]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://inkdroid.org/journal/?p=610</guid>
		<description><![CDATA[Q: What do 100-year-old knitting patterns and a lost Robert Louis Stevenson story have in common? A: A digitally preserved newspaper page. Q: What about if you add: URIs for knitting materials William Blake&#8217;s engravings The similarities/differences between XMPP, HTTP and NNTP Web crawling as data integration Project coordination with rooms on FriendFeed brewing [...]]]></description>
			<content:encoded><![CDATA[<p><em><strong>Q:</strong> What do 100-year-old knitting patterns and a lost Robert Louis Stevenson story have in common?</em></p>
<p><strong>A:</strong> A digitally preserved <a href="http://www.loc.gov/chroniclingamerica/lccn/sn83030193/1904-02-19/ed-1/seq-15">newspaper page</a>.</p>
<p><em><strong>Q:</strong> What about if you add:</em></p>
<ul>
<li>URIs for <a href="https://www.ravelry.com">knitting materials</a></li>
<li><a href="http://en.wikipedia.org/wiki/William_Blake">William Blake</a>&#8217;s engravings</li>
<li>The similarities/differences between <a href="http://en.wikipedia.org/wiki/Xmpp">XMPP</a>, <a href="http://en.wikipedia.org/wiki/Http">HTTP</a> and <a href="http://en.wikipedia.org/wiki/Nntp">NNTP</a></li>
<li>Web crawling as data integration</li>
<li>Project coordination with <a href="http://friendfeed.com/rooms/semantic-web">rooms</a> on FriendFeed</li>
<li>brewing <a href="http://en.wikipedia.org/wiki/Kombucha">Kombucha</a></li>
</ul>
<p><strong>A: </strong>Just a typical lunchtime conversation at <a href="http://www.yelp.com/biz/petes-diner-washington">Pete&#8217;s</a> with a <a href="http://davidbrunton.com">couple</a> <a href="http://eikeon.com">people</a> I work with. The cool thing (for me) is that this is normal, involves a host of smart/interesting characters, and is routinely encouraged. I love my job.</p>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2009/01/07/q-a/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>digital-curation</title>
		<link>http://inkdroid.org/journal/2008/11/26/digital-curation/</link>
		<comments>http://inkdroid.org/journal/2008/11/26/digital-curation/#comments</comments>
		<pubDate>Wed, 26 Nov 2008 15:22:59 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[libraries]]></category>
		<category><![CDATA[people]]></category>
		<category><![CDATA[preservation]]></category>
		<category><![CDATA[repositories]]></category>
		<category><![CDATA[bagit]]></category>
		<category><![CDATA[cdl]]></category>
		<category><![CDATA[jisc]]></category>
		<category><![CDATA[lc]]></category>
		<category><![CDATA[ndiipp]]></category>

		<guid isPermaLink="false">http://inkdroid.org/journal/?p=564</guid>
		<description><![CDATA[Some folks at LC and CDL are trying to kick-start a new public discussion list for talking about digital curation in its many guises: repositories, tools, standards, techniques, practices, etc. The intuition is that there is a social component to the problems of digital preservation and repository interoperability. Of course NDIIPP (the arena for the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://groups.google.com/group/digital-curation"><img src="http://inkdroid.org/images/digital-curation.png" style="margin-right: 10px; float: left; border: thick groove gray;" /></a></p>
<p>Some folks at <a href="http://loc.gov">LC</a> and <a href="http://cdlib.org">CDL</a> are trying to kick-start a new <a href="http://groups.google.com/group/digital-curation">public discussion list</a> for talking about digital curation in its many guises: repositories, tools, standards, techniques, practices, etc. The intuition is that there is a social component to the problems of digital preservation and repository interoperability.</p>
<p>Of course <a href="http://digitalpreservation.gov">NDIIPP</a> (the arena for the CDL/LC collaboration) has always been about <a href="http://www.digitalpreservation.gov/library/program_back.html">building and strengthening a network of partners</a>. But as Priscilla Caplan points out in her survey of the digital preservation landscape <a href="http://dx.doi.org/10.1108/07378830710840419">Ten Years After</a>, organizations in Europe like the <a href="http://www.jisc.ac.uk/">JISC</a> and <a href="http://www.langzeitarchivierung.de/">NESTOR</a> seem to have understood that there is an educational component to digital preservation as well. Yet even the JISC and NESTOR have tended to focus more on the preservation of scholarly output, whereas digital preservation really extends beyond that realm of materials.</p>
<p>The continual need to share good ideas and hard-won knowledge about digital curation, and to build a network of colleagues and experts that extends past the normal project- and institution-specific boundaries, is just as important as building the collections and the technologies themselves.</p>
<p>So I guess this is a rather highfalutin goal &#8230; here&#8217;s some text stolen from the <a href="http://groups.google.com/group/digital-curation">digital-curation</a> home page to give you more of a flavor:</p>
<blockquote><p>
The digital preservation and repositories domain is fortunate to have a diverse set of institutional and consortial efforts, software projects, and standardization initiatives.  Many discussion lists have been created for these individual efforts. The digital-curation discussion list is intended to be a public forum that encourages cross-pollination across these project and institutional boundaries in order to foster wider awareness of project- and institution-specific work and encourage further collaboration.</p>
<p>Topic of conversation can include (but is not limited to)</p>
<ul>
<li>digital repository software (Fedora, DSpace, EPrints, etc.)</li>
<li>management of digital formats (JHOVE, djatoka, etc.)</li>
<li>use and development of standards (OAIS, OAI-PMH/ORE, MPEG21, METS, BagIt, etc.)</li>
<li>issues related to identifiers, packaging, and data transfer</li>
<li>best practices and case studies around curation and preservation of digital content</li>
<li>repository interoperability</li>
<li>conference, workshop, tutorial announcements</li>
<li>recent papers</li>
<li>job announcements</li>
<li>general chit chat about problems, solutions, itches to be scratched</li>
<li>humor and fun</li>
</ul>
</blockquote>
<p>We&#8217;ll see how it goes. If you are at all interested please sign up.</p>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2008/11/26/digital-curation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BagIt</title>
		<link>http://inkdroid.org/journal/2008/06/06/bagit/</link>
		<comments>http://inkdroid.org/journal/2008/06/06/bagit/#comments</comments>
		<pubDate>Fri, 06 Jun 2008 12:57:03 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[libraries]]></category>
		<category><![CDATA[atompub]]></category>
		<category><![CDATA[bagit]]></category>
		<category><![CDATA[cdl]]></category>
		<category><![CDATA[checksums]]></category>
		<category><![CDATA[ietf]]></category>
		<category><![CDATA[lc]]></category>
		<category><![CDATA[preservation]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[sword]]></category>

		<guid isPermaLink="false">http://inkdroid.org/journal/?p=203</guid>
		<description><![CDATA[One little bit of goodness that has percolated out from my group at $work in collaboration with the California Digital Library is the BagIt spec (more readable version). BagIt is an IETF Internet-Draft for bundling up files for transfer over the network, or for shipping on physical media. Just yesterday a little article about BagIt [...]]]></description>
			<content:encoded><![CDATA[<p>One little bit of goodness that has percolated out from my group at $work in collaboration with the <a href="http://www.cdlib.org/">California Digital Library</a> is the <a href="http://tools.ietf.org/id/draft-kunze-bagit">BagIt</a> spec (<a href="http://www.cdlib.org/inside/diglib/bagit/bagitspec.html">more readable version</a>). BagIt is an IETF Internet-Draft for bundling up files for transfer over the network, or for shipping on physical media. Just yesterday a little <a href="http://www.digitalpreservation.gov/news/2008/20080602news_article_bagit.html">article</a> about BagIt surfaced on the LC digital preservation website, so I figure now is a good time to mention it.</p>
<p>The goodness of BagIt is in its simplicity and utility. A Bag is essentially: a set of files in a particular directory named <em>data</em>, a <em>manifest file</em> which states what files ought to be in the <em>data</em> directory, and a <em>bagit.txt</em> file that states the version of BagIt. For example, here&#8217;s a sample (abbreviated) directory structure for a bag of digitized newspapers via the <a href="http://www.loc.gov/ndnp/">National Digital Newspaper Program</a>:</p>
<pre>
mybag
|-- bagit.txt
|-- data
|   `-- batch_lc_20070821_jamaica
|       |-- batch.xml
|       |-- batch_1.xml
|       `-- sn83030214
|           |-- 00175041217
|           |   |-- 00175041217.xml
|           |   |-- 1905010401
|           |   |   |-- 1905010401.xml
|           |   |   `-- 1905010401_1.xml
|           |   |-- 1905010601
|           |   |   |-- 1905010601.xml
|           |   |   `-- 1905010601_1.xml
</pre>
<p>The manifest itself is just the relative file path, and a fixity value:</p>
<pre>
ea9dee53c2c2dd4027984a2b59f58d1f  data/batch_lc_20070821_jamaica/batch.xml
72134329a82f32dd44d59b509928b6cd  data/batch_lc_20070821_jamaica/batch_1.xml
dc5740d295521fcc692bb58603ce8d1a  data/batch_lc_20070821_jamaica/sn83030214/00175041217/1905010601/1905010601_1.xml
e16e74988ca927afc10ee2544728bd14  data/batch_lc_20070821_jamaica/sn83030214/00175041217/1905010601/1905010601.xml
fd480b2c4bcb6537c3bc4c9e7c8d7c21  data/batch_lc_20070821_jamaica/sn83030214/00175041217/1905010401/1905010401.xml
e0e4a981ddefb574fa1df98a8a55b7a4  data/batch_lc_20070821_jamaica/sn83030214/00175041217/1905010401/1905010401_1.xml
c8dffa3cdb7c13383151e0cd8263d082  data/batch_lc_20070821_jamaica/sn83030214/00175041217/00175041217.xml
</pre>
<p>The manifest format happens to be the same format understood and generated by the common Unix (and Windows) utility <a href="http://en.wikipedia.org/wiki/Md5deep">md5deep</a>. So it&#8217;s pretty easy to generate and validate the manifests.</p>
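<p>To give a feel for how little code validation takes, here is a minimal Python sketch that checks the files in a bag against a manifest like the one above. It is only an illustration under a few assumptions: the manifest filename <em>manifest-md5.txt</em> and the function names <em>md5_of</em> and <em>verify_manifest</em> are hypothetical choices made for this example, and each manifest line is assumed to be a checksum, some whitespace, and a relative path.</p>

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bag_dir, manifest_name="manifest-md5.txt"):
    """Return a list of (relative_path, problem) tuples; an empty list means all files check out.

    The manifest filename is an assumption made for this sketch.
    """
    bag = Path(bag_dir)
    problems = []
    for line in (bag / manifest_name).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split once on whitespace: checksum first, then the relative path.
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip()
        target = bag / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif md5_of(target) != expected.lower():
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

<p>Calling <em>verify_manifest("mybag")</em> would then return an empty list for an intact bag, or name each file that is missing or whose checksum no longer matches.</p>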
<p>The context for this work has largely been <a href="http://www.digitalpreservation.gov/">NDIIPP</a> partners (like CDL) transferring data generated by funded projects back to LC, although it&#8217;s likely to get used in some other places internally as well. It&#8217;s funny to see the spec in its current state, after Justin Littman rattled off the LC Manifest wiki page in a few minutes after a meeting where <a href="http://boyko.net/andy">Andy Boyko</a> initially brought up the issue. Andy has just left LC to work for a <a href="http://www.apple.com/itunes/">record company in Cupertino</a>. I don&#8217;t think I fully understood simplicity in software development until I worked with Andy. He has a real talent for boiling down solutions to their simplest expression, often leveraging existing tools to the point where very little software actually needs to be written. I think Andy and John found a natural affinity for striving for simplicity, and it shows in BagIt. Andy will be sorely missed, but that record store is lucky to get him on their team.</p>
<p>There are some additional cool features to BagIt, including the ability to include a <em>fetch.txt</em> file which contains HTTP and/or rsync URIs to fill in parts of the bag from the network. We&#8217;ve come to refer to bags with a fetch.txt as &#8220;holey bags&#8221; because they have holes in them that need to be filled in. This allows very large bags to be assembled quickly in parallel (using a 100-line Python script Andy Boyko wrote, or whatever variant of wget, curl, rsync makes you happy). Also you can include a <em>package-info.txt</em> which includes some basic metadata as key/value pairs &#8230; designed primarily for humans.</p>
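<p>As a rough sketch of the holey bag idea (this is not the script mentioned above), the following Python reads the contents of a fetch.txt and reports which files still need to be retrieved. It assumes each line holds a URL, a length in bytes (or a dash when the length is unknown) and the path within the bag, separated by whitespace. The names <em>FetchEntry</em>, <em>parse_fetch</em> and <em>holes</em> are hypothetical, and the sketch deliberately stops short of doing any network transfer.</p>

```python
from pathlib import Path
from typing import NamedTuple, Optional


class FetchEntry(NamedTuple):
    url: str
    length: Optional[int]  # None when the length is given as "-"
    path: str


def parse_fetch(text):
    """Parse the contents of a fetch.txt into a list of FetchEntry tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumed layout: URL, length, then the path (which may contain spaces).
        url, length, path = line.split(None, 2)
        size = None if length == "-" else int(length)
        entries.append(FetchEntry(url, size, path.strip()))
    return entries


def holes(bag_dir, entries):
    """Return the entries whose files are not yet present in the bag."""
    bag = Path(bag_dir)
    return [entry for entry in entries if not (bag / entry.path).is_file()]
```

<p>Each entry returned by <em>holes</em> could then be handed to wget, curl or rsync, one process per URL, which is where the parallelism comes from.</p>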
<p><a href="http://eikeon.com/">Dan Krech</a> and I are in the process of creating a prototype deposit web application that will essentially allow bags to be submitted via a <a href="http://www.ukoln.ac.uk/repositories/digirep/index/SWORD">SWORD</a> (profile of <a href="http://www.rfc-editor.org/rfc/rfc5023.txt">AtomPub</a> for Repositories) service. The SWORD part should be pretty easy, but getting the retrieval of &#8220;holey bags&#8221; kicked off and monitored properly will be the more challenging part. Hopefully I&#8217;ll be able to report more here as things develop.</p>
<p>Feedback on the BagIt RFC is most welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2008/06/06/bagit/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>justify my links</title>
		<link>http://inkdroid.org/journal/2008/05/29/justify-my-links/</link>
		<comments>http://inkdroid.org/journal/2008/05/29/justify-my-links/#comments</comments>
		<pubDate>Thu, 29 May 2008 12:58:07 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[libraries]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[semweb]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[lc]]></category>
		<category><![CDATA[nyc]]></category>
		<category><![CDATA[semanticweb]]></category>

		<guid isPermaLink="false">http://inkdroid.org/journal/?p=201</guid>
		<description><![CDATA[Thanks to a tip from Ian, I&#8217;m looking forward to (hopefully) attending the Linked Data Planet conference in New York City as a volunteer. The idea is that I just have to pay for my hotel, and the cost of admission is waived. It seems my travel money is a bit limited at the moment [...]]]></description>
			<content:encoded><![CDATA[<p>Thanks to a tip from <a href="http://iandavis.com/">Ian</a>, I&#8217;m looking forward to (hopefully) attending the <a href="http://www.linkeddataplanet.com/">Linked Data Planet</a> conference in New York City as a volunteer. The idea is that I just have to pay for my hotel, and the cost of admission is waived. It seems my travel money is a bit limited at the moment (sometimes it&#8217;s there, sometimes it isn&#8217;t), so I figured minimizing costs would be appreciated. But today I got a request to &#8220;justify&#8221; my attendance at the conference. It was actually kind of a good exercise to sit down and write why I think the conference and <a href="http://linkeddata.org/">Linked Data</a> in general is important to the <a href="http://loc.gov">Library of Congress</a>.</p>
<blockquote><p>One of the challenges of Digital Repository work is modeling the context for digital objects. The context for a digital object includes the set of relationships a particular digital object has with other objects in the repository. 30 years of relational database research and development have allowed us to do this modeling pretty effectively within the scope of a particular application.</p>
<p>Very often, particularly in institutions the size of the Library of Congress, the context for a digital object includes digital objects found elsewhere in the enterprise&#8211;in other applications, with their own databases. In addition some institutions (like LC) also need to make their digital resources available publicly for other organizations to reference. The challenge here is in making the objects found in silos or islands of application data (typically housed in databases) reference-able and resolvable, so that other applications inside and outside the enterprise can use them.</p>
<p>As a practical example, a picture of Dizzy Gillespie found in the American Memory collection</p>
<div style="text-align: center;">
<a href="http://lcweb2.loc.gov/cgi-bin/query/i?ammem/van:@field(NUMBER+@band(van+5a52027)):displayType=1:m856sd=van:m856sf=5a52027"><img src="http://memory.loc.gov/pnp/van/5a52000/5a52000/5a52027r.jpg" /></a>
</div>
<p>is related to the book:</p>
<p><em><br />
  To be, or not&#8211;to bop: memoirs / Dizzy Gillespie, with Al Fraser.<br />
</em></p>
<p>which we have described in our <a href="http://lccn.loc.gov/84029213">online catalog</a>. The person Dizzy Gillespie is also represented in LC&#8217;s name authority file with the <a href="http://www.loc.gov/marc/lccn.html">Library of Congress Control Number</a> n50033872, and in the <a href="http://orlabs.oclc.org/viaf/LC|n50033872">Linked Authority File at OCLC</a>. And perhaps this picture of Dizzy Gillespie in American Memory will find its way into the <a href="http://memory.loc.gov/pnp/van/5a52000/5a52000/5a52027r.jpg">World Digital Library</a> application that is currently being built. How can we practically and explicitly identify and then represent the relationships between these resources? Is it even possible?</p>
<p>The Linked Data Planet conference is a two day workshop describing how to use traditional web technologies in conjunction with semantic web technologies (RDF, OWL, SPARQL, RDFa and GRDDL) to enable this sort of linking of resources inside particular applications, within the enterprise and around the world. My hope is that the conference will provide guidance on simple things LC can do with web technologies that have been in use for 20 years, to model the relationships between digital resources at the Library of Congress.
</p></blockquote>
<p>Hopefully that will convince them :-)</p>
<p><em>Apologies to <a href="http://en.wikipedia.org/wiki/Justify_My_Love">Madonna</a> for the blog post title&#8230;</em></p>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2008/05/29/justify-my-links/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>tripleshot</title>
		<link>http://inkdroid.org/journal/2008/01/11/tripleshot/</link>
		<comments>http://inkdroid.org/journal/2008/01/11/tripleshot/#comments</comments>
		<pubDate>Fri, 11 Jan 2008 16:39:35 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[libraries]]></category>
		<category><![CDATA[semweb]]></category>
		<category><![CDATA[lc]]></category>
		<category><![CDATA[marc]]></category>
		<category><![CDATA[mod]]></category>
		<category><![CDATA[owl]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[simile]]></category>
		<category><![CDATA[skos]]></category>

		<guid isPermaLink="false">http://www.inkdroid.org/journal/2008/01/11/tripleshot/</guid>
		<description><![CDATA[Recently there was a bit of interesting news around a MARBI Discussion Paper 2008-DP04 regarding semweb technologies at LC. Related to this work are RDF/OWL representations and models for MODS and MARC, which we are also developing. Several representations of MODS in RDF/OWL, such as the one from the SIMILE project, have been made available [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Image:Linea_doubleespresso.jpg"><img src="/images/espresso.jpg" style="width: 150px; margin-left: 10px; float: right; border: none;" /></a></p>
<p>Recently there was a bit of interesting <a href="http://listserv.loc.gov/cgi-bin/wa?A2=ind0801&#038;L=marc&#038;T=0&#038;P=1470">news</a> around a MARBI Discussion Paper 2008-DP04 regarding semweb technologies at <a href="http://loc.gov">LC</a>. </p>
<blockquote><p>
Related to this work are RDF/OWL representations and models for MODS and MARC, which we are also developing.  Several representations of MODS in RDF/OWL, such as the one from the SIMILE project, have been made available as part of various projects and we have found they useful for our analysis and to inform our design process.  We want to bring them together into one easily downloaded and maintained RDF/OWL file for use in community experimentation with RDF applications.  Our time line is to have the MODS RDF ready for community comment by June.
</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2008/01/11/tripleshot/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>WoGroFuBiCo cloud</title>
		<link>http://inkdroid.org/journal/2008/01/11/wogrofubico-cloud/</link>
		<comments>http://inkdroid.org/journal/2008/01/11/wogrofubico-cloud/#comments</comments>
		<pubDate>Fri, 11 Jan 2008 08:13:24 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[libraries]]></category>
		<category><![CDATA[bibliography]]></category>
		<category><![CDATA[cataloging]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[lc]]></category>

		<guid isPermaLink="false">http://www.inkdroid.org/journal/2008/01/11/wogrofubico-cloud/</guid>
		<description><![CDATA[access accessible addition al american analysis application applications appropriate archives areas association authority available based benefit benefits bibliographic broad broader catalog catalogers cataloging catalogs cataloguing chain change changes classification code collaboration collections committee communities community congress consequences consider considered content continue control controlled cooperative cost costs create created creating creation current data databases dc description [...]]]></description>
			<content:encoded><![CDATA[<style>
<!--
span.tagcloud0 { font-size: 1.0em; padding: 0em; color: #ACC1F3; z-index: 10;
position: relative}
span.tagcloud0 a {text-decoration: none;  color: #ACC1F3;}
span.tagcloud1 { font-size: 1.4em; padding: 0em; color: #ACC1F3; z-index: 9;
position: relative}
span.tagcloud1 a {text-decoration: none; color: #ACC1F3;}
span.tagcloud2 { font-size: 1.8em; padding: 0em; color: #86A0DC; z-index: 8;
position: relative}
span.tagcloud2 a {text-decoration: none; color: #86A0DC;}
span.tagcloud3 { font-size: 2.2em; padding: 0em; color: #86A0DC; z-index: 7;
position: relative}
span.tagcloud3 a {text-decoration: none; color: #86A0DC;}
span.tagcloud4 { font-size: 2.6em; padding: 0em; color: #607EC5; z-index: 6;
position: relative}
span.tagcloud4 a {text-decoration: none; color: #607EC5;}
span.tagcloud5 { font-size: 3.0em; padding: 0em; color: #607EC5; z-index: 5;
position: relative}
span.tagcloud5 a {text-decoration: none; color: #607EC5;}
span.tagcloud6 { font-size: 3.3em; padding: 0em; color: #4C6DB9; z-index: 4;
position: relative}
span.tagcloud6 a {text-decoration: none; color: #4C6DB9;}
span.tagcloud7 { font-size: 3.6em; padding: 0em; color: #395CAE; z-index: 3;
position: relative}
span.tagcloud7 a {text-decoration: none; color: #395CAE;}
span.tagcloud8 { font-size: 3.9em; padding: 0em; color: #264CA2; z-index: 2;
position: relative}
span.tagcloud8 a {text-decoration: none; color: #264CA2;}
span.tagcloud9 { font-size: 4.2em; padding: 0em; color: #133B97; z-index: 1;
position: relative}
span.tagcloud9 a {text-decoration: none; color: #133B97;}
span.tagcloud10 { font-size: 4.5em; padding: 0em; color: #002A8B; z-index: 0;
position: relative}
span.tagcloud10 a {text-decoration: none; color: #002A8B;}
//-->
</style>
<p><span class="tagcloud1">access</span> <span class="tagcloud0">accessible</span> <span class="tagcloud0">addition</span> <span class="tagcloud0">al</span> <span class="tagcloud0">american</span> <span class="tagcloud0">analysis</span> <span class="tagcloud0">application</span> <span class="tagcloud0">applications</span> <span class="tagcloud0">appropriate</span> <span class="tagcloud0">archives</span> <span class="tagcloud0">areas</span> <span class="tagcloud0">association</span> <span class="tagcloud1">authority</span> <span class="tagcloud1">available</span> <span class="tagcloud0">based</span> <span class="tagcloud0">benefit</span> <span class="tagcloud0">benefits</span> <span class="tagcloud6">bibliographic</span> <span class="tagcloud0">broad</span> <span class="tagcloud0">broader</span> <span class="tagcloud1">catalog</span> <span class="tagcloud0">catalogers</span> <span class="tagcloud2">cataloging</span> <span class="tagcloud0">catalogs</span> <span class="tagcloud0">cataloguing</span> <span class="tagcloud0">chain</span> <span class="tagcloud0">change</span> <span class="tagcloud0">changes</span> <span class="tagcloud0">classification</span> <span class="tagcloud0">code</span> <span class="tagcloud0">collaboration</span> <span class="tagcloud1">collections</span> <span class="tagcloud0">committee</span> <span class="tagcloud1">communities</span> <span class="tagcloud2">community</span> <span class="tagcloud2">congress</span> <span class="tagcloud0">consequences</span> <span class="tagcloud0">consider</span> <span class="tagcloud0">considered</span> <span class="tagcloud0">content</span> <span class="tagcloud0">continue</span> <span class="tagcloud3">control</span> <span class="tagcloud0">controlled</span> <span class="tagcloud1">cooperative</span> <span class="tagcloud0">cost</span> <span class="tagcloud0">costs</span> <span class="tagcloud0">create</span> <span class="tagcloud0">created</span> <span class="tagcloud0">creating</span> <span 
class="tagcloud1">creation</span> <span class="tagcloud1">current</span> <span class="tagcloud4">data</span> <span class="tagcloud0">databases</span> <span class="tagcloud0">dc</span> <span class="tagcloud0">description</span> <span class="tagcloud0">descriptive</span> <span class="tagcloud0">desired</span> <span class="tagcloud1">develop</span> <span class="tagcloud0">developed</span> <span class="tagcloud1">development</span> <span class="tagcloud0">different</span> <span class="tagcloud1">digital</span> <span class="tagcloud1">discovery</span> <span class="tagcloud0">distribution</span> <span class="tagcloud0">dublin</span> <span class="tagcloud0">ed</span> <span class="tagcloud0">education</span> <span class="tagcloud0">effort</span> <span class="tagcloud0">encourage</span> <span class="tagcloud0">enhance</span> <span class="tagcloud1">environment</span> <span class="tagcloud0">et</span> <span class="tagcloud0">evidence</span> <span class="tagcloud0">exchange</span> <span class="tagcloud0">exist</span> <span class="tagcloud1">findings</span> <span class="tagcloud0">focus</span> <span class="tagcloud0">format</span> <span class="tagcloud0">formats</span> <span class="tagcloud0">frameworks</span> <span class="tagcloud1">frbr</span> <span class="tagcloud1">future</span> <span class="tagcloud0">greater</span> <span class="tagcloud1">group</span> <span class="tagcloud1">headings</span> <span class="tagcloud0">hidden</span> <span class="tagcloud0">identifiers</span> <span class="tagcloud0">identify</span> <span class="tagcloud0">ifla</span> <span class="tagcloud0">impact</span> <span class="tagcloud0">include</span> <span class="tagcloud1">including</span> <span class="tagcloud0">increase</span> <span class="tagcloud0">increasingly</span> <span class="tagcloud2">information</span> <span class="tagcloud0">institution</span> <span class="tagcloud0">institutions</span> <span class="tagcloud1">international</span> <span class="tagcloud0">knowledge</span> <span 
class="tagcloud0">language</span> <span class="tagcloud3">lc</span> <span class="tagcloud0">lcs</span> <span class="tagcloud1">lcsh</span> <span class="tagcloud10">libraries</span> <span class="tagcloud0">limited</span> <span class="tagcloud0">lis</span> <span class="tagcloud0">maintaining</span> <span class="tagcloud1">make</span> <span class="tagcloud1">management</span> <span class="tagcloud0">marc</span> <span class="tagcloud2">materials</span> <span class="tagcloud1">metadata</span> <span class="tagcloud1">model</span> <span class="tagcloud1">national</span> <span class="tagcloud1">need</span> <span class="tagcloud1">needs</span> <span class="tagcloud0">networks</span> <span class="tagcloud1">new</span> <span class="tagcloud0">number</span> <span class="tagcloud1">oclc</span> <span class="tagcloud0">online</span> <span class="tagcloud0">organization</span> <span class="tagcloud0">organizations</span> <span class="tagcloud0">outcomes</span> <span class="tagcloud0">outside</span> <span class="tagcloud1">participants</span> <span class="tagcloud0">particular</span> <span class="tagcloud1">pcc</span> <span class="tagcloud0">possible</span> <span class="tagcloud0">potential</span> <span class="tagcloud0">practice</span> <span class="tagcloud1">practices</span> <span class="tagcloud0">primary</span> <span class="tagcloud0">principles</span> <span class="tagcloud0">process</span> <span class="tagcloud0">processes</span> <span class="tagcloud0">production</span> <span class="tagcloud1">program</span> <span class="tagcloud0">programs</span> <span class="tagcloud1">provide</span> <span class="tagcloud0">public</span> <span class="tagcloud0">publishers</span> <span class="tagcloud0">quo</span> <span class="tagcloud0">range</span> <span class="tagcloud1">rare</span> <span class="tagcloud1">rda</span> <span class="tagcloud1">recommendations</span> <span class="tagcloud4">records</span> <span class="tagcloud0">reference</span> <span class="tagcloud0">relationships</span> 
<span class="tagcloud1">report</span> <span class="tagcloud0">require</span> <span class="tagcloud0">requirements</span> <span class="tagcloud1">research</span> <span class="tagcloud1">resource</span> <span class="tagcloud1">resources</span> <span class="tagcloud0">responsibility</span> <span class="tagcloud0">results</span> <span class="tagcloud0">role</span> <span class="tagcloud1">rules</span> <span class="tagcloud0">search</span> <span class="tagcloud0">serve</span> <span class="tagcloud0">service</span> <span class="tagcloud1">services</span> <span class="tagcloud0">share</span> <span class="tagcloud0">shared</span> <span class="tagcloud1">sharing</span> <span class="tagcloud0">sources</span> <span class="tagcloud1">special</span> <span class="tagcloud0">specific</span> <span class="tagcloud2">standards</span> <span class="tagcloud0">states</span> <span class="tagcloud0">status</span> <span class="tagcloud2">subject</span> <span class="tagcloud0">supply</span> <span class="tagcloud0">support</span> <span class="tagcloud1">systems</span> <span class="tagcloud0">technology</span> <span class="tagcloud0">terms</span> <span class="tagcloud1">time</span> <span class="tagcloud0">today</span> <span class="tagcloud0">tools</span> <span class="tagcloud0">types</span> <span class="tagcloud1">unique</span> <span class="tagcloud0">united</span> <span class="tagcloud0">university</span> <span class="tagcloud2">use</span> <span class="tagcloud1">used</span> <span class="tagcloud2">users</span> <span class="tagcloud0">using</span> <span class="tagcloud1">value</span> <span class="tagcloud0">variety</span> <span class="tagcloud0">various</span> <span class="tagcloud0">vendors</span> <span class="tagcloud0">vocabularies</span> <span class="tagcloud0">washington</span> <span class="tagcloud0">ways</span> <span class="tagcloud1">web</span> <span class="tagcloud1">working</span> <span class="tagcloud2">works</span></p>
<p>Same stats as <a href="/journal/2008/01/10/wogrofubico-wc/">before</a>, but the top 200 this time, and as a cloud. It&#8217;s crying out for some kind of stemming to collapse some terms together, I suppose&#8230; but it&#8217;s also 3:17 AM.</p>
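<p>A rough sketch of what that collapsing could look like, assuming a deliberately crude plural-stripping rule and not a real stemmer such as Porter&#8217;s. The names <code>crude_stem</code> and <code>collapse</code> are hypothetical, and the sample counts are copied from the word count table in the earlier post:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">from collections import Counter

def crude_stem(word):
    # Assumption: a naive plural-stripping rule, not a real stemming algorithm.
    if word.endswith('ies') and len(word) > 4:
        return word[:-3] + 'y'
    if word.endswith('s') and not word.endswith('ss') and len(word) > 3:
        return word[:-1]
    return word

def collapse(counts):
    # Sum the counts of every word that shares a crude stem.
    merged = Counter()
    for word, n in counts.items():
        merged[crude_stem(word)] += n
    return merged

# Counts copied from the word count table.
sample = {'library': 263, 'libraries': 144, 'record': 73, 'records': 88, 'access': 57}
print(collapse(sample).most_common())</pre></div></div>

<p>A rule this blunt also mangles words like &#8216;status&#8217; and &#8216;series&#8217;, which is the argument for reaching for a proper stemmer.</p>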
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2008/01/11/wogrofubico-cloud/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>WoGroFuBiCo wc</title>
		<link>http://inkdroid.org/journal/2008/01/10/wogrofubico-wc/</link>
		<comments>http://inkdroid.org/journal/2008/01/10/wogrofubico-wc/#comments</comments>
		<pubDate>Fri, 11 Jan 2008 04:21:50 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[libraries]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[bibliography]]></category>
		<category><![CDATA[cataloging]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[lc]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[word]]></category>

		<guid isPermaLink="false">http://www.inkdroid.org/journal/2008/01/10/wugrofubico-wc/</guid>
		<description><![CDATA[word count library 263 bibliographic 236 data 170 libraries 144 lc 127 control 109 information 98 cataloging 91 records 88 subject 82 materials 81 standards 81 use 80 congress 79 work 76 record 73 community 67 users 61 working 59 group 58 access 57 recommendations 56 resources 53 authority 52 metadata 47 future 46 new [...]]]></description>
			<content:encoded><![CDATA[<p></p>
<table cellspacing="0" cellpadding="3" width="30%">
<tr>
<th width="25%">word</th>
<th>count</th>
</tr>
<tr style="background: #FFFFCC;">
<td>library</td>
<td>263</td>
</tr>
<tr>
<td>bibliographic</td>
<td>236</td>
</tr>
<tr style="background: #FFFFCC;">
<td>data</td>
<td>170</td>
</tr>
<tr>
<td>libraries</td>
<td>144</td>
</tr>
<tr style="background: #FFFFCC;">
<td>lc</td>
<td>127</td>
</tr>
<tr>
<td>control</td>
<td>109</td>
</tr>
<tr style="background: #FFFFCC;">
<td>information</td>
<td>98</td>
</tr>
<tr>
<td>cataloging</td>
<td>91</td>
</tr>
<tr style="background: #FFFFCC;">
<td>records</td>
<td>88</td>
</tr>
<tr>
<td>subject</td>
<td>82</td>
</tr>
<tr style="background: #FFFFCC;">
<td>materials</td>
<td>81</td>
</tr>
<tr>
<td>standards</td>
<td>81</td>
</tr>
<tr style="background: #FFFFCC;">
<td>use</td>
<td>80</td>
</tr>
<tr>
<td>congress</td>
<td>79</td>
</tr>
<tr style="background: #FFFFCC;">
<td>work</td>
<td>76</td>
</tr>
<tr>
<td>record</td>
<td>73</td>
</tr>
<tr style="background: #FFFFCC;">
<td>community</td>
<td>67</td>
</tr>
<tr>
<td>users</td>
<td>61</td>
</tr>
<tr style="background: #FFFFCC;">
<td>working</td>
<td>59</td>
</tr>
<tr>
<td>group</td>
<td>58</td>
</tr>
<tr style="background: #FFFFCC;">
<td>access</td>
<td>57</td>
</tr>
<tr>
<td>recommendations</td>
<td>56</td>
</tr>
<tr style="background: #FFFFCC;">
<td>resources</td>
<td>53</td>
</tr>
<tr>
<td>authority</td>
<td>52</td>
</tr>
<tr style="background: #FFFFCC;">
<td>metadata</td>
<td>47</td>
</tr>
<tr>
<td>future</td>
<td>46</td>
</tr>
<tr style="background: #FFFFCC;">
<td>new</td>
<td>40</td>
</tr>
<tr>
<td>environment</td>
<td>37</td>
</tr>
<tr style="background: #FFFFCC;">
<td>development</td>
<td>37</td>
</tr>
<tr>
<td>web</td>
<td>36</td>
</tr>
<tr style="background: #FFFFCC;">
<td>collections</td>
<td>35</td>
</tr>
<tr>
<td>systems</td>
<td>35</td>
</tr>
<tr style="background: #FFFFCC;">
<td>available</td>
<td>35</td>
</tr>
<tr>
<td>creation</td>
<td>35</td>
</tr>
<tr style="background: #FFFFCC;">
<td>services</td>
<td>34</td>
</tr>
<tr>
<td>headings</td>
<td>32</td>
</tr>
<tr style="background: #FFFFCC;">
<td>national</td>
<td>31</td>
</tr>
<tr>
<td>findings</td>
<td>30</td>
</tr>
<tr style="background: #FFFFCC;">
<td>research</td>
<td>30</td>
</tr>
<tr>
<td>unique</td>
<td>29</td>
</tr>
<tr style="background: #FFFFCC;">
<td>sharing</td>
<td>29</td>
</tr>
<tr>
<td>oclc</td>
<td>28</td>
</tr>
<tr style="background: #FFFFCC;">
<td>model</td>
<td>28</td>
</tr>
<tr>
<td>catalog</td>
<td>28</td>
</tr>
<tr style="background: #FFFFCC;">
<td>international</td>
<td>27</td>
</tr>
<tr>
<td>develop</td>
<td>27</td>
</tr>
<tr style="background: #FFFFCC;">
<td>value</td>
<td>27</td>
</tr>
<tr>
<td>lcsh</td>
<td>26</td>
</tr>
<tr style="background: #FFFFCC;">
<td>pcc</td>
<td>26</td>
</tr>
<tr>
<td>user</td>
<td>26</td>
</tr>
<tr style="background: #FFFFCC;">
<td>need</td>
<td>26</td>
</tr>
<tr>
<td>report</td>
<td>25</td>
</tr>
<tr style="background: #FFFFCC;">
<td>make</td>
<td>25</td>
</tr>
<tr>
<td>practices</td>
<td>25</td>
</tr>
<tr style="background: #FFFFCC;">
<td>rda</td>
<td>25</td>
</tr>
<tr>
<td>used</td>
<td>25</td>
</tr>
<tr style="background: #FFFFCC;">
<td>time</td>
<td>24</td>
</tr>
<tr>
<td>needs</td>
<td>24</td>
</tr>
<tr style="background: #FFFFCC;">
<td>rare</td>
<td>24</td>
</tr>
<tr>
<td>including</td>
<td>24</td>
</tr>
<tr style="background: #FFFFCC;">
<td>provide</td>
<td>23</td>
</tr>
<tr>
<td>discovery</td>
<td>23</td>
</tr>
<tr style="background: #FFFFCC;">
<td>communities</td>
<td>23</td>
</tr>
<tr>
<td>special</td>
<td>23</td>
</tr>
<tr style="background: #FFFFCC;">
<td>frbr</td>
<td>23</td>
</tr>
<tr>
<td>current</td>
<td>22</td>
</tr>
<tr style="background: #FFFFCC;">
<td>resource</td>
<td>22</td>
</tr>
<tr>
<td>rules</td>
<td>22</td>
</tr>
<tr style="background: #FFFFCC;">
<td>digital</td>
<td>21</td>
</tr>
<tr>
<td>cooperative</td>
<td>21</td>
</tr>
<tr style="background: #FFFFCC;">
<td>program</td>
<td>21</td>
</tr>
<tr>
<td>participants</td>
<td>21</td>
</tr>
<tr style="background: #FFFFCC;">
<td>management</td>
<td>21</td>
</tr>
<tr>
<td>service</td>
<td>20</td>
</tr>
<tr style="background: #FFFFCC;">
<td>dc</td>
<td>20</td>
</tr>
<tr>
<td>programs</td>
<td>20</td>
</tr>
<tr style="background: #FFFFCC;">
<td>online</td>
<td>20</td>
</tr>
<tr>
<td>costs</td>
<td>20</td>
</tr>
<tr style="background: #FFFFCC;">
<td>washington</td>
<td>20</td>
</tr>
<tr>
<td>standard</td>
<td>19</td>
</tr>
<tr style="background: #FFFFCC;">
<td>support</td>
<td>19</td>
</tr>
<tr>
<td>knowledge</td>
<td>19</td>
</tr>
<tr style="background: #FFFFCC;">
<td>different</td>
<td>19</td>
</tr>
<tr>
<td>appropriate</td>
<td>19</td>
</tr>
<tr style="background: #FFFFCC;">
<td>effort</td>
<td>18</td>
</tr>
<tr>
<td>applications</td>
<td>18</td>
</tr>
<tr style="background: #FFFFCC;">
<td>marc</td>
<td>18</td>
</tr>
<tr>
<td>shared</td>
<td>18</td>
</tr>
<tr style="background: #FFFFCC;">
<td>exchange</td>
<td>18</td>
</tr>
<tr>
<td>process</td>
<td>18</td>
</tr>
<tr style="background: #FFFFCC;">
<td>changes</td>
<td>17</td>
</tr>
<tr>
<td>lcs</td>
<td>17</td>
</tr>
<tr style="background: #FFFFCC;">
<td>increase</td>
<td>16</td>
</tr>
<tr>
<td>public</td>
<td>16</td>
</tr>
<tr style="background: #FFFFCC;">
<td>search</td>
<td>16</td>
</tr>
<tr>
<td>creating</td>
<td>16</td>
</tr>
<tr style="background: #FFFFCC;">
<td>broader</td>
<td>16</td>
</tr>
<tr>
<td>catalogs</td>
<td>16</td>
</tr>
<tr style="background: #FFFFCC;">
<td>controlled</td>
<td>16</td>
</tr>
</table>
<p>I converted the <a href="http://www.loc.gov/bibliographic-future/news/lcwg-ontherecord-jan08-final.pdf">PDF</a> to a text file called &#8216;lc&#8217; with <a href="http://www.foolabs.com/xpdf/">xpdf</a> and then wrote a little Python:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/env python</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">urllib</span> <span style="color: #ff7700;font-weight:bold;">import</span> urlopen
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">re</span> <span style="color: #ff7700;font-weight:bold;">import</span> sub
&nbsp;
stop_words = urlopen<span style="color: black;">&#40;</span><span style="color: #483d8b;">'http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words'</span><span style="color: black;">&#41;</span>.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
text = <span style="color: #008000;">file</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'lc'</span><span style="color: black;">&#41;</span>.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
counts = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> word <span style="color: #ff7700;font-weight:bold;">in</span> text.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    word = word.<span style="color: black;">lower</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    word = sub<span style="color: black;">&#40;</span>r<span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\W</span>'</span>, <span style="color: #483d8b;">''</span>, word<span style="color: black;">&#41;</span>
    word = sub<span style="color: black;">&#40;</span>r<span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\d</span>+'</span>, <span style="color: #483d8b;">''</span>, word<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">if</span> word == <span style="color: #483d8b;">''</span>  <span style="color: #ff7700;font-weight:bold;">or</span> word <span style="color: #ff7700;font-weight:bold;">in</span> stop_words: <span style="color: #ff7700;font-weight:bold;">continue</span>
    counts<span style="color: black;">&#91;</span>word<span style="color: black;">&#93;</span> = counts.<span style="color: black;">get</span><span style="color: black;">&#40;</span>word,<span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span> + <span style="color: #ff4500;">1</span>
&nbsp;
words = counts.<span style="color: black;">keys</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
words.<span style="color: black;">sort</span><span style="color: black;">&#40;</span><span style="color: #ff7700;font-weight:bold;">lambda</span> a,b: <span style="color: #008000;">cmp</span><span style="color: black;">&#40;</span>counts<span style="color: black;">&#91;</span>b<span style="color: black;">&#93;</span>, counts<span style="color: black;">&#91;</span>a<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> word <span style="color: #ff7700;font-weight:bold;">in</span> words<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span>:<span style="color: #ff4500;">100</span><span style="color: black;">&#93;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;%20s %i&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>word, counts<span style="color: black;">&#91;</span>word<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span></pre></div></div>
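
<p>The script above is Python 2: <code>file</code>, <code>cmp</code> and the <code>print</code> statement are gone in Python 3. A rough Python 3 equivalent of the counting loop might look like the sketch below. The <code>count_words</code> helper and the inline sample text are assumptions made for illustration, not part of the original script, and the stop words are passed in as a set instead of being fetched over HTTP.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">#!/usr/bin/env python3
import re
from collections import Counter

def count_words(text, stop_words):
    # Same steps as the original loop: lowercase, strip non-word
    # characters and digits, then skip empty strings and stop words.
    counts = Counter()
    for word in text.split():
        word = re.sub(r'\W', '', word.lower())
        word = re.sub(r'\d+', '', word)
        if word == '' or word in stop_words:
            continue
        counts[word] += 1
    return counts

# Assumption: a tiny inline sample standing in for the report text
# and for the downloaded stop word list.
sample = 'The Library of Congress and 2 other libraries; the library, in 2008.'
stop = {'the', 'of', 'and', 'in', 'other'}

# most_common(100) replaces the cmp-based sort and the [0:100] slice.
for word, n in count_words(sample, stop).most_common(100):
    print('%20s %i' % (word, n))</pre></div></div>

<p>To run it against the report, read the &#8216;lc&#8217; file with <code>open</code> and load the stop word list into a set before calling <code>count_words</code>.</p>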

<p>Does me writing code to read the report count as reading the report? &#8230; </p>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2008/01/10/wogrofubico-wc/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>metadata hackers</title>
		<link>http://inkdroid.org/journal/2007/12/31/metadata-hackers/</link>
		<comments>http://inkdroid.org/journal/2007/12/31/metadata-hackers/#comments</comments>
		<pubDate>Mon, 31 Dec 2007 14:42:45 +0000</pubDate>
		<dc:creator>ed</dc:creator>
				<category><![CDATA[marc]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[lc]]></category>
		<category><![CDATA[nsa]]></category>
		<category><![CDATA[standards]]></category>

		<guid isPermaLink="false">http://www.inkdroid.org/journal/2007/12/31/metadata-hackers/</guid>
		<description><![CDATA[I opened the paper this morning to read a story of another person involved in the creation of MARC who has just died. I hadn&#8217;t realized before reading Henrietta Avram and Samuel Snyder&#8217;s obituaries that there was a bit of an NSA LC connection when MARC was being created. From 1964 to 1966, [Samuel Snyder] [...]]]></description>
			<content:encoded><![CDATA[<p>I opened the paper this morning to read a <a href="http://www.washingtonpost.com/wp-dyn/content/article/2007/12/30/AR2007123002435.html">story</a> of another person involved in the creation of MARC who has just died. I hadn&#8217;t realized before reading <a href="http://www.washingtonpost.com/wp-dyn/content/article/2006/04/27/AR2006042702105.html">Henrietta Avram</a> and <a href="http://www.washingtonpost.com/wp-dyn/content/article/2007/12/30/AR2007123002435.html">Samuel Snyder&#8217;s</a> obituaries that there was a bit of an <a href="http://nsa.gov">NSA</a> <a href="http://loc.gov">LC</a> connection when MARC was being created.</p>
<blockquote><p>
From 1964 to 1966, [Samuel Snyder] was coordinator of the Library of Congress&#8217;s information systems office. He was among the creators of the library&#8217;s Machine Readable Cataloging system that replaced the handwritten card with an electronic searchable database system that became the standard worldwide.
</p></blockquote>
<p>I imagine NSA folks had a lot to do with early automation efforts in the federal government&#8230;but it&#8217;s still an interesting connection. One of my <a href="http://onebiglibrary.net">coworkers</a> is reading up on this early history of MARC so this is for him in the unlikely event that he missed it&#8230;email would probably have worked better I guess, but I also wanted to pay tribute. Libraries wouldn&#8217;t be what they are today without this influential early work.</p>
]]></content:encoded>
			<wfw:commentRss>http://inkdroid.org/journal/2007/12/31/metadata-hackers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
