<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: pymarc, marc8 and nothingness</title>
	<atom:link href="http://inkdroid.org/journal/2007/07/20/pymarc-marc8/feed/" rel="self" type="application/rss+xml" />
	<link>http://inkdroid.org/journal/2007/07/20/pymarc-marc8/</link>
	<description>$pithy_personal_mission_statement</description>
	<lastBuildDate>Wed, 10 Mar 2010 02:04:23 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Nodalities &#187; Blog Archive &#187; This Week&#8217;s Semantic Web</title>
		<link>http://inkdroid.org/journal/2007/07/20/pymarc-marc8/comment-page-1/#comment-59652</link>
		<dc:creator>Nodalities &#187; Blog Archive &#187; This Week&#8217;s Semantic Web</dc:creator>
		<pubDate>Thu, 17 Apr 2008 17:05:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.inkdroid.org/journal/2007/07/20/pymarc-marc8/#comment-59652</guid>
		<description>[...] pymarc, marc8 and nothingness - new function, marc8_to_unicode() [...]</description>
		<content:encoded><![CDATA[<p>[...] pymarc, marc8 and nothingness &#8211; new function, marc8_to_unicode() [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ed</title>
		<link>http://inkdroid.org/journal/2007/07/20/pymarc-marc8/comment-page-1/#comment-53141</link>
		<dc:creator>ed</dc:creator>
		<pubDate>Thu, 31 Jan 2008 03:13:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.inkdroid.org/journal/2007/07/20/pymarc-marc8/#comment-53141</guid>
		<description>Wow, sorry for the delay, gabe. This slipped under my radar. &lt;a href=&quot;http://pypi.python.org/pypi/pymarc&quot; rel=&quot;nofollow&quot;&gt;1.7&lt;/a&gt; was just released with some fixes you recently sent me in a patch.

In addition, I expanded the documentation for the marc8_to_unicode function so that it hopefully makes clear that you aren&#039;t supposed to pass in a serialized MARC record, just a chunk of text extracted from the record that you&#039;d like to translate.

Thanks for the info and the patch!</description>
		<content:encoded><![CDATA[<p>Wow, sorry for the delay, gabe. This slipped under my radar. <a href="http://pypi.python.org/pypi/pymarc" rel="nofollow">1.7</a> was just released with some fixes you recently sent me in a patch.</p>
<p>In addition, I expanded the documentation for the marc8_to_unicode function so that it hopefully makes clear that you aren&#8217;t supposed to pass in a serialized MARC record, just a chunk of text extracted from the record that you&#8217;d like to translate.</p>
<p>Thanks for the info and the patch!</p>
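<p>A rough sketch of why translating a whole serialized record breaks it, while translating extracted field text does not. A MARC 21 leader stores the record length in bytes 0-4 and the base address of data in bytes 12-16, so any conversion that changes the byte count leaves those numbers stale. The helpers build_record, leader_is_consistent and transcode are hypothetical names for illustration, not pymarc functions, and transcode uses Latin-1 to UTF-8 purely as a stand-in for a real MARC-8 converter. The code is Python 3 and uses only the standard library:</p>

```python
FIELD_END = b"\x1e"
RECORD_END = b"\x1d"


def build_record(tag, data):
    """Serialize a one-field record: leader, directory, field data."""
    field = data + FIELD_END
    # Directory entry: tag, 4-digit field length, 5-digit start offset.
    directory = tag + b"%04d" % len(field) + b"00000" + FIELD_END
    base_address = 24 + len(directory)
    total = base_address + len(field) + len(RECORD_END)
    # Leader bytes 0-4 hold the record length, bytes 12-16 the base address.
    leader = b"%05d" % total + b"nam  22" + b"%05d" % base_address + b"   4500"
    return leader + directory + field + RECORD_END


def leader_is_consistent(raw):
    """Check the lengths declared in the leader against the actual bytes."""
    declared_length = int(raw[0:5])
    base_address = int(raw[12:17])
    return declared_length == len(raw) and len(raw) > base_address


def transcode(data):
    """Stand-in for a real converter: Latin-1 to UTF-8, NOT MARC-8."""
    return data.decode("latin-1").encode("utf-8")


# Hypothetical title field; the 0xE2 byte stands in for a non-ASCII character.
field_data = b"10\x1faCaf\xe2e."
original = build_record(b"245", field_data)

# Converting the serialized record as one blob leaves the leader stale.
whole_blob = transcode(original)
# Converting only the field text, then re-serializing, recomputes the leader.
per_field = build_record(b"245", transcode(field_data))

print(leader_is_consistent(original))
print(leader_is_consistent(whole_blob))
print(leader_is_consistent(per_field))
```

<p>Only the whole-blob conversion fails the check, which is the kind of mismatch a reader would be expected to reject with an error about record size or base address.</p>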
]]></content:encoded>
	</item>
	<item>
		<title>By: thesecretmirror.com &#187; Blog Archive &#187; When Life Hands You MARC, make pymarc</title>
		<link>http://inkdroid.org/journal/2007/07/20/pymarc-marc8/comment-page-1/#comment-39168</link>
		<dc:creator>thesecretmirror.com &#187; Blog Archive &#187; When Life Hands You MARC, make pymarc</dc:creator>
		<pubDate>Fri, 20 Jul 2007 22:13:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.inkdroid.org/journal/2007/07/20/pymarc-marc8/#comment-39168</guid>
		<description>[...] I&#8217;m no expert, but I&#8217;m glad that I could help bring pymarc up to version 1.0 and that I&#8217;ve had a chance to begin to enjoy programming again. I&#8217;m also glad to see that Catalogablog has spread the word. Download a copy and start hacking; maybe you&#8217;ll be rewarded with rediscovering the joy of code like I was. [...]</description>
		<content:encoded><![CDATA[<p>[...] I&#8217;m no expert, but I&#8217;m glad that I could help bring pymarc up to version 1.0 and that I&#8217;ve had a chance to begin to enjoy programming again. I&#8217;m also glad to see that Catalogablog has spread the word. Download a copy and start hacking; maybe you&#8217;ll be rewarded with rediscovering the joy of code like I was. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gabriel Farrell</title>
		<link>http://inkdroid.org/journal/2007/07/20/pymarc-marc8/comment-page-1/#comment-39150</link>
		<dc:creator>Gabriel Farrell</dc:creator>
		<pubDate>Fri, 20 Jul 2007 17:58:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.inkdroid.org/journal/2007/07/20/pymarc-marc8/#comment-39150</guid>
		<description>Great things, Ed (and thanks to Mark and Aaron, too).  I svn&#039;d and tried it out on a dump I had lying around of 105,000 of our records.  Following the pymarc.__doc__, I did:

&gt;&gt;&gt; reader = MARCReader(&#039;bibs105000.out&#039;)
&gt;&gt;&gt; for record in reader:
...     print record[&#039;245&#039;][&#039;a&#039;]
... 
Traceback (most recent call last):
  File &quot;&quot;, line 1, in ?
  File &quot;build/bdist.linux-x86_64/egg/pymarc/reader.py&quot;, line 51, in next
ValueError: invalid literal for int(): bibs1
&gt;&gt;&gt; 

But I noticed &quot;self.reader = pymarc.MARCReader(file(&#039;test/test.dat&#039;))&quot; in test/reader.py, so I:

&gt;&gt;&gt; marc8_file = file(&#039;bibs105000.out&#039;)
&gt;&gt;&gt; reader = MARCReader(marc8_file)
&gt;&gt;&gt; for record in reader:
...     print record[&#039;245&#039;][&#039;a&#039;]
... 
Microeconomics :
The multilateral development banks :
The Accountants digest.
Achievement.
ALA bulletin.
Acta arithmetica.
Acta crystallographica.
ASLE transactions.
Acta mathematica.
Acta mechanica.
Acta physica Polonica,
Acta physica Austriaca /
Acta physica Polonica.
Acta polytechnica Scandinavica.
[etc. -- big old list of titles, streaming by, very fast]

So MARCReader expects a file object, not a filename.  Does that doc string need updating or did I misread it?

I then tried to test marc8_to_unicode:

&gt;&gt;&gt; utf8_file = marc8_to_unicode(marc8_file)
Traceback (most recent call last):
  File &quot;&quot;, line 1, in ?
  File &quot;build/bdist.linux-x86_64/egg/pymarc/marc8.py&quot;, line 8, in marc8_to_unicode
  File &quot;build/bdist.linux-x86_64/egg/pymarc/marc8.py&quot;, line 43, in translate
TypeError: len() of unsized object
&gt;&gt;&gt; 

Looking at test/marc8.py I saw marc8_to_unicode expects a string, so I:

&gt;&gt;&gt; marc8_file = file(&#039;bibs105000.out&#039;)
&gt;&gt;&gt; marc8_file_str = marc8_file.read()
&gt;&gt;&gt; utf8_file_str = marc8_to_unicode(marc8_file_str)
couldn&#039;t find 66 69 221 221
couldn&#039;t find 66 69 220 220
couldn&#039;t find 66 69 221 221
couldn&#039;t find 66 69 220 220
couldn&#039;t find 66 69 221 221
couldn&#039;t find 66 69 221 221
couldn&#039;t find 66 69 220 220
couldn&#039;t find 66 69 220 220
couldn&#039;t find 66 69 221 221
&gt;&gt;&gt; 

This took a long time, running at about 98% of my cpu and 20% of my ~4GB of  memory.  So I timed it (only 3 times because, you know, I&#039;m impatient):

&gt;&gt;&gt; t = Timer(stmt=&#039;utf8_file_str = marc8_to_unicode(marc8_file_str)&#039;, setup=&#039;marc8_file = file(&quot;bibs105000.out&quot;); marc8_file_str = marc8_file.read(); from pymarc import marc8_to_unicode&#039;)
&gt;&gt;&gt; t.timeit(3)
[a bunch of the &quot;couldn&#039;t find 66 69 22[01] 22[01]&quot;, three times over]
786.2618350982666

So it took a little while.  How big is the file?

&gt;&gt;&gt; len(marc8_file_str)
93747869
&gt;&gt;&gt; len(utf8_file_str)
87474412

Hmm.  Those should match, right?  Well, let&#039;s see if we can read the new file:

&gt;&gt;&gt; pymarc_utf8_file = file(&#039;bibs105000_pymarc_utf8.dat&#039;, &#039;w&#039;)
&gt;&gt;&gt; pymarc_utf8_file.write(utf8_file_str.encode(&#039;utf8&#039;))
&gt;&gt;&gt; reader = MARCReader(file(&#039;bibs105000_pymarc_utf8.dat&#039;))
&gt;&gt;&gt; for record in reader:
...     print record[&#039;245&#039;][&#039;a&#039;]
... 
None
Traceback (most recent call last):
  File &quot;&quot;, line 1, in ?
  File &quot;build/bdist.linux-x86_64/egg/pymarc/reader.py&quot;, line 54, in next
  File &quot;build/bdist.linux-x86_64/egg/pymarc/record.py&quot;, line 46, in __init__
  File &quot;build/bdist.linux-x86_64/egg/pymarc/record.py&quot;, line 123, in decodeMARC
pymarc.exceptions.BaseAddressInvalid: Base address exceeds size of record
&gt;&gt;&gt; 

Have I done something screwy, or is it something in the file?  I can get the file to you if you want to test on your system.  I also have a utf8 file to compare the output against, produced by `yaz-marcdump -f MARC-8 -t UTF-8 -I bibs105000.out &gt; bibs105000_utf8.out`, and of course the diff is a mile long.  If I get time later I&#039;ll test line by line, as done in test/marc8.py.  

Apologies for the long long comment -- should be on some mailing list or something.</description>
		<content:encoded><![CDATA[<p>Great things, Ed (and thanks to Mark and Aaron, too).  I svn&#8217;d and tried it out on a dump I had lying around of 105,000 of our records.  Following the pymarc.__doc__, I did:</p>
<p>&gt;&gt;&gt; reader = MARCReader('bibs105000.out')<br />
&gt;&gt;&gt; for record in reader:<br />
...     print record['245']['a']<br />
...<br />
Traceback (most recent call last):<br />
  File "", line 1, in ?<br />
  File "build/bdist.linux-x86_64/egg/pymarc/reader.py", line 51, in next<br />
ValueError: invalid literal for int(): bibs1<br />
&gt;&gt;&gt; </p>
<p>But I noticed "self.reader = pymarc.MARCReader(file('test/test.dat'))" in test/reader.py, so I:</p>
<p>&gt;&gt;&gt; marc8_file = file('bibs105000.out')<br />
&gt;&gt;&gt; reader = MARCReader(marc8_file)<br />
&gt;&gt;&gt; for record in reader:<br />
...     print record['245']['a']<br />
...<br />
Microeconomics :<br />
The multilateral development banks :<br />
The Accountants digest.<br />
Achievement.<br />
ALA bulletin.<br />
Acta arithmetica.<br />
Acta crystallographica.<br />
ASLE transactions.<br />
Acta mathematica.<br />
Acta mechanica.<br />
Acta physica Polonica,<br />
Acta physica Austriaca /<br />
Acta physica Polonica.<br />
Acta polytechnica Scandinavica.<br />
[etc. -- big old list of titles, streaming by, very fast]</p>
<p>So MARCReader expects a file object, not a filename.  Does that doc string need updating or did I misread it?</p>
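<p>A minimal sketch of why the first attempt may have failed with "bibs1", assuming the reader treats a plain string as MARC data and parses its first five characters as the record length. That is an assumption about the reader's internals, not something confirmed here. The helper read_record_length and the sample bytes are hypothetical, and the code is Python 3 with only the standard library, unlike the Python 2 sessions in this comment:</p>

```python
import io


def read_record_length(handle):
    """Parse the 5-digit record length that opens every MARC record."""
    return int(handle.read(5))


# Hypothetical 26-byte record: a 24-byte leader plus two terminator bytes.
record = b"00026" + b" " * 19 + b"\x1e\x1d"
length_from_data = read_record_length(io.BytesIO(record))

# The filename itself, treated as if it were record data.
try:
    read_record_length(io.BytesIO(b"bibs105000.out"))
    filename_error = None
except ValueError as err:
    filename_error = str(err)

print(length_from_data)
print(filename_error)
```

<p>The first five characters of the filename are "bibs1", which matches the value in the ValueError.</p>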
<p>I then tried to test marc8_to_unicode:</p>
<p>&gt;&gt;&gt; utf8_file = marc8_to_unicode(marc8_file)<br />
Traceback (most recent call last):<br />
  File "", line 1, in ?<br />
  File "build/bdist.linux-x86_64/egg/pymarc/marc8.py", line 8, in marc8_to_unicode<br />
  File "build/bdist.linux-x86_64/egg/pymarc/marc8.py", line 43, in translate<br />
TypeError: len() of unsized object<br />
&gt;&gt;&gt; </p>
<p>Looking at test/marc8.py I saw marc8_to_unicode expects a string, so I:</p>
<p>&gt;&gt;&gt; marc8_file = file('bibs105000.out')<br />
&gt;&gt;&gt; marc8_file_str = marc8_file.read()<br />
&gt;&gt;&gt; utf8_file_str = marc8_to_unicode(marc8_file_str)<br />
couldn't find 66 69 221 221<br />
couldn't find 66 69 220 220<br />
couldn't find 66 69 221 221<br />
couldn't find 66 69 220 220<br />
couldn't find 66 69 221 221<br />
couldn't find 66 69 221 221<br />
couldn't find 66 69 220 220<br />
couldn't find 66 69 220 220<br />
couldn't find 66 69 221 221<br />
&gt;&gt;&gt; </p>
<p>This took a long time, running at about 98% of my cpu and 20% of my ~4GB of  memory.  So I timed it (only 3 times because, you know, I&#8217;m impatient):</p>
<p>&gt;&gt;&gt; t = Timer(stmt='utf8_file_str = marc8_to_unicode(marc8_file_str)', setup='marc8_file = file("bibs105000.out"); marc8_file_str = marc8_file.read(); from pymarc import marc8_to_unicode')<br />
&gt;&gt;&gt; t.timeit(3)<br />
[a bunch of the "couldn't find 66 69 22[01] 22[01]", three times over]<br />
786.2618350982666</p>
<p>So it took a little while.  How big is the file?</p>
<p>&gt;&gt;&gt; len(marc8_file_str)<br />
93747869<br />
&gt;&gt;&gt; len(utf8_file_str)<br />
87474412</p>
<p>Hmm.  Those should match, right?  Well, let&#8217;s see if we can read the new file:</p>
<p>&gt;&gt;&gt; pymarc_utf8_file = file('bibs105000_pymarc_utf8.dat', 'w')<br />
&gt;&gt;&gt; pymarc_utf8_file.write(utf8_file_str.encode('utf8'))<br />
&gt;&gt;&gt; reader = MARCReader(file('bibs105000_pymarc_utf8.dat'))<br />
&gt;&gt;&gt; for record in reader:<br />
...     print record['245']['a']<br />
...<br />
None<br />
Traceback (most recent call last):<br />
  File "", line 1, in ?<br />
  File "build/bdist.linux-x86_64/egg/pymarc/reader.py", line 54, in next<br />
  File "build/bdist.linux-x86_64/egg/pymarc/record.py", line 46, in __init__<br />
  File "build/bdist.linux-x86_64/egg/pymarc/record.py", line 123, in decodeMARC<br />
pymarc.exceptions.BaseAddressInvalid: Base address exceeds size of record<br />
&gt;&gt;&gt; </p>
<p>Have I done something screwy, or is it something in the file?  I can get the file to you if you want to test on your system.  I also have a utf8 file to compare the output against, produced by `yaz-marcdump -f MARC-8 -t UTF-8 -I bibs105000.out &gt; bibs105000_utf8.out`, and of course the diff is a mile long.  If I get time later I&#8217;ll test line by line, as done in test/marc8.py.  </p>
<p>Apologies for the long long comment &#8212; should be on some mailing list or something.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
