<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: crawling bibliographic data</title>
	<atom:link href="http://inkdroid.org/journal/2009/01/22/crawling-bibliographic-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://inkdroid.org/journal/2009/01/22/crawling-bibliographic-data/</link>
	<description>$pithy_personal_mission_statement</description>
	<lastBuildDate>Wed, 10 Mar 2010 02:04:23 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: teetsm</title>
		<link>http://inkdroid.org/journal/2009/01/22/crawling-bibliographic-data/comment-page-1/#comment-80868</link>
		<dc:creator>teetsm</dc:creator>
		<pubDate>Fri, 20 Feb 2009 19:41:26 +0000</pubDate>
		<guid isPermaLink="false">http://inkdroid.org/journal/?p=663#comment-80868</guid>
		<description>Ed,

I wanted to jump in and clarify something...  You are absolutely right in your description of robots.txt and its potential implications.  What is not obvious from looking at our robots.txt is that we have alternative methods in place with the major engines.  When our relationships with the engines started many years ago, it was necessary to put special feeds in place to accommodate the size and structure of WorldCat.  We worked directly with them to provide the data in a way that worked best for their services.  Aspects such as our robots.txt and no-follows are remnants of their requests in previous years, and they prevented multiple feeds from conflicting within their environments.  With advances and changes within the engines and our services, much of what we did then can now be accomplished with more standard methods.  You will see some changes, such as what Th suggests above, as well as others over the next while as we transition away from the older models with the search engine partners.

Mike Teets, OCLC</description>
		<content:encoded><![CDATA[<p>Ed,</p>
<p>I wanted to jump in and clarify something&#8230;  You are absolutely right in your description of robots.txt and its potential implications.  What is not obvious from looking at our robots.txt is that we have alternative methods in place with the major engines.  When our relationships with the engines started many years ago, it was necessary to put special feeds in place to accommodate the size and structure of WorldCat.  We worked directly with them to provide the data in a way that worked best for their services.  Aspects such as our robots.txt and no-follows are remnants of their requests in previous years, and they prevented multiple feeds from conflicting within their environments.  With advances and changes within the engines and our services, much of what we did then can now be accomplished with more standard methods.  You will see some changes, such as what Th suggests above, as well as others over the next while as we transition away from the older models with the search engine partners.</p>
<p>Mike Teets, OCLC</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Thom</title>
		<link>http://inkdroid.org/journal/2009/01/22/crawling-bibliographic-data/comment-page-1/#comment-80852</link>
		<dc:creator>Thom</dc:creator>
		<pubDate>Fri, 13 Feb 2009 21:54:03 +0000</pubDate>
		<guid isPermaLink="false">http://inkdroid.org/journal/?p=663#comment-80852</guid>
		<description>Actually, we detect the major crawlers and send them HTML rather than the raw XML.  We also do this if we think you are on a mobile device, since most of the current browsers on phones don&#039;t handle XML+XSLT very well.

We&#039;ve also turned off the &#039;do not follow&#039; header to make harvesting of Identities work better.

--Th</description>
		<content:encoded><![CDATA[<p>Actually, we detect the major crawlers and send them HTML rather than the raw XML.  We also do this if we think you are on a mobile device, since most of the current browsers on phones don&#8217;t handle XML+XSLT very well.</p>
<p>We&#8217;ve also turned off the &#8216;do not follow&#8217; header to make harvesting of Identities work better.</p>
<p>&#8211;Th</p>
]]></content:encoded>
	</item>
</channel>
</rss>
