Tag Archives: crawling

the importance of being crawled

While lcsh.info was up and running, harvesters actively crawled it. At its core, all lcsh.info did was mint a URI for every Library of Congress Subject Heading. This is similar in spirit to Brewster Kahle’s more ambitious OpenLibrary project to mint a URI for every book, or in his words:

One web page for every book

Aside: [...]

crawling bibliographic data

Today’s Guardian article “Why you can’t find a library book in your search engine” prompted me to look at WorldCat’s robots.txt file for the first time. Part of the beauty of the web is that it’s an open information space where anyone (people and robots) can start with a single URL and follow their nose [...]