Category Archives: libraries

MARCetplace

Last Saturday I passed the time while waiting in line at the DMV by reading the recently released Study of the North American MARC Records Marketplace. The analysis of the survey results seems to focus on the role of the Library of Congress in the marketplace, which is understandable given that LC funded the report. [...]

cloaking and fulltext

It’s comforting to know that California Digital Library are selectively serving up fulltext content in HTML from their institutional repository for search engines to chew on. For example, compare the output of:

curl http://escholarship.org/uc/item/2896686x

with:

curl --header "User-Agent: Googlebot/2.1 (+http://www.google.com/bot.html)" http://escholarship.org/uc/item/2896686x

You should see full-text content for the article in the latter and not in the former:


qt2896686x repo “Wholly [...]
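
If you'd rather script the comparison than eyeball two curl runs, something like the following might do. It's only a rough sketch using Python's standard library: the helper names (build_request, fetch, size_difference) are made up for illustration, and it assumes the server still varies its response by User-Agent, which may no longer be true.

```python
import urllib.request

# The crawler User-Agent string from the curl example above.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"


def build_request(url, user_agent=None):
    """Build a GET request, optionally overriding the User-Agent header."""
    request = urllib.request.Request(url)
    if user_agent is not None:
        request.add_header("User-Agent", user_agent)
    return request


def fetch(url, user_agent=None, timeout=30):
    """Fetch the URL and return the body as text (needs network access)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def size_difference(default_body, crawler_body):
    """Return how many more characters the crawler response contains."""
    return len(crawler_body) - len(default_body)


# Hypothetical usage (not run here, since it hits the network):
#   url = "http://escholarship.org/uc/item/2896686x"
#   extra = size_difference(fetch(url), fetch(url, GOOGLEBOT_UA))
#   print("crawler response is", extra, "characters longer")
```

A large positive difference would only hint that extra text is being served to the crawler; you'd still want to diff the two bodies to confirm it's the article's full text.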

Documents

I’ve struggled in the past with what constitutes an Information Resource in the context of Web Architecture, Linked Data and practical digital library applications such as the National Digital Newspaper Project I work on at the Library of Congress. So it was reassuring to see the issue come up a few months ago during a [...]

open to view

I spent an hour checking out the HathiTrust API docs this morning, mainly to see what the similarities and differences are with the as-yet undocumented API for Chronicling America. There are quite a few similarities in the general RESTful approach, and in the use of Atom, METS and PREMIS in the metadata that is made available. [...]

flickr, digital curation and the web

The Library of Congress has started to put selected content from Chronicling America into Flickr as part of the Illustrated Newspaper Supplements set. More details on the rationale and process involved can be found in a FAQ on the LC Newspapers and Current Periodical Reading Room website.
So for example this newspaper page on Chronicling [...]

American Memory is (almost) 20

Through an internal discussion list at the Library of Congress I learned that this year will mark the 20th Anniversary of American Memory. The exact date of the anniversary depends on how you want to mark it: either the beginning of FY90 on October 1st, 1989 (thanks David) when work officially began, or earlier [...]

APIs Suck

With TransparencyCamp last weekend, news of the mandated use of feed syndication by Federal Agencies receiving funds from the Recovery Act, recent blog posts by Tim O’Reilly and the Special Libraries Association, an article in Newsweek, news of Carl Malamud’s bid to become the Public Printer of the United States (aka head of the GPO), [...]

c4l09

So code4lib2009 was a whole lot of fun. The amazing thing about the conference isn’t really reflected in the program of talks. I feel like I can say that since I was one of them.
The real value is the social space and the time to talk to people you’ve seen online, throw around ideas, [...]

crawling bibliographic data

Today’s Guardian article Why you can’t find a library book in your search engine prompted me to look at Worldcat’s robots.txt file for the first time. Part of the beauty of the web is that it’s an open information space where anyone (people and robots) can start with a single URL and follow their nose [...]

work identifiers and the web

Michael Smethurst’s In Search of Cultural Identifiers post over at the BBC Radio Labs got me thinking about web identifiers for works, about LibraryThing and OCLC as linked library data providers, and finally about the International Standard Text Code. Admittedly it’s kind of a hodge-podge of topics, and I’m going to be taking some liberties with [...]