Over the weekend you probably saw the announcements going around about Google Books releasing more than 1 million public domain ebooks on the web as epubs. This is great news: epub is a web-friendly, open format – and having all this content available as epub is important.

Now I might be greedy, but when I saw that 1 million epubs were available my mind immediately jumped to getting them, indexing them and whatnot. Then I guiltily justified my greedy thoughts by pondering the conventional digital preservation wisdom that Lots of Copies Keeps Stuff Safe (LOCKSS). The books are in the public domain, so ... why not?

Google Books has a really nice API, which lets you get back search results as Atom, with lots of links to things like thumbnails, annotations, item views, etc. You also get a nice amount of Dublin Core metadata, and you can limit your search to books published before 1923. For example, here’s a search for pre-1923 books that mention “Stevenson” (disclaimer: I don’t think the 1923 limit is actually working):

curl 'http://books.google.com/books/feeds/volumes?tbs=cd_max:Jan%2001_2%201923&q=Stevenson' | xmllint --format -

which yields:

<?xml version="1.0" encoding="UTF-8"?>

  http://www.google.com/books/feeds/volumes
  2010-08-30T20:37:27.000Z
  
  Search results for Stevenson
  
  
  
  
  
    Google Books Search
    http://www.google.com
  
  Google Book Search data API
  206
  1
  10
  
    http://www.google.com/books/feeds/volumes/ENMWAAAAYAAJ
    2010-08-30T20:37:27.000Z
    
    Kidnapped
    
    
    
    
    
    
    
    
    
    
    Robert Louis Stevenson
    1909
    308 pages
    book
    ENMWAAAAYAAJ
    HARVARD:HN1JZ9
    Kidnapped
    being memoirs of the adventures of David Balfour in the year 1851 ...
  
  
    http://www.google.com/books/feeds/volumes/WZ0vAAAAMAAJ
    2010-08-30T20:37:27.000Z
    
    Treasure Island
    
    
    
    
    
    
    
    
    
    
    Robert Louis Stevenson
    George Edmund Varian
    1918
    CHAPTER I THE OLD SEA DOG AT THE &quot;ADMIRAL BENBOW&quot; SQUIRE Trelawney, Dr. Livesey, 
and the rest of these gentlemen having asked me to write down the whole ...
    306 pages
    book
    WZ0vAAAAMAAJ
    NYPL:33433075793830
    Fiction
    Treasure Island
  
  
    http://www.google.com/books/feeds/volumes/REUrAQAAIAAJ
    2010-08-30T20:37:27.000Z
    
    Stevenson
    
    
    
    
    
    
    
    
    
    
    Adlai Ewing Stevenson
    Grace Darling
    David Darling
    1977-10
    127 pages
    book
    REUrAQAAIAAJ
    STANFORD:36105037014342
    McGraw-Hill/Contemporary
    Biography & Autobiography
    Stevenson
  
  
    http://www.google.com/books/feeds/volumes/3ibdGgAACAAJ
    2010-08-30T20:37:27.000Z
    
    Stevenson
    
    
    
    
    
    
    
    
    
    
    Robert Louis Stevenson
    2007-01-17
    This scarce antiquarian book is included in our special Legacy Reprint Series.
    128 pages
    book
    3ibdGgAACAAJ
    ISBN:1430495375
    ISBN:9781430495376
    Kessinger Pub Co
    Poetry
    Stevenson
    Day by Day
  
  
    http://www.google.com/books/feeds/volumes/3QI-AAAAYAAJ
    2010-08-30T20:37:27.000Z
    
    A child's garden of verses
    
    
    
    
    
    
    
    
    
    
    Robert Louis Stevenson
    1914
    IN winter I get up at night And dress by yellow candle-light. In summer, quite 
the other way, I have to go to bed by day. I have to go to bed and see The ...
    136 pages
    book
    3QI-AAAAYAAJ
    CORNELL:31924052752262
    Children's poetry, Scottish
    A child's garden of verses
    by Robert Louis Stevenson; illustrated by Charles Robinson
  
  
    http://www.google.com/books/feeds/volumes/Gmk-AAAAYAAJ
    2010-08-30T20:37:27.000Z
    
    Travels with a donkey in the Cevennes
    
    
    
    
    
    
    
    
    
    
    Robert Louis Stevenson
    1916
    THE DONKEY, THE PACK, AND THE PACK - SADDLE IN a little place called Le 
Monastier, in a pleasant highland valley fifteen miles from Le Puy, I spent 
about a ...
    287 pages
    book
    Gmk-AAAAYAAJ
    HARVARD:HWP541
    Cévennes Mountains (France)
    Travels with a donkey in the Cevennes
    An inland voyage
  
  
    http://www.google.com/books/feeds/volumes/f3A-AAAAYAAJ
    2010-08-30T20:37:27.000Z
    
    St. Ives
    
    
    
    
    
    
    
    
    
    
    Robert Louis Stevenson
    1906
    IVES CHAPTER IA TALE OF A LION RAMPANT IT was in the month of May,, that I was 
so unlucky as to fall at last into the hands of the enemy. ...
    528 pages
    book
    f3A-AAAAYAAJ
    HARVARD:HWP61W
    St. Ives
    being the adventures of a French prisoner in England
  
  
    http://www.google.com/books/feeds/volumes/4mb8LuKKwocC
    2010-08-30T20:37:27.000Z
    
    Cruising with Robert Louis Stevenson
    
    
    
    
    
    
    
    
    
    
    Oliver S. Buckton
    2007
    Cruising with Robert Louis Stevenson: Travel, Narrative, and the Colonial Body is the first book-length study about the influence of travel on Robert Louis ...
    344 pages
    book
    4mb8LuKKwocC
    ISBN:0821417568
    ISBN:9780821417560
    Ohio Univ Pr
    Literary Criticism
    Cruising with Robert Louis Stevenson
    travel, narrative, and the colonial body
  
  
    http://www.google.com/books/feeds/volumes/4yo9AAAAYAAJ
    2010-08-30T20:37:27.000Z
    
    New Arabian nights
    
    
    
    
    
    
    
    
    
    
    Robert Louis Stevenson
    1922
    THE SUICIDE CLUB STORY OF THE YOUNG MAN WITH THE CREAM TARTS DURING his 
residence in London, the accomplished Prince Florizel of Bohemia gained the ...
    386 pages
    book
    4yo9AAAAYAAJ
    HARVARD:HWP51H
    Fiction
    New Arabian nights
  
  
    http://www.google.com/books/feeds/volumes/z2Yf1FX02EkC
    2010-08-30T20:37:27.000Z
    
    Robert Louis Stevenson
    
    
    
    
    
    
    
    
    
    
    Richard Ambrosini
    Richard Dury
    2006
    As the editors point out in their Introduction, Stevenson reinvented the “personal essay” and the “walking tour essay,” in texts of ironic stylistic ...
    377 pages
    book
    z2Yf1FX02EkC
    ISBN:0299212246
    ISBN:9780299212247
    Univ of Wisconsin Pr
    Literary Criticism
    Robert Louis Stevenson
    writer of boundaries
  
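
The three bare numbers near the top of that listing (206, 1, 10) look like the feed's total result count, start index and page size, so the 10 entries shown would be only the first page. Here is a minimal sketch of generating the URL for every page of a query. It assumes the feed accepts the usual GData start-index and max-results parameters, which I haven't confirmed for the Books feed. The helper name page_urls is made up for illustration.

```python
from urllib.parse import quote, urlencode

FEED_URL = "http://books.google.com/books/feeds/volumes"


def page_urls(query, total_results, page_size=10, date_filter=None):
    """Yield one feed URL per page of results for a query."""
    for start in range(1, total_results + 1, page_size):
        params = {}
        if date_filter:
            # Same tbs value as in the curl command, before URL-encoding.
            params["tbs"] = date_filter
        params["q"] = query
        # Assumption: the feed honours the standard GData paging parameters.
        params["start-index"] = start
        params["max-results"] = page_size
        # quote (not quote_plus) encodes spaces as %20; ":" is left as-is.
        yield FEED_URL + "?" + urlencode(params, quote_via=quote, safe=":")


urls = list(page_urls("Stevenson", 206, date_filter="cd_max:Jan 01_2 1923"))
print(len(urls))
print(urls[0])
```

With 206 results at 10 per page that comes to 21 URLs, the first of which is the curl query above plus the two paging parameters.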

Now it would be nice if the Atom included <link> elements for the epubs themselves. Perhaps the feed could even use the recently released “acquisition” link relation defined by OPDS v1.0. For example, by including something like the following in each atom:entry element:
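
As a rough sketch of what such a link could look like, the following adds one to an Atom entry using Python's ElementTree. The relation URI is the OPDS acquisition relation and the type is the EPUB media type. The href is purely a placeholder on example.org, since the feed doesn't say where the epub actually lives, and add_acquisition_link is a made-up helper name.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
OPDS_ACQUISITION = "http://opds-spec.org/acquisition"  # OPDS link relation
EPUB_TYPE = "application/epub+zip"  # EPUB media type

# Serialize Atom elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", ATOM)


def add_acquisition_link(entry, epub_url):
    """Append an OPDS acquisition link for epub_url to an atom:entry element."""
    return ET.SubElement(
        entry,
        f"{{{ATOM}}}link",
        {"rel": OPDS_ACQUISITION, "type": EPUB_TYPE, "href": epub_url},
    )


# A trimmed-down entry with just the Atom id and title from the feed above.
entry = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    "<id>http://www.google.com/books/feeds/volumes/ENMWAAAAYAAJ</id>"
    "<title>Kidnapped</title>"
    "</entry>"
)
# The volume ID is the last path segment of the entry's atom:id.
volume_id = entry.findtext(f"{{{ATOM}}}id").rsplit("/", 1)[-1]
# Placeholder href: example.org stands in for the real epub location.
add_acquisition_link(entry, f"http://example.org/epubs/{volume_id}.epub")
print(ET.tostring(entry, encoding="unicode"))
```

A client that understands OPDS could then follow the acquisition link for each entry without having to guess at URL patterns.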


Theoretically it should be possible to construct the appropriate link for the epub from the data available in the Atom. But exposing the epub URLs explicitly, in a programmatic way, would enable quite a bit more use of them. Unfortunately we would still be limited to dipping into the full dataset with a query, instead of being able to crawl the entire archive with something like a paged Atom feed. From a conversation over on get-theinfo it appears that this approach might not be as easy as it sounds. Also, it turns out that, magically, many of the books have already been uploaded to the Internet Archive: 902,188 of them, in fact.

So maybe not that much work needs to be done. But presumably more public domain content will become available from Google Books, and it would be nice to be able to say there was at least one other copy of it elsewhere, for digital preservation purposes. It would be great to see Google step up and do some good by making their API usable for folks wanting to replicate the public domain content. Still, at least they haven’t done evil by locking it away completely. Dan Brickley had an interesting suggestion to possibly collaborate on this work.