lots of copies keeps epubs safe
Over the weekend you probably saw the announcements going around about Google Books releasing over 1 million public domain ebooks on the web as epubs. This is great news: epub is a web friendly, open format, and having all this content available as epub is important.
Now I might be greedy, but when I saw that 1 million epubs were available my mind immediately jumped to getting them, indexing them and whatnot. Then I guiltily justified my greedy thoughts by pondering the conventional digital preservation wisdom that Lots of Copies Keeps Stuff Safe (LOCKSS). The books are in the public domain, so ... why not?
Google Books has a really nice API, which lets you get back search results as Atom, with lots of links to things like thumbnails, annotations, item views, etc. You also get a nice amount of Dublin Core metadata. And you can limit your search to books published before 1923. For example here's a search for pre-1923 books that mention "Stevenson" (disclaimer: I don't think the 1923 limit is actually working):
curl 'http://books.google.com/books/feeds/volumes?tbs=cd_max:Jan%2001_2%201923&q=Stevenson' | xmllint --format -
which yields:
<?xml version="1.0" encoding="UTF-8"?>
http://www.google.com/books/feeds/volumes
2010-08-30T20:37:27.000Z
Search results for Stevenson
Google Books Search
http://www.google.com
Google Book Search data API
206
1
10
http://www.google.com/books/feeds/volumes/ENMWAAAAYAAJ
2010-08-30T20:37:27.000Z
Kidnapped
Robert Louis Stevenson
1909
308 pages
book
ENMWAAAAYAAJ
HARVARD:HN1JZ9
Kidnapped
being memoirs of the adventures of David Balfour in the year 1851 ...
http://www.google.com/books/feeds/volumes/WZ0vAAAAMAAJ
2010-08-30T20:37:27.000Z
Treasure Island
Robert Louis Stevenson
George Edmund Varian
1918
CHAPTER I THE OLD SEA DOG AT THE "ADMIRAL BENBOW" SQUIRE Trelawney, Dr. Livesey,
and the rest of these gentlemen having asked me to write down the whole ...
306 pages
book
WZ0vAAAAMAAJ
NYPL:33433075793830
Fiction
Treasure Island
http://www.google.com/books/feeds/volumes/REUrAQAAIAAJ
2010-08-30T20:37:27.000Z
Stevenson
Adlai Ewing Stevenson
Grace Darling
David Darling
1977-10
127 pages
book
REUrAQAAIAAJ
STANFORD:36105037014342
McGraw-Hill/Contemporary
Biography & Autobiography
Stevenson
http://www.google.com/books/feeds/volumes/3ibdGgAACAAJ
2010-08-30T20:37:27.000Z
Stevenson
Robert Louis Stevenson
2007-01-17
This scarce antiquarian book is included in our special Legacy Reprint Series.
128 pages
book
3ibdGgAACAAJ
ISBN:1430495375
ISBN:9781430495376
Kessinger Pub Co
Poetry
Stevenson
Day by Day
http://www.google.com/books/feeds/volumes/3QI-AAAAYAAJ
2010-08-30T20:37:27.000Z
A child's garden of verses
Robert Louis Stevenson
1914
IN winter I get up at night And dress by yellow candle-light. In summer, quite
the other way, I have to go to bed by day. I have to go to bed and see The ...
136 pages
book
3QI-AAAAYAAJ
CORNELL:31924052752262
Children's poetry, Scottish
A child's garden of verses
by Robert Louis Stevenson; illustrated by Charles Robinson
http://www.google.com/books/feeds/volumes/Gmk-AAAAYAAJ
2010-08-30T20:37:27.000Z
Travels with a donkey in the Cevennes
Robert Louis Stevenson
1916
THE DONKEY, THE PACK, AND THE PACK - SADDLE IN a little place called Le
Monastier, in a pleasant highland valley fifteen miles from Le Puy, I spent
about a ...
287 pages
book
Gmk-AAAAYAAJ
HARVARD:HWP541
Cévennes Mountains (France)
Travels with a donkey in the Cevennes
An inland voyage
http://www.google.com/books/feeds/volumes/f3A-AAAAYAAJ
2010-08-30T20:37:27.000Z
St. Ives
Robert Louis Stevenson
1906
IVES CHAPTER IA TALE OF A LION RAMPANT IT was in the month of May,, that I was
so unlucky as to fall at last into the hands of the enemy. ...
528 pages
book
f3A-AAAAYAAJ
HARVARD:HWP61W
St. Ives
being the adventures of a French prisoner in England
http://www.google.com/books/feeds/volumes/4mb8LuKKwocC
2010-08-30T20:37:27.000Z
Cruising with Robert Louis Stevenson
Oliver S. Buckton
2007
Cruising with Robert Louis Stevenson: Travel, Narrative, and the Colonial Body is the first book-length study about the influence of travel on Robert Louis ...
344 pages
book
4mb8LuKKwocC
ISBN:0821417568
ISBN:9780821417560
Ohio Univ Pr
Literary Criticism
Cruising with Robert Louis Stevenson
travel, narrative, and the colonial body
http://www.google.com/books/feeds/volumes/4yo9AAAAYAAJ
2010-08-30T20:37:27.000Z
New Arabian nights
Robert Louis Stevenson
1922
THE SUICIDE CLUB STORY OF THE YOUNG MAN WITH THE CREAM TARTS DURING his
residence in London, the accomplished Prince Florizel of Bohemia gained the ...
386 pages
book
4yo9AAAAYAAJ
HARVARD:HWP51H
Fiction
New Arabian nights
http://www.google.com/books/feeds/volumes/z2Yf1FX02EkC
2010-08-30T20:37:27.000Z
Robert Louis Stevenson
Richard Ambrosini
Richard Dury
2006
As the editors point out in their Introduction, Stevenson reinvented the "personal essay" and the "walking tour essay," in texts of ironic stylistic ...
377 pages
book
z2Yf1FX02EkC
ISBN:0299212246
ISBN:9780299212247
Univ of Wisconsin Pr
Literary Criticism
Robert Louis Stevenson
writer of boundaries
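A feed like this can also be read programmatically. The following is a minimal sketch, not the API's documented client: it assumes the entries carry `dc:creator`, `dc:date` and `dc:identifier` elements and that the Dublin Core namespace URI is the one shown (check the `xmlns` declarations in the real feed). Since the 1923 limit does not seem to be honored by the search, it also filters by date on the client side. The `SAMPLE` feed is hand-written for illustration using values from the results above.

```python
import xml.etree.ElementTree as ET

# The Atom namespace is standard; the Dublin Core URI is an assumption
# about what the feed declares, so verify it against the feed itself.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "dc": "http://purl.org/dc/terms",
}


def summarize_entries(feed_xml):
    """Return one dict per atom:entry with a few metadata fields."""
    root = ET.fromstring(feed_xml)
    rows = []
    for entry in root.findall("atom:entry", NS):
        rows.append({
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "creators": [e.text for e in entry.findall("dc:creator", NS)],
            "date": entry.findtext("dc:date", default="", namespaces=NS),
            "identifiers": [e.text for e in entry.findall("dc:identifier", NS)],
        })
    return rows


def published_before(rows, year):
    """Keep rows whose date starts with a four-digit year below `year`."""
    kept = []
    for row in rows:
        head = row["date"][:4]
        if head.isdigit() and int(head) < year:
            kept.append(row)
    return kept


# Hand-written sample feed (illustrative, not a captured API response).
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/terms">
  <entry>
    <title>Kidnapped</title>
    <dc:creator>Robert Louis Stevenson</dc:creator>
    <dc:date>1909</dc:date>
    <dc:identifier>ENMWAAAAYAAJ</dc:identifier>
    <dc:identifier>HARVARD:HN1JZ9</dc:identifier>
  </entry>
  <entry>
    <title>Stevenson</title>
    <dc:creator>Adlai Ewing Stevenson</dc:creator>
    <dc:date>1977-10</dc:date>
    <dc:identifier>REUrAQAAIAAJ</dc:identifier>
  </entry>
</feed>"""

if __name__ == "__main__":
    for row in published_before(summarize_entries(SAMPLE), 1923):
        print(row["date"], row["title"], row["identifiers"])
```

Filtering on the leading four digits copes with the mixed date formats seen in the results (`1909`, `1977-10`, `2007-01-17`) and silently drops entries with no usable date.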
Now it would be nice if the Atom included <link> elements for the epubs themselves. Perhaps the feed could even use the recently released "acquisition" link relation defined by OPDS v1.0. For example, each atom:entry element could include a link with that relation pointing at the book's epub file.
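As a rough illustration of what such a link might look like, here is a small sketch that adds one to an entry. The relation URI is the acquisition relation from OPDS 1.0 and `application/epub+zip` is the epub media type; the `href` is a made-up placeholder on example.org, not a real Google Books URL, and `add_acquisition_link` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ACQUISITION = "http://opds-spec.org/acquisition"  # OPDS 1.0 link relation
EPUB_TYPE = "application/epub+zip"

# Serialize Atom elements without an "ns0:" prefix.
ET.register_namespace("", ATOM)


def add_acquisition_link(entry, epub_url):
    """Append an atom:link that points at the epub for this entry."""
    link = ET.SubElement(entry, f"{{{ATOM}}}link")
    link.set("rel", ACQUISITION)
    link.set("type", EPUB_TYPE)
    link.set("href", epub_url)
    return link


entry = ET.fromstring(f'<entry xmlns="{ATOM}"><title>Kidnapped</title></entry>')
# Placeholder URL: the real download location is not known here.
add_acquisition_link(entry, "http://example.org/epubs/ENMWAAAAYAAJ.epub")
print(ET.tostring(entry, encoding="unicode"))
```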
In theory it should be possible to construct the appropriate epub link from the data available in the Atom. But publishing the epub URLs explicitly, in a machine-readable way, would enable quite a bit more use of the epubs. Unfortunately we would still be limited to dipping into the full dataset with a query, instead of being able to crawl the entire archive with something like a paged Atom feed. From a conversation over on get-theinfo it appears that this approach might not be as easy as it sounds. Also, it turns out that, as if by magic, many of the books have already been uploaded to the Internet Archive: 902,188 of them in fact.
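To make the paged Atom feed idea concrete: if an archive-wide feed existed and linked its pages with `rel="next"` (the convention from RFC 5005), a crawler could be little more than a loop. This is a sketch under that assumption only; Google Books does not offer such a feed as far as this post knows, and `crawl` and `fetch` are illustrative names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin
from urllib.request import urlopen

ATOM = "{http://www.w3.org/2005/Atom}"


def fetch(url):
    """Download one feed page and return its bytes."""
    with urlopen(url) as response:
        return response.read()


def crawl(start_url, fetch_page=fetch, max_pages=1000):
    """Yield every atom:entry, following rel="next" links page by page."""
    url = start_url
    seen = set()  # guards against feeds whose "next" links loop
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        root = ET.fromstring(fetch_page(url))
        yield from root.findall(ATOM + "entry")
        next_href = next(
            (link.get("href") for link in root.findall(ATOM + "link")
             if link.get("rel") == "next"),
            None,
        )
        # Resolve relative hrefs against the current page; stop if none.
        url = urljoin(url, next_href) if next_href else None
```

Passing `fetch_page` as a parameter keeps the loop testable without network access, for example by handing it a function that looks pages up in a dictionary.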
So maybe not that much work needs to be done. But presumably more public domain content will become available from Google Books, and it would be nice to be able to say there was at least one other copy of it elsewhere, for digital preservation purposes. It would be great to see Google step up and do some good by making their API usable for folks wanting to replicate the public domain content. Still, at least they haven't done evil by locking it away completely. Dan Brickley had an interesting suggestion to possibly collaborate on this work.