100,000 Books and FRBR

The news about 100,000 books on Freebase got me poking around with curl. I was pleased to see that Freebase actually distinguishes between a book as a work and a particular edition of that book. To FRBR aficionados this will be familiar as the difference between a Work and a Manifestation.

For example here is a URI for James Joyce’s Dubliners as a work:

http://rdf.freebase.com/ns/en.dubliners

and here is a URI for a 1991 edition of Dubliners:

http://rdf.freebase.com/ns/guid.9202a8c04000641f80000000048ea5b4

If you follow those links in your browser you'll most likely be redirected to the human-readable HTML view. But machine agents can use the same URL to discover, say, an RDF representation of this edition of Dubliners. For example, with curl:

curl --location --header "Accept: application/turtle" http://rdf.freebase.com/ns/guid.9202a8c04000641f80000000048ea5b4

@prefix fb: <http://rdf.freebase.com/ns/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix xml: <http://www.w3.org/XML/1998/namespace>.

 <http://rdf.freebase.com/ns/guid.9202a8c04000641f80000000048ea5b4> a 
         <http://rdf.freebase.com/ns/book.book_edition>,
         <http://rdf.freebase.com/ns/common.topic>,
         <http://rdf.freebase.com/ns/media_common.creative_work>;
     <http://rdf.freebase.com/ns/book.book_edition.ISBN> "0486268705";
     <http://rdf.freebase.com/ns/book.book_edition.LCCN> "91008517";
     <http://rdf.freebase.com/ns/book.book_edition.author_editor> <http://rdf.freebase.com/ns/en.james_joyce>;
     <http://rdf.freebase.com/ns/book.book_edition.book> <http://rdf.freebase.com/ns/en.dubliners>;
     <http://rdf.freebase.com/ns/book.book_edition.dewey_decimal_number> "823";
     <http://rdf.freebase.com/ns/book.book_edition.number_of_pages> <http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000009a3be60>;
     <http://rdf.freebase.com/ns/book.book_edition.publication_date> "1991";
     <http://rdf.freebase.com/ns/type.object.name> "Dubliners";
     <http://rdf.freebase.com/ns/type.object.permission> <http://rdf.freebase.com/ns/boot.all_permission>. 

 <http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000009a3be60> a <http://rdf.freebase.com/ns/book.pagination>;
     <http://rdf.freebase.com/ns/book.pagination.numbered_pages> "152"^^<http://www.w3.org/2001/XMLSchema#int>;
     <http://rdf.freebase.com/ns/type.object.permission> <http://rdf.freebase.com/ns/boot.all_permission>.
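For anyone who would rather script this than shell out to curl, here is a minimal Python sketch of the same content negotiation using only the standard library's urllib.request. It reuses the application/turtle media type from the curl example (the registered Turtle type is text/turtle), and build_rdf_request is a hypothetical helper name. It only builds the request rather than sending it, since Freebase has since been shut down and the URI is unlikely to answer.

```python
import urllib.request

# Edition URI from the post. Freebase has been shut down, so this sketch
# only constructs the request instead of sending it.
EDITION_URI = "http://rdf.freebase.com/ns/guid.9202a8c04000641f80000000048ea5b4"

def build_rdf_request(uri, media_type="application/turtle"):
    """Hypothetical helper: build a GET request that asks for an RDF
    serialization via the Accept header, like curl's --header option."""
    return urllib.request.Request(uri, headers={"Accept": media_type})

req = build_rdf_request(EDITION_URI)
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing req to urllib.request.urlopen would send it; urlopen follows HTTP redirects by default, which plays the role of curl's --location flag.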

There are a few assertions that struck me as interesting:

the rdf:type statement (the "a" in the Turtle) declaring that the resource is in fact an edition, of type http://rdf.freebase.com/ns/book.book_edition
the book.book_edition.book statement, which links the edition to the work (http://rdf.freebase.com/ns/en.dubliners)
and the book.book_edition.LCCN statement, which gives the Library of Congress Control Number (LCCN) for the book
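To make those three assertions concrete, here is a small Python sketch that holds a handful of the triples above as (subject, predicate, object) tuples and pulls out the edition type, the linked work and the LCCN with a wildcard pattern match. The match function is a hypothetical stand-in for a triple store query, and for simplicity literals are plain strings just like URIs.

```python
FB = "http://rdf.freebase.com/ns/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EDITION = FB + "guid.9202a8c04000641f80000000048ea5b4"

# A few triples transcribed from the Turtle output, as (s, p, o) tuples.
triples = {
    (EDITION, RDF_TYPE, FB + "book.book_edition"),
    (EDITION, RDF_TYPE, FB + "common.topic"),
    (EDITION, FB + "book.book_edition.book", FB + "en.dubliners"),
    (EDITION, FB + "book.book_edition.LCCN", "91008517"),
    (EDITION, FB + "book.book_edition.ISBN", "0486268705"),
}

def match(s=None, p=None, o=None):
    """Hypothetical helper: return triples matching a pattern, where None
    acts as a wildcard (a toy version of a triple store lookup)."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

is_edition = bool(match(EDITION, RDF_TYPE, FB + "book.book_edition"))
work = [o for _, _, o in match(EDITION, FB + "book.book_edition.book")]
lccn = [o for _, _, o in match(EDITION, FB + "book.book_edition.LCCN")]
print(is_edition, work, lccn)
```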

I was most surprised to see library-centric metadata being collected, such as LCCN, OCLC Number, Dewey Decimal Classification and LC Classification. There are even human-readable instructions for how to enter the data (take that AACR2!).

Anyhow, it got me wondering what it would be like to stuff all the Freebase book data into a triple store, assert:

<http://rdf.freebase.com/ns/book.book> <http://www.w3.org/2002/07/owl#equivalentClass> <http://purl.org/vocab/frbr/core#Work> .
<http://rdf.freebase.com/ns/book.book_edition> <http://www.w3.org/2002/07/owl#equivalentClass> <http://purl.org/vocab/frbr/core#Manifestation> .

and then run some basic inferencing and get some FRBR data. I know, crazy-talk … but it’s interesting in theory (to me at least).
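A toy version of that inferencing step can be sketched in Python. It states the mapping with owl:equivalentClass, the OWL property intended for class equivalence, and applies one hand-rolled forward-chaining rule: if x has type C and C is equivalent to D, then x has type D. This is not a real OWL reasoner, and the rdf:type book.book triple for the work is an assumption, since the post does not show the work's own triples.

```python
# Namespaces from the post plus the standard RDF and OWL terms.
FB = "http://rdf.freebase.com/ns/"
FRBR = "http://purl.org/vocab/frbr/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_EQUIV = "http://www.w3.org/2002/07/owl#equivalentClass"

EDITION = FB + "guid.9202a8c04000641f80000000048ea5b4"
WORK = FB + "en.dubliners"

data = {
    (EDITION, RDF_TYPE, FB + "book.book_edition"),
    # Assumption: the work is typed book.book (not shown in the post).
    (WORK, RDF_TYPE, FB + "book.book"),
    (EDITION, FB + "book.book_edition.book", WORK),
    # The two mapping assertions.
    (FB + "book.book", OWL_EQUIV, FRBR + "Work"),
    (FB + "book.book_edition", OWL_EQUIV, FRBR + "Manifestation"),
}

def infer_types(triples):
    """Toy forward chaining: if x has type C and C is equivalent to D,
    add 'x has type D'; repeat until nothing new is derived."""
    equiv = {(s, o) for s, p, o in triples if p == OWL_EQUIV}
    equiv |= {(o, s) for s, o in equiv}  # equivalentClass is symmetric
    closure = set(triples)
    while True:
        new = {(x, RDF_TYPE, d)
               for x, p, c in closure if p == RDF_TYPE
               for c2, d in equiv if c2 == c}
        if new <= closure:
            return closure
        closure |= new

inferred = infer_types(data)
for s, _, o in sorted(inferred - data):
    print(s, "a", o)
```

A triple store with an OWL 2 RL reasoner would apply equivalent class-equivalence rules (along with many others) over the full data set instead of this single hand-written rule.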