freebase and linked-data

Ok, this is pretty big news for linked data folks, and for semweb-heads in general: Freebase is now a linked-data target. It matters because Freebase is an active community of content creators, building rich data-centric descriptions with a wiki-style interface, fancy data loaders, and useful machine APIs.

The web2.0-meets-semweb space is also being explored by folks like Talis. It’ll be interesting to see how this plays out, particularly in light of SPARQL adoption, which I remain kind of neutral about for some undefined, wary, spooky reason. I get the idea of web resources having data views. It seems like a logical “one small step for a web agent, one giant leap for the web” kind of move. But queryability with SPARQL sounds like something to push off, particularly if you’ve already got a search API that could be hooked up to the data views.

At any rate, what this announcement means is that you can get machine-readable data back from Freebase using a URI. The descriptions then use more URIs, which you can follow your nose to and get more machine-readable data. So if you are on a page like:

http://www.freebase.com/view/en/tim_berners-lee

you can construct a URL for Tim Berners-Lee like this:

http://rdf.freebase.com/ns/en.tim_berners-lee
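
That mapping is mechanical enough to sketch in Python. This assumes the only rule is the one visible in the two URLs above: drop the /view prefix, join the remaining path segments with dots, and prepend http://rdf.freebase.com/ns/. The function name is my own invention, not part of any Freebase API.

```python
from urllib.parse import urlparse

RDF_NS = "http://rdf.freebase.com/ns/"


def rdf_uri_from_view_url(view_url):
    """Turn a Freebase /view/ page URL into its RDF URI (assumed mapping)."""
    path = urlparse(view_url).path  # e.g. "/view/en/tim_berners-lee"
    segments = [s for s in path.split("/") if s]
    if len(segments) < 2 or segments[0] != "view":
        raise ValueError("not a Freebase view URL: %s" % view_url)
    # The id /en/tim_berners-lee becomes en.tim_berners-lee
    return RDF_NS + ".".join(segments[1:])


print(rdf_uri_from_view_url("http://www.freebase.com/view/en/tim_berners-lee"))
```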

Then you resolve that URL asking for application/turtle (you could ask for application/rdf+xml, but I find the Turtle more readable):

curl --location --header "Accept: application/turtle" http://rdf.freebase.com/ns/en.tim_berners-lee
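
If you’d rather do the same thing from a script, here is a rough Python equivalent of that curl command using only the standard library. The helper names are mine, and decoding the body as UTF-8 is an assumption about what the service sends back.

```python
from urllib.request import Request, urlopen


def build_request(uri, media_type="application/turtle"):
    """Build a GET request that asks for an RDF serialization via Accept."""
    return Request(uri, headers={"Accept": media_type})


def fetch_description(uri, media_type="application/turtle"):
    """Fetch the description; urlopen follows redirects by default,
    which plays the role of curl's --location flag."""
    with urlopen(build_request(uri, media_type)) as response:
        # Assumption: the body is UTF-8 encoded text.
        return response.read().decode("utf-8")
```

Calling fetch_description("http://rdf.freebase.com/ns/en.tim_berners-lee") should then hand back the same Turtle that curl prints, assuming the service is reachable.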

And you’ll get back a Turtle description of Tim Berners-Lee. There’s a lot of useful data there, but the interesting part for me is the follow-your-nose effect, where you can see an assertion like:

 <http://rdf.freebase.com/ns/en.tim_berners-lee>   
     <http://rdf.freebase.com/ns/influence.influence_node.influenced_by>
     <http://rdf.freebase.com/ns/en.ted_nelson> .

And you can then go look up Ted Nelson using that URI:

curl --location --header "Accept: application/turtle" http://rdf.freebase.com/ns/en.ted_nelson

And get another chunk of data which includes this assertion:

 <http://rdf.freebase.com/ns/en.ted_nelson>
     <http://rdf.freebase.com/ns/influence.influence_node.influenced_by>
     <http://rdf.freebase.com/ns/en.vannevar_bush> .

And you can then continue following your nose to:

http://rdf.freebase.com/ns/en.vannevar_bush

Lather, rinse, repeat.
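
The lather-rinse-repeat loop can be sketched as a tiny crawler. This is a toy under some loud assumptions: it only understands triples written out in full as <subject> <predicate> <object> . like the snippets above (a real crawler would use a proper Turtle parser, since real responses may use prefixes and other shorthand), and instead of hitting the network it takes a fetch function, which here is backed by canned copies of the two assertions shown above. All the names are hypothetical.

```python
import re

INFLUENCED_BY = "http://rdf.freebase.com/ns/influence.influence_node.influenced_by"

# Matches one fully spelled-out triple: <subject> <predicate> <object> .
TRIPLE = re.compile(r"<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.")


def influenced_by(turtle_text, subject):
    """Return the URIs that `subject` is said to be influenced by."""
    return [o for s, p, o in TRIPLE.findall(turtle_text)
            if s == subject and p == INFLUENCED_BY]


def follow_your_nose(start, fetch, limit=10):
    """Follow influenced_by links from `start`, visiting each URI once,
    and stop after `limit` resources."""
    seen, queue = [], [start]
    while queue and len(seen) < limit:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        queue.extend(influenced_by(fetch(uri), uri))
    return seen


# Canned responses standing in for the network.
TIM = "http://rdf.freebase.com/ns/en.tim_berners-lee"
TED = "http://rdf.freebase.com/ns/en.ted_nelson"
BUSH = "http://rdf.freebase.com/ns/en.vannevar_bush"
CANNED = {
    TIM: "<%s>\n    <%s>\n    <%s> .\n" % (TIM, INFLUENCED_BY, TED),
    TED: "<%s>\n    <%s>\n    <%s> .\n" % (TED, INFLUENCED_BY, BUSH),
}

path = follow_your_nose(TIM, lambda uri: CANNED.get(uri, ""))
for uri in path:
    print(uri)
```

Swapping the lambda for something that really dereferences each URI with an Accept header turns this into a (very naive) linked-data crawler.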

So why is this important? Because following your nose in HTML is what enabled companies like Lycos, AltaVista, Yahoo and Google to be born. It allowed agents to crawl the web of documents and build indexes so that people could find what they want (hopefully). Being able to link data in the same way lets us harvest data assets across organizational boundaries and merge them together. It’s early days still, but seeing an organization like Freebase get it is pretty exciting.

Oh, there are a few little rough spots that should probably be ironed out … but when is that ever not the case, eh? Inspiring stuff.