Archive for the ‘web’ Category

How do 26 Nobel Laureates change a light bulb?

Friday, July 13th, 2007

I don’t know … but it sure is nice to see that 26 Nobel Laureates at least understand the direction libraries ought to be headed:

As scientists and Nobel laureates, we are writing to express our strong support for the House and Senate Appropriations Committees’ recent directives to the NIH to enact a mandatory policy that allows public access to published reports of work supported by the agency. We believe that the time is now for Congress to enact this enlightened policy to ensure that the results of research conducted by NIH can be more readily accessed, shared and built upon ­ to maximize the return on our collective investment in science and to further the public good.

The public at large also has a significant stake in seeing that this research (researched funded by the National Institute of Health) is made more widely available. When a woman goes online to find what treatment options are available to battle breast cancer, she will find many opinions, but peer-reviewed research of the highest quality often remains behind a high-fee barrier. Families seeking clinical trial updates for a loved one with Huntington’s disease search in vain because they do not have a journal subscription. Librarians, physicians, health care workers, students, journalists, and investigators at thousands of academic institutions and companies are currently hindered by unnecessary costs and delays in gaining access to publicly funded research results.

Exciting times for libraries and the medical profession! I just hope they can convince Congress.

purl2

Thursday, July 12th, 2007

It’s great to see that OCLC is going to work with Zepheira on a new version of the PURL service and that it’s going to have an Apache license. Other than addressing scalability issues it sounds like Zepheira is going to build in support for resources that are outside of the information space of the web:

The new PURL software will also be updated to reflect the current understanding of Web architecture as defined by the World Wide Web Consortium (W3C). This new software will provide the ability to permanently identify networked information resources, such as Web documents, as well as non-networked resources such as people, organizations, concepts and scientific data. This capability will represent an important step forward in the adoption of a machine-processable “Web of data” enabled by the Semantic Web.

Since Eric Miller helped start up Zepheira it’s not surprising that purl2 will take this on. As part of some experiments I’ve been doing with SKOS, and serving up Concepts over HTTP it has become clear that a minimal bit of work for managing these identifiers would be useful. I can definitely see the need for a general solution that helps manage identifiers for people, organizations, concepts, etc. which also fits into how HTTP should/could serve up the resources associated with them.

via Thom Hickey

Angela’s dilemma

Thursday, May 31st, 2007

If you are interested in practical ways to garden in the emerging web-of-data take a look at this draft finding that folks in the W3C Technical Architecture Group are considering. Or for a different expression of the same idea look at Cool URIs for the Semantic Web.

These two documents describe a simple use of HTTP and URLs to identify resources that are outside of the information space of the web. Yes, you read that right: resources that are outside the information space of the web. Why would I want to use URLs to address resources that aren’t on the web!? The finding illustrates this subtlety using Angela’s dilemma:

Angela is creating an OWL ontology that defines specific characteristics of devices used to access the Web. Some of these characteristics represent physical properties of the device, such as its length, width and weight. As a result, the ontology includes concepts such as unit of measure, and specific instances, such as meter and kilogram. Angela uses URIs to identify these concepts.Having chosen a URI for the concept of the meter, Angela faces the question of what should be returned if that URI is ever dereferenced. There is general advice that owners of URIs should provide representations [AWWW] and Angela is keen to comply. However, the choices of possible representations appear legion. Given that the URI is being used in the context of an OWL ontology, Angela first considers a representation that consists of some RDF triples that allow suitable computer systems to discover more information about the meter. She then worries that these might be less useful to a human user, who might prefer the appropriate Wikipedia entry. Perhaps, she reasons, a better approach would be to create a representation which itself contains a set of URIs to a range of resources that provide related representations. Perhaps content negotiation can help? She could return different representations based on the content type specified in the request.

Angela’s dilemma is, of course, based on the fact that none of the representations she is considering are actually representations of the units of measure themselves. Even if the Web could deliver a platinum-iridium bar with two marks a meter apart at zero degrees celsius, or 1,650,763.73 wavelengths of the orange-red emission line in the electromagnetic spectrum of the krypton-86 atom in a vacuum, or even two marks, a meter apart on a screen, such representations are probably less than completely useful in the context of an information space. The representations that Angela is considering are not representations of the meter itself. Instead, they are representations of information resources related to the meter.

It is not appropriate for any of the individual representations that Angela is considering to be returned by dereferencing the URI that identifies the concept of the meter. Not only do the representations she is considering fail to represent the concept of the meter, they each have a different essence and so they should each have their own URI. As a consequence, it would also be inappropriate to use content negotiation as a way to provide them as alternate representations when the URI for the concept of the meter is dereferenced.

So assuming we are agreed about the problem what’s the solution? Basically you can use content negotiation and a 303 See Other HTTP status code to redirect to the appropriate resource. For an example of the basic idea in action fire up curl and take a look at how this instance of the SemanticMediaWiki responds to a GET request:

%  curl --head http://ontoworld.org/wiki/Special:URIResolver/Ruby
HTTP/1.1 303 See Other
Date: Thu, 31 May 2007 20:03:12 GMT
Server: Apache/2.2.3 (Debian) ...
Location: http://ontoworld.org/wiki/Ruby
Content-Type: text/html; charset=UTF-8

Nothing too surprising there–basically just got redirected to another URL that serves up some friendly HTML describing the Ruby programming language. But send along an extra Accept header:

% curl --head  --header 'Accept: application/rdf+xml
http://ontoworld.org/wiki/Special:URIResolver/Ruby
HTTP/1.1 303 See Other
Date: Thu, 31 May 2007 20:04:36 GMT
Server: Apache/2.2.3 (Debian) ...
Location: http://ontoworld.org/wiki/Special:ExportRDF/Ruby
Content-Type: text/html; charset=UTF-8

Notice how you are redirected to another URL that results in rdf/xml describing Ruby coming down the pipe? RubyOnRails and other frameworks have good REST support built in for doing content negotiation to provide multiple representations of a single information resource. But the use of the 303 See Other here is a new subtle twist to accommodate the fact that the resource in question isn’t really a canonical set of bits on disk somewhere. The good news is that your browser will display the human readable resource when you visit http://ontoworld.org/wiki/Special:URIResolver/Ruby

Some folks would argue that resources that are outside the web don’t deserve URLs and should instead be identified with URIs like info-uris that are not required to resolve. My personal feeling is that info-uris do have a great deal of use in the enterprise (where they are most likely resolvable). But in situations like Angela’s where she is creating a public RDF document that needs to refer to concepts like “length” and “meter” I think it makes sense that these concepts should resolve to appropriate representations that will guide appropriate usage. Or as the Architecture of the World Wide Web puts it:

A URI owner may supply zero or more authoritative representations of the resource identified by that URI. There is a benefit to the community in providing representations. A URI owner SHOULD provide representations of the resource it identifies

It’ll be interesting to see how these issues shake out as more and more structured data is made available on the web.

miscellaneous talk

Thursday, May 17th, 2007

If you are reading Everything is Miscellaneous like me then you might be interested in watching a talk David Weinberger did a few days ago at Google.

I only wish I had more time to ingest all the good content that comes in through the GoogleTech Talks feed.

nekkid

Thursday, April 5th, 2007

Yeah, today is CSS Naked Day I just hope I remember to re-enable CSS tomorrow :-)

exhibit

Friday, February 16th, 2007

If you haven’t tried Exhibit out yet the simile folks have created a truly wonderful data publishing framework which runs entirely in your browser with a bit of javascript, html and css.

The remarkable part is that it requires no backend database, but simply operates on a stream of json. If you have a couple minutes take a look at their Getting Started Tutorial which shows you how to create a exhibit of MIT related nobel laureates with a tiny bit of HTML, CSS and JavaScript.

Just as an experiment I tried pointing it at my delicious json feed for metadata. It turns out that exhibit wants json data to be a hash with a key ‘items’ that points to a list of items. In addition it also wants each item to have a ‘label’ key. I quickly reformatted the delicious json with simplejson, and got this.

A few minutes later I prodded the simile folks to see if there is a way of filtering json data on the way into exhibit so that it can be normalized…time passes (like maybe an hour) and then I hear from Johan Sundström that the latest/greatest exhibit code has this sort of filtering built in!

Tangential to the exhibit code, there has been an interesting discussion recently about how to expose exhibit content to indexing services like google. Since exhibit content is generated with pure javascript, and google (as far as we know) primarily indexes html content–the exhibit content is rendered invisible. This is a problem that digital library applications and repositories have to deal with as well, so it may be of interest.

uri-templates

Monday, February 5th, 2007

I’ve been playing with uri-templates a little bit at $work to help formulate clean urls for a newspaper application. The goal is to provide urls such as:

  • http://example.gov/issn/0362-4331
  • http://example.gov/issn/0362-4331/1969-05-28
  • http://example.gov/issn/0362-4331/1969-05-28/1
  • http://example.gov/issn/0362-4331/1969-05-28/1/31

I was hoping something like this would work:

  • http://example.gov/issn/{issn}/{date}/{edition}/{page}

But I’d like to indicate that the date, edition and page parameters are optional. After reading the spec and some discussion it becomes clear that there is no way to indicate that part of the path is optional. OpenSearch addresses the issue to some extent by making parameters optional with ‘?’:

  • http://example.gov/issn/{issn}/{date?}/{edition?}/{page?}

Which seems to be what I want. But there are some wrinkles such as when a page is included without a date. But perhaps these details could be application specific?

The discussion seemed to indicate that the template could be bundled with a written description of how the parameters are to be used. Or instead an additional template specification for optionality could be created which references the URI Template spec. There were also some nods towards WADL, which apparently has some richer conventions for this sort of thing.

I guess for the moment using

  • http://example.gov/issn/{issn}/{date}/{edition}/{page}

with some descriptive text will work good enough. But I think it would be useful if the uri-template draft commented on the issue somehow…since it’s bound to come up again.

oxford dictionary of national biography

Monday, February 5th, 2007

It’s interesting to see that the Oxford Dictionary of National Biography has created Cool URIs for their index of notable people. So for example if you want an identifier for JRR Tolkien you can use:

http://www.oxforddnb.com/index/101031766

Alas, the full content of the biography isn’t available (unless you subscribe), but I guess some publishers still have business models to hold on to. To see all the entries you have to browse them.

I think it’s a nice simple example of how authority files can be integrated into the web as we know it. Thanks to Caroline Arms for forwarding this on to me…

identifiers and authority records

Saturday, January 6th, 2007

Authority files are rather important for unambiguously talking about a person, place or thing. In database lingo they essentially amount to a primary key for a table. Given the time and effort libraries spend in maintaining authority records and assigning control numbers to individuals it makes sense that a URI could be assigned to an individual in such authority files. I realize this idea is nothing new, but until recently I hadn’t seen it put into practice particularly well.

I imagine this has been there all along but I just noticed that OCLC’s Linked Authority File includes PURLs for authors now. For example the following URL contains a LCCN:

http://errol.oclc.org/laf/n79-7035

When you GET this your browser is automatically redirected with an HTTP 302 to:

http://alcme.oclc.org/laf/servlet/OAIHandler?
verb=GetRecord&metadataPrefix=oai_dc&identifier=n79-7035

which you’ll notice is a OAI-PMH request to fetch a DublinCore record with the identifier n79-7035:

<oai_dc:dc
  xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
    http://openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    Borges, Jorge Luis,--1899-
  </dc:creator>
  <dc:description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    SuaÌrezLynch, B.--nnnc
  </dc:description>
</oai_dc:dc>

So now we know who this identifier is for, and the established heading for the individual. But it gets better (or worse depending on your perspective). Since this is an OAI-PMH server you can issue a ListMetadataFormats request to see what other flavors this record might be available in. If you do you’ll find out that this record is also available as marcxml in all its unholy glory (if you follow that link your browser will use a stylesheet to turn the raw xml into something a bit more presentable). Putting aside my snideness about MARC for a moment, this is a lot of useful data being made available.

You can also search the name authority file and get relevant PURLs via a SOAP/REST service. For example the irc bot panizzi in #code4lib actually has a bit of logic that allows it do lookups in the linked authority file:

06:56 < edsu> @naf borges, jorge
06:56 < panizzi> edsu: [20 matches] [~1] Borges, Jorge Luis, 1899-
                 <http://errol.oclc.org/laf/n79-7035>; [~2] Macedo, Jorge
                 Borges de. <http://errol.oclc.org/laf/n82-149895>; [~3]
                 Borges, Jorge G. (Jorge Guillermo), 1874-1938
                 <http://errol.oclc.org/laf/n90-681877>; [~4] Sua?rez Lynch, B.
                 <http://errol.oclc.org/laf/n82-21644>; [~5] Borges, Jorge
                 Wheliton Miranda <http://errol.oclc.org/laf/n92-76758>; [~6]
                 Canido Borges, Jorge Oscar (3 more messages)

All in all it’s an impressive mix of technology, standards and practice. It is not entirely clear to me how this work relates to the Virtual International Authority File. Perhaps LAF wasn’t considered a good acronym? If you are interested in such things Thom Hickey had a really interesting talk at Access2006 which has audio available.

the beeb and file sharing

Friday, December 22nd, 2006

Recognizing and leveraging the benefits of protocols like bittorrent in “legitimate” media distribution seems like a huge step forward. I guess I have to admit I’m also pretty excited about the prospect of .torrent files for Red Dwarf and Doctor Who episodes. But, still there will be some kind of digital-rights-management built in, so it won’t be totally open.