metadata from Getty’s Open Content Program (part 2)

A few weeks ago I wrote a brief post about the embedded metadata found in images from the (awesome) Getty Open Content Program. This led to a useful exchange with Brenda Podemski on Twitter, which she gathered together with Storify. I promised her I would write another blog post that showed how the metadata could be expressed a little bit better.

It’s hard to read RDF as XML, and Turtle isn’t for everyone, so here’s a picture of part of the XMP RDF metadata that is included in the high-resolution download for a photo by Eugène Atget of the sculpture Bosquet de l’Arc de Triomphe by Jean-Baptiste Tuby. I haven’t portrayed everything in the file, since that would obscure the point I’m trying to make.

[Figure: Original description]

Depicted here are two resources described in the RDF: the JPEG file itself and what the IPTC vocabulary calls an Artwork or Object. Now, it is good that the description distinguishes between the file and the photograph. The Dublin Core community somewhat famously (in metadata circles) calls this the One-To-One Principle. But notice how there is a dc:description attached to the file resource, with lots of useful information concatenated together as a string? My question to Brenda was whether that string was actually available as structured data, and whether it could be expressed differently. Her response seemed to indicate that it was.

My suggestion is to unpack and move that concatenated string to describe the photograph, like so:

[Figure: Unpacked description]

Notice how the dimensions, format and type were broken out into separate assertions about the photograph? I also quickly modified the description to use the Dublin Core vocabulary, since it was more familiar to me. I wasn’t able to quickly find good properties for height and width, but I imagine they are out there somewhere, and if not, they could be created.
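
For readers who prefer code to pictures, here is a minimal sketch in Python of the same idea, with the statements written as plain (subject, predicate, object) tuples. The identifiers for the file and the photograph, and all of the literal values, are placeholders I made up for illustration; they are not taken from the Getty file. The property choices (dcterms:format, dcterms:type, dcterms:medium, dcterms:extent, foaf:primaryTopic) are one plausible selection, not necessarily the ones in the picture above.

```python
# A sketch of "hanging the descriptions off of the right resources".
# All identifiers and values below are hypothetical placeholders.

DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

FILE = "urn:example:jpeg-file"      # hypothetical identifier for the JPEG
PHOTO = "urn:example:photograph"    # hypothetical identifier for the photograph

triples = [
    # Statements about the file itself.
    (FILE, DCTERMS + "format", "image/jpeg"),
    (FILE, FOAF + "primaryTopic", PHOTO),
    # Statements about the photograph, unpacked from the one long string.
    (PHOTO, DCTERMS + "creator", "Eugène Atget"),
    (PHOTO, DCTERMS + "type", "photograph (placeholder)"),
    (PHOTO, DCTERMS + "medium", "medium goes here (placeholder)"),
    (PHOTO, DCTERMS + "extent", "height x width goes here (placeholder)"),
]


def describe(subject, statements):
    """Return the (predicate, object) pairs asserted about one subject."""
    return [(p, o) for s, p, o in statements if s == subject]


if __name__ == "__main__":
    for subject in (FILE, PHOTO):
        print(subject)
        for predicate, obj in describe(subject, triples):
            print("   ", predicate, "->", obj)
```

The point is only that the file ends up with statements about the file, and everything that used to be packed into dc:description becomes separate statements about the photograph.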

Of course, one could go further, and say there are really three resources: the file, the photograph, and the sculpture.

[Figure: Added sculpture]

But this could be extra work for the Getty, if they don’t have this level of description yet. The half-step of enriching the description by indicating that it is a photograph of particular dimensions in a particular format seems like a useful thing to do for this example though, especially if they have that structured data already. My particular vocabulary choices (dc, foaf, etc) aren’t important compared to hanging the descriptions off of the right resources.

But, and this is a doozy of a but, other metadata in the RDF suggests that the metadata is being entered with Photoshop. So while it is technically possible to embed this metadata in XMP as RDF, it is quite likely that Photoshop doesn’t give you the ability to enter it. In fact, it is fairly common for image processing applications to strip some or all of the embedded metadata. So to embed these richer descriptions in the files, one might need to write a small program to do it.
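
To give a rough idea of what such a small program might involve, here is a sketch that builds the RDF/XML for an XMP packet using only Python’s standard library. It assumes the richer fields would be placed inside the IPTC Artwork or Object structure using Dublin Core terms, which is my suggestion rather than anything the IPTC schema defines, and the field values are placeholders. It only produces the XML; actually writing the packet into a JPEG would need a separate tool or library, and I haven’t checked how well other applications preserve non-standard fields like these.

```python
import xml.etree.ElementTree as ET

# Namespaces used in the packet. The dcterms fields inside the artwork
# structure are an assumption of this sketch, not part of the IPTC schema.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "Iptc4xmpExt": "http://iptc.org/std/Iptc4xmpExt/2008-02-29/",
    "dcterms": "http://purl.org/dc/terms/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Build an ElementTree qualified name such as '{uri}local'."""
    return f"{{{NS[prefix]}}}{local}"


def build_xmp(artwork_fields):
    """Return XMP RDF/XML with one Artwork or Object entry.

    artwork_fields maps Dublin Core term names (e.g. 'medium') to text.
    """
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    # rdf:about="" means "the file this packet is embedded in".
    file_desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})
    artwork = ET.SubElement(file_desc, q("Iptc4xmpExt", "ArtworkOrObject"))
    bag = ET.SubElement(artwork, q("rdf", "Bag"))
    item = ET.SubElement(bag, q("rdf", "li"), {q("rdf", "parseType"): "Resource"})
    for name, value in artwork_fields.items():
        ET.SubElement(item, q("dcterms", name)).text = value
    return ET.tostring(xmpmeta, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(build_xmp({
        "type": "photograph (placeholder)",
        "medium": "medium goes here (placeholder)",
        "extent": "height x width goes here (placeholder)",
    }))
```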

There is another place where the metadata could be embedded, though. What if the webpage for the item had embedded RDFa or Microdata in it that expressed this information? If they could commit to a stable URL for the item, it would be a perfect place for both the human-readable and the machine-readable description. All they would have to do is link the XMP metadata to it somehow, and adjust the templates that drive the HTML display.
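
As a rough sketch of what adjusting a template could look like, here is a small Python function that renders an HTML fragment carrying RDFa Lite attributes (vocab, typeof, resource, property). I’ve assumed the schema.org vocabulary and its Photograph type here; the item URL and title are hypothetical, and I have no idea what the Getty’s templates actually look like.

```python
from html import escape


def render_item(url, title, creator):
    """Render an HTML fragment for one item, annotated with RDFa Lite.

    The schema.org vocabulary is an assumption of this sketch.
    """
    return (
        f'<div vocab="http://schema.org/" typeof="Photograph" '
        f'resource="{escape(url)}">\n'
        f'  <h1 property="name">{escape(title)}</h1>\n'
        f'  <p>By <span property="creator">{escape(creator)}</span></p>\n'
        f"</div>"
    )


if __name__ == "__main__":
    # Hypothetical stable URL and title, for illustration only.
    print(render_item(
        "https://example.org/items/123",
        "Example title",
        "Eugène Atget",
    ))
```

The same page then serves people and programs alike, and the XMP in the image file would only need to point back at the item URL.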