Underwater photography by Andy Kirkland

Keyword Classes Pt.1 – Background

(I first wrote about Keyword Classes in 2007. We’re working on this as part of my College course, so I rewrote these pieces in May 2011 to reflect changes in technology.)

Introduction to Metadata

Most digital image formats can store “metadata” – information about the picture. A lot of this information is recorded automatically by the camera, such as details of the exposure and the date and time the photo was taken. This information is often referred to as “EXIF” data (although, to be pedantic, that’s only part of the story).

Some of the metadata is expected to be stored in a specific format, while other fields are free-format. These fields – such as the caption – will normally be input by the photographer. Increasingly, metadata can also be populated from other sources, such as GPS units. There’s more about metadata in the Wikipedia EXIF page.

Up to a couple of years ago, metadata was patchily supported in software – many applications just “dropped” it. Most current software now recognises it, although (frustratingly) it can be stripped out when uploading to (for example) social networking sites or news sites.

These fields recently came to prominence because they can contain information about copyright and licensing.

Standard metadata fields are now accessible in most image processing software – particularly Adobe’s software, but also Google’s Picasa, and web services such as Flickr. Crucially, there are now libraries available for languages such as PHP and Java, which means that web developers can easily integrate this information.

Also crucially, the metadata is supported by – and can be carried across when converting between – most image formats (JPG, TIFF, DNG, PSD etc).

One of the main free-format fields is called “Keywords” and is actually stored in the IPTC section.

The Keywords metadata field

Originally, the Keywords field was an unstructured text string (with keywords delimited by semi-colons or commas). Individual keywords are now often stored as array elements – a far more flexible and professional approach. Software can parse the array to break it into individual keywords, which can then be dealt with individually.
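As a rough illustration of the difference between the two storage styles, here is a minimal Python sketch that turns an old-style delimited string into a list of individual keywords. The function name `split_keywords` and the sample keywords are made up for this example; it does not read real image metadata, it only shows the parsing step.

```python
import re


def split_keywords(raw):
    """Split a legacy delimited keyword string into a list of clean keywords.

    Assumes (as described above) that keywords are separated by
    semi-colons or commas. Empty entries are dropped.
    """
    parts = re.split(r"[;,]", raw)
    return [p.strip() for p in parts if p.strip()]


# Hypothetical legacy keyword string, mixing both delimiters.
legacy = "clownfish; anemone, Red Sea;  macro"
keywords = split_keywords(legacy)
print(keywords)
```

Once the keywords are in a list, each one can be handled on its own, which is what the rest of this series relies on.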

This is really useful for image processing software (and more so for asset management / cataloguing software such as Aperture, Lightroom and Picasa), as it allows software designers to build in “filtering” allowing selection of image sets based on individual keyword selection.
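The kind of filtering described above can be sketched in a few lines of Python. This is not how Lightroom or any other package does it internally; the `catalogue` dictionary, the file names and the `filter_by_keywords` function are all invented for illustration.

```python
def filter_by_keywords(catalogue, wanted):
    """Return the file names whose keywords include every wanted keyword.

    `catalogue` maps a file name to its list of keywords.
    Matching is case-insensitive.
    """
    wanted_set = {w.lower() for w in wanted}
    return [
        name
        for name, kws in catalogue.items()
        if wanted_set <= {k.lower() for k in kws}
    ]


# Hypothetical mini-catalogue of three images.
catalogue = {
    "IMG_0001.jpg": ["clownfish", "anemone", "Red Sea"],
    "IMG_0002.jpg": ["turtle", "Red Sea"],
    "IMG_0003.jpg": ["clownfish", "macro"],
}

print(filter_by_keywords(catalogue, ["clownfish"]))
print(filter_by_keywords(catalogue, ["clownfish", "red sea"]))
```

The point is simply that once keywords are separate items, selecting an image set becomes a simple set comparison.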

There are additional uses of the keywords – they can be “externalised” by web services, and used to tag the image, or to drive the HTML META keywords on a web page – making the page more relevant for search engines. This can actually be very relevant – Google Image search doesn’t look at metadata – so you can write a script to output keywords into your Google Images sitemap (although at the time of writing, that doesn’t work too well either !).
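To give an idea of what “externalising” keywords might look like, here is a small Python sketch that builds an HTML META keywords tag from a keyword list. The function name `meta_keywords_tag` and the sample keywords are assumptions for this example, and a real site would pull the list from the image itself.

```python
from html import escape


def meta_keywords_tag(keywords):
    """Build an HTML META keywords tag from a list of keywords.

    The content is escaped so that characters such as & or quotes
    in a keyword cannot break the attribute.
    """
    content = ", ".join(keywords)
    return '<meta name="keywords" content="{}" />'.format(escape(content, quote=True))


# Hypothetical keywords for one image page.
print(meta_keywords_tag(["clownfish", "Red Sea", "wide & macro"]))
```

The same list could just as easily be written into an image sitemap entry or used as tags on the page.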

That’s great. So what’s your problem ?

This is all very well, but I wanted to do more. The problem is that all keywords are treated as free-form text only. They have no meaning – a computer program can’t distinguish, for example, between the subject’s name and the colour of their clothes.

I wanted to treat certain types of keyword in specific ways. My image might contain a number of different keyword types. These are going to vary from one photographer to another, and – potentially – from one subject matter to another (I want to use different keywords for underwater and topside images).

What I wanted to do was to “tag” all of my (underwater) photos with details of the dive. I’d already got these details in my logbook software, and it seemed easiest to get all of that data from the database, rather than rekey it into every image.

I also wanted to identify the different species of fish (and there may be more than one) in an image, and I wanted to output these on the web pages in different positions and formats. I wanted to show both the scientific binomen (aka “Latin”) name, and the common name.

So, straight away, we’ve identified three types of keyword – a reference to the dive, and the common and Latin names of the subject. These are the different “classes” of keyword.

I wanted these different classes to behave in different ways on my website.
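How the classes are actually set up is covered in the next article. Purely to illustrate the idea, the Python sketch below assumes a made-up convention in which a keyword carries its class as a prefix before a colon (for example “dive:123”). The prefix names, the separator, the sample values and the function `group_by_class` are all hypothetical and are not necessarily the scheme used on this site.

```python
from collections import defaultdict


def group_by_class(keywords, separator=":"):
    """Group keywords by a class prefix (hypothetical 'class:value' convention).

    Keywords without a usable prefix go into the 'unclassified' group.
    """
    grouped = defaultdict(list)
    for kw in keywords:
        prefix, sep, value = kw.partition(separator)
        if sep and prefix.strip() and value.strip():
            grouped[prefix.strip().lower()].append(value.strip())
        else:
            grouped["unclassified"].append(kw.strip())
    return dict(grouped)


# Hypothetical keywords for one underwater image.
keywords = [
    "dive:123",
    "latin:Amphiprion ocellaris",
    "common:common clownfish",
    "macro",
]

classes = group_by_class(keywords)
print(classes)
```

With the keywords grouped like this, a web page script could, say, look up the dive record from the “dive” group and print the “latin” names in italics, while leaving ordinary keywords untouched.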

How did you sort it out ?

We’re going back several years now. I was using Adobe’s Photoshop Elements (“PSE”) for photo management at the time, and this lets you set up custom XMP fields (excellent) – but only for files with an XMP sidecar file (booooo). So these could work for RAW and PSD files, but were really tricky for JPGs. Lots of software wouldn’t even look for them.

With PSE Organiser “Tags” (as they were called at the time) you could just drag-and-drop, and you could re-use existing ones. You could even import/export between different PSE catalogs. And as the export file is in XML format, it can be processed separately as well, if you want.

But – fortunately perhaps – these actually used the IPTC Keywords metadata field. So they were accessible from JAlbum, and could also be read by PixVue, IrfanView, Exifer, Picasa, etc., so … Keywords looked like the way to go. (At the time, I was using JAlbum to generate my web pages.)

And when Adobe came out with Lightroom in 2007 – with metadata filtering, and using the same field – then it all fell into place. (Lightroom holds metadata in a catalogue – DNG files need to be manually updated after changes.)

I’ve now got the full version of Photoshop. These classes, of course, work in Bridge as well.

Workflow and Software Packages

Well, when I first developed all of this stuff, I was using PSE to organise and tag my photos. I then used JAlbum to generate the web albums – I’d modified a JAlbum skin to let me read the logbook database (MySQL), which was held locally. (The main reason for moving from JAlbum was just because I needed to reverse engineer my changes into the latest production version every time I upgraded).

Since then, I’ve moved from PSE to Lightroom (I very rarely need anything else to process my photos now). The tags still work. I’ve re-engineered the web presentation as PHP scripts – first of all as individual scripts (one per image), then using XML data as input to a generic script. A copy of the logbook database now resides on the web server, so I can look up the dive record dynamically.

Oh – and I’ve shifted my desktop from Windows to the Mac platform.

And each step of the way, these classes have let me replace individual tools without breaking the rest of the chain. And more opportunities to use them keep popping up (there’s something else really cool on the way, but more about that later …)

The next article covers how to set up Keyword Classes, and the third article shows how I’ve used them in my website (you may want to jump through to that one first before you get into the technical stuff).
