www.muttznutz.net

Underwater photography by Andy Kirkland

Keyword Classes Pt.2 – Setting them up

In the first article, I talked about image metadata, and identified the shortfalls in terms of what I need them to do …

The third article demonstrates how I’ve used these keywords to drive the content of my site. (It might be worth taking a look at that article if you don’t understand what I’m going on about.)

I’ve called my solution “Keyword Classes”, but they could equally be referred to as “Structured Keywords” or “Smart Keywords”.

This article talks about how I’ve made ’em work…

Disclaimer

To really use these classes to best effect, you’ll need to be able to parse the entries using a programming tool. If you don’t know how to code – or don’t know someone who does – then I’m afraid you probably won’t be able to get this done.

For those who want to read on anyway, I should explain the phrase “parse” – which (in programming terms) means splitting up a long string into its individual components, so that each can be processed separately.

So parsing a keyword string that looks like [dn]204;[fi]Humphead wrasse[spf]Cheilinus undulatus should give three entries:

[dn]204
[fi]Humphead wrasse
[spf]Cheilinus undulatus

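PHP would parse this keyword string as two keywords: the semicolon after [dn]204 (that is, in front of the [fi]) would be the break point. Splitting the second keyword into its [fi] and [spf] entries is the extra step that the class identifiers make possible.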
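A rough sketch of that two-stage split, written in Python purely for illustration (it isn't the code behind this site). The regular expression, the function names, and the assumption that class identifiers are lowercase letters are all mine:

```python
import re

# Assumed pattern: "[class]" followed by text up to the next bracket or semicolon.
CLASS_ENTRY = re.compile(r"\[[a-z]+\][^\[\];]*")

def split_keywords(raw):
    """Stage 1: break the raw string at semicolons."""
    return [k.strip() for k in raw.split(";") if k.strip()]

def split_entries(keyword):
    """Stage 2: break one keyword into its '[class]value' entries."""
    return [m.group(0).strip() for m in CLASS_ENTRY.finditer(keyword)]

raw = "[dn]204;[fi]Humphead wrasse[spf]Cheilinus undulatus"
keywords = split_keywords(raw)
entries = [e for k in keywords for e in split_entries(k)]

print(keywords)  # two keywords
print(entries)   # three class entries
```

The first list has the two keywords that a semicolon split produces; the second has the three class entries.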

Delimiting Class Identifiers

So that the programs reading the keywords can know 1) that it is a class item and 2) what type of class item it is, I needed to set up an identifier system.

I figured the best way to do this is to start the keyword with a delimited string, containing a class identifier (stay with me – I’ll get to this in a minute). Separate start- and end-delimiter characters would work best, and they should ideally be characters which wouldn’t normally turn up in your keywords.

Just to complicate things a bit more, certain characters have specific meanings in HTML (the language that drives browsers). These are often changed by scripting languages – or may behave unpredictably – so quote characters are out, as are the GT/LT angle brackets (<>). Forward- and back-slashes are also out. If you want to geek out a bit, then you can check out the HTML Entities page on the W3Schools website for more details.

(For the techies: I did try to embed XML, but – apart from entity encoding – it got much too unwieldy trying to guarantee that every element was closed.)

Ideally, the delimiter at the start of the string would be different from that at the end. The most natural fit is, therefore, either square brackets ([]) or curly braces ({}). The former doesn’t need a shift key, so I’ve gone with those.
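If you want to catch awkward characters before they reach a keyword, a check along these lines would do it. This is only a sketch in Python: the function name is made up, and including the semicolon in the reserved set is my assumption (because some tools insert it between keywords).

```python
# Characters ruled out above (quotes, angle brackets, slashes), plus the
# square brackets reserved for class identifiers. The semicolon is an
# assumed addition, since some software uses it as a keyword separator.
RESERVED = set("\"'<>/\\[];")

def is_safe_value(value):
    """Return True if the value contains none of the reserved characters."""
    return not any(ch in RESERVED for ch in value)

print(is_safe_value("Humphead wrasse"))
print(is_safe_value("wrasse <juvenile>"))
```

Note that this simple version also rejects apostrophes, which may be stricter than you want for some common names.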

Class Identifiers

Now we get to the heart of it. Each class of keyword needs a separate identifier. These can, actually, be of any length (but keeping it to two or three characters is probably more efficient). I like to use abbreviations that are easy to remember.

So for my underwater albums :

  • dn – is the Dive Number, a key into my logbook database
  • sp – is the binomen – or “Latin name” – of the species
  • spf – is like “sp”, but a bit more specific. These are species which can be referenced in the Fishbase database, so I want to link to it (that means different processing, so it’s a different class).
  • fi – is the “common name” of the fish, as listed in Fishbase.

So entries for my 204th dive will have the keyword
[dn]204

You can create a keyword with more than one class – e.g.

[fi]Humphead wrasse[spf]Cheilinus undulatus
As all Humphead wrasse belong to the same species, there’s no point in having two separate Tag items – they’d only get out-of-step, and you’d be giving yourself twice as much work.

(Just to clarify my terminology, a “Tag” – in PSE – may incorporate more than one “Keyword”)
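Going the other way, building a keyword or a multi-class tag from its parts is just string joining. A minimal Python sketch, assuming the four class identifiers listed earlier; the dictionary and the function names are hypothetical, not part of PSE or any library:

```python
# The class identifiers described above, with a short note on each.
CLASSES = {
    "dn": "dive number",
    "sp": "species binomen",
    "spf": "species binomen with a Fishbase entry",
    "fi": "Fishbase common name",
}

def make_keyword(class_id, value):
    """Build one '[class]value' keyword, rejecting unknown identifiers."""
    if class_id not in CLASSES:
        raise ValueError(f"unknown class identifier: {class_id}")
    return f"[{class_id}]{value}"

def make_tag(*pairs):
    """Join several (class, value) pairs into a single multi-class tag."""
    return "".join(make_keyword(c, v) for c, v in pairs)

print(make_keyword("dn", 204))
print(make_tag(("fi", "Humphead wrasse"), ("spf", "Cheilinus undulatus")))
```

Rejecting unknown identifiers is a design choice on my part: it catches typos such as "spc" before they end up in hundreds of images.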

Reusing classes

After a few years, I decided to show some topside wildlife images on some parts of my site. So I used the same keyword classes for bird, mammal and insect species as I did for fish. I introduced a new class, [ln], to denote a non-dive location. A bit of recoding my scripts, and everything appeared in a consistent way.

Programming considerations

PSE (and – probably – other applications) won’t necessarily write the tags in any particular order. It may also insert its own delimiters between keywords.

So any software which parses these classes may need to

  • ignore semicolons which precede a prefix
  • deal with class instances occurring in any order
  • potentially deal with multiple instances of a class (more than one fish species in a photo)
  • potentially deal with the absence of a specific class within a specific image

Depending on the software tools you use to create the tags, and to parse them, you may need to ignore delimiters (such as semicolons or commas) introduced by the software. Such inserted delimiters are becoming less common now.
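The points in the list above can be sketched in a few lines. Again this is Python for illustration only, not the scripts I actually run; the regular expression, the function name, and the "Another fish" placeholder value are assumptions:

```python
import re
from collections import defaultdict

# Assumed pattern: "[class]" followed by everything up to the next bracket.
ENTRY = re.compile(r"\[([a-z]+)\]([^\[\]]*)")

def group_by_class(raw):
    """Collect values per class, whatever order the entries arrive in."""
    grouped = defaultdict(list)
    for class_id, value in ENTRY.findall(raw):
        # Drop separators (semicolons, commas, spaces) that the tagging
        # software may have inserted before the next class prefix.
        grouped[class_id].append(value.strip(" ;,"))
    return dict(grouped)

raw = "[spf]Cheilinus undulatus;[dn]204;[fi]Humphead wrasse;[fi]Another fish"
tags = group_by_class(raw)

print(tags["fi"])          # more than one instance of a class
print(tags.get("ln", []))  # a class that is absent from this image
```

Storing a list per class means "several fish in one photo" and "no location at all" need no special handling: the first is a longer list, the second is an empty one.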

If you’re using the PHP toolkit to generate your web pages, it will split out the individual keywords for you, which makes everything much easier.

Unstructured comments (or those relating to individual images) can be included in another field – such as the “Caption” or “Title” fields.
