Creating taxonomies

#Taxonomies #Tags #Categories #MetaData

Published on 13 October 2022 by Andrew Owen (4 minutes)

The hashtag has become ubiquitous. Chris Messina was inspired by the way chat rooms were identified on Internet Relay Chat (IRC) servers, when he first proposed its use in a 2007 tweet. Since then, it has spread across all social media and beyond. For American readers not already in the know, the pound sign ( # ) is referred to as the hash sign in British English.

Besides hashtags, you’ve probably encountered tags on blogs (like this one). Tags provide metadata about the content they are associated with. This can help to describe the content and make it easier to locate through search. By convention, the preferred number of hashtags in a social media post is four. And (at the time of writing) you can’t edit posts on Twitter, so choosing the correct tags is important for content discovery.

In my days as a technical writer, I often tried to come up with a documentation equivalent of the security acronym CIA (confidentiality, integrity and availability). Hmm, how about DRAUG (discoverability, relevance, atomicity, usability, generality)? I may come back to that, but I digress (it’s becoming a habit).

When I publish a new article, I promote it on Instagram, LinkedIn, Mastodon and Twitter. If I remember to include tags, I think up something in the moment. If the article relates to a trending topic, then I can jump on that. But on my blog itself, I’d like for people to be able to find related content. That means putting a bit more thought into it.

And that’s where an information architecture comes in. Which is another way of saying: defining a taxonomy. Which often comes down to: defining a set of standardized tags. One reason I didn’t include tags on my articles back at the start of the year was that I hadn’t written many DevRel articles at that point. Now that I’ve published 40 articles, I have enough data to create a taxonomy.

The last time I created a taxonomy was when I was working on a documentation project in an XML-based CCMS. There I was able to tag content by type, subject, user and any other categories I could dream up. I found it very useful for document curation. If a new feature resulted in a change to the software behavior, it was very easy to find all the documentation that was affected.

On this blog I’m using Markdown and the tags are listed in the metadata. Hugo, the static-site generator that this site is built on, supports multiple taxonomies. But for now I’m confining myself to simple tags. There’s an index page where I can see a complete list of all the tags. From there, I can see if I’ve got any variants that are essentially the same tag.
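Spotting variants that are essentially the same tag can be partly automated. The following is a minimal Python sketch, not part of Hugo or of this site's build: the function names and the sample tag list are hypothetical, and it only catches differences in case and separators (it won't notice singular versus plural forms).

```python
from collections import defaultdict


def variant_key(tag: str) -> str:
    """Reduce a tag to a comparison key: lowercase, letters and digits only."""
    return "".join(ch for ch in tag.lower() if ch.isalnum())


def find_variants(tags: list[str]) -> dict[str, list[str]]:
    """Group tags by key and keep only the groups with more than one spelling."""
    groups: dict[str, set[str]] = defaultdict(set)
    for tag in tags:
        groups[variant_key(tag)].add(tag)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Hypothetical tag list, as it might be copied from a tags index page.
all_tags = ["API", "api", "data-lakes", "Data Lakes", "video-production", "Apple"]
variants = find_variants(all_tags)
```

Each entry in `variants` is a set of spellings that probably ought to be merged into one tag.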

I haven’t defined a complete taxonomy at this point, but I’ve made some decisions on tag conventions:

Use title case for proper nouns, like Apple.
Capitalize acronyms, like API.
Hyphenate multi-word tags, like video-production.
Where there is a choice, prefer plurals, like data-lakes.
Use four tags for each article.
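Conventions like these are easier to keep if something checks them. Here is a rough Python sketch of such a check, under some loud assumptions: the `PROPER_NOUNS` and `ACRONYMS` lookups are hypothetical word lists that would need maintaining by hand, and the preference for plurals isn't checked at all, since that needs human judgment.

```python
import re

# Assumed word lists for illustration; a real blog would maintain its own.
PROPER_NOUNS = {"apple": "Apple"}
ACRONYMS = {"api": "API", "fpga": "FPGA"}


def normalize_tag(raw: str) -> str:
    """Apply the casing and hyphenation conventions to a single tag."""
    words = re.split(r"[\s_-]+", raw.strip())
    fixed = []
    for word in words:
        lower = word.lower()
        # Known acronyms and proper nouns keep their capitals; the rest is lowercase.
        fixed.append(ACRONYMS.get(lower) or PROPER_NOUNS.get(lower) or lower)
    return "-".join(fixed)


def check_article(tags: list[str], expected_count: int = 4) -> list[str]:
    """Return a list of problems; an empty list means the tags conform."""
    problems = []
    if len(tags) != expected_count:
        problems.append(f"expected {expected_count} tags, found {len(tags)}")
    for tag in tags:
        wanted = normalize_tag(tag)
        if tag != wanted:
            problems.append(f"{tag!r} should be {wanted!r}")
    return problems
```

Calling `check_article` on the tag list from an article's metadata returns a list of suggested fixes, which could be printed before publishing.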

A quick look through my articles shows the most common tags I’ve used are:

This isn’t surprising given that I’m a former technical writer, a Mac user since 1993 and an FPGA retro hardware hobbyist. After all, the first rule of writing is: write what you know. Another rule of writing is: know your audience. Combined with analytics, the tags should help me to work out which topics are of most interest to my audience.
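Tallying which tags are used most often takes only a few lines. This is a small Python sketch with made-up data: the `articles` mapping is hypothetical and doesn't reflect the blog's real tags or counts.

```python
from collections import Counter

# Hypothetical articles mapped to their tags; not the blog's real data.
articles = {
    "post-a": ["docs", "Git", "API", "tools"],
    "post-b": ["docs", "API", "testing", "tools"],
    "post-c": ["docs", "Git", "video-production", "Apple"],
}

# Count every tag across every article, then take the most frequent ones.
tag_counts = Counter(tag for tags in articles.values() for tag in tags)
top_three = tag_counts.most_common(3)
```

The same counts could later be set against page-view analytics to see which topics draw the most readers.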

As more data emerges, I can refine my tags into categories (such as devops, docs and so on) and sub-categories (Git, CI/CD and so on). If I keep this up long enough I’ll end up with a taxonomy where my four tags cover the category, sub-category, the type of article and the topic. For example, #health #wellbeing #instruction #yoga (although not a likely combination in a DevRel blog).
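The four-slot idea can be made concrete as a small data structure. This is only a sketch in Python; the class and field names are hypothetical, chosen to mirror the category, sub-category, article type and topic described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleTags:
    """One tag per slot, in the order category, sub-category, type, topic."""

    category: str
    sub_category: str
    article_type: str
    topic: str

    def as_hashtags(self) -> str:
        """Render the four tags as a space-separated hashtag string."""
        slots = (self.category, self.sub_category, self.article_type, self.topic)
        return " ".join(f"#{slot}" for slot in slots)


example = ArticleTags("health", "wellbeing", "instruction", "yoga")
```

Here `example.as_hashtags()` gives `#health #wellbeing #instruction #yoga`, and leaving out any of the four slots raises an error when the object is created.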

It’s also worth noting that where capital letters are supported, you should use CamelCase so that screen readers can tell where each word in a tag begins. Now I just have to fix the tags page.
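Hyphenated blog tags can be turned into CamelCase hashtags mechanically. This is a minimal Python sketch; the function name is hypothetical, and it assumes tags follow the hyphenated convention listed earlier.

```python
def to_camel_case_hashtag(tag: str) -> str:
    """Turn a hyphenated tag into a CamelCase hashtag."""
    parts = [part for part in tag.split("-") if part]
    # Uppercase only the first letter, so 'API' stays 'API' rather than 'Api'.
    return "#" + "".join(part[0].upper() + part[1:] for part in parts)
```

For example, `to_camel_case_hashtag("video-production")` gives `#VideoProduction`, while an acronym such as `API` passes through as `#API`.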