taxonomy – Parerga und Paralipomena

How to visualize a big taxonomy within a single webpage?

mikele — Fri, 22 Aug 2014 21:16:01 +0000

Here are a couple more experiments aimed at visually representing a large taxonomy.

Some time ago I looked at ways to visualise a medium-large taxonomy (about 3000 terms) using one of the many visualisation kits out there. It turned out that pretty much none of them can handle that many terms, but there are other strategies that come in handy for that, e.g. hiding or revealing terms in the taxonomy based on what level you are looking at.

Why can’t I see the whole damn thing in one single page? Because there are too many things to display – you’d think.

So, step 1.

Here’s the entire set of elements on a page (well sort of).

Can’t we do better than that, though?

At the end of the day, if you assume a (quite modest these days) resolution of 800×600 pixels, you should be able to fit more than 300 9-point characters in there (assuming 9 points equal 12 pixels).
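As a rough check of that estimate, here is the arithmetic as a small Python sketch. It assumes (a simplification, not a typographic fact) that every character occupies a square cell of 12×12 pixels with no extra spacing; the function name is made up for this example.

```python
# Back-of-the-envelope capacity of a window, assuming each character
# takes a square cell of `cell_px` pixels (a simplifying assumption).
def chars_that_fit(width_px: int, height_px: int, cell_px: int) -> int:
    columns = width_px // cell_px   # whole characters per row
    rows = height_px // cell_px     # whole rows per window
    return columns * rows


print(chars_that_fit(800, 600, 12))  # 66 columns x 50 rows = 3300
```

Under these assumptions the window holds comfortably more than 300 characters, so the limit is more about readability than about raw space.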

Step 2.

Here’s another way: setting font-size: 7px; and showing IDs instead of the taxon labels makes the visualisation much more compact.

And it does fit in a single window – hurray!

One problem though. This is not very useful with all those meaningless numbers.

Step 3.

So I tried to reduce the size a bit more, so as to fit the entire taxon label in there.

I also added a bit of interactivity so as to reveal the hierarchy. The mechanism is simple: when you click on an element of the taxonomy, all of its ancestors get highlighted too. It’s just a reminder that this is not a plain list of things, but a tree.
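The highlighting logic boils down to walking up the parent links from the clicked term. Here is a minimal Python sketch of that idea (not the code behind the page); the `PARENT` mapping, its sample terms and the function names are all hypothetical, and it assumes each term has at most one parent.

```python
# Hypothetical parent links for a small tree: child -> parent (None = root).
PARENT = {
    "rock": None,
    "blues rock": "rock",
    "heavy metal": "rock",
    "speed metal": "heavy metal",
}


def ancestors(term, parent):
    """Return the chain of ancestors of `term`, nearest first."""
    chain = []
    current = parent[term]
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


def highlighted(term, parent):
    """Terms to highlight on click: the term itself plus all its ancestors."""
    return [term] + ancestors(term, parent)


print(highlighted("speed metal", PARENT))
# ['speed metal', 'heavy metal', 'rock']
```

In a browser the returned list would be turned into a CSS class toggle on the matching elements; that part is left out here.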

Kind of like this one :-)

Possible next steps:

a) adding arrows to make the hierarchical relationships more evident
b) some sort of summary below the subject term in focus
c) sorting the terms by hierarchy-level rather than alphabetical order (will it make the taxonomy more intelligible?)
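Option (c) is easy to prototype. The sketch below orders terms by their depth in the tree first and alphabetically within each level. The sample data and function names are made up for illustration, and a single parent per term is assumed.

```python
# Hypothetical parent links: child -> parent (None = top-level term).
PARENT = {
    "toys": None,
    "dolls": "toys",
    "construction toys": "toys",
    "meccano": "construction toys",
    "hardware": None,
}


def depth(term, parent):
    """Number of steps from `term` up to its top-level ancestor."""
    steps = 0
    while parent[term] is not None:
        term = parent[term]
        steps += 1
    return steps


def by_level(parent):
    """Terms sorted by hierarchy level, then alphabetically within a level."""
    return sorted(parent, key=lambda term: (depth(term, parent), term))


print(by_level(PARENT))
# ['hardware', 'toys', 'construction toys', 'dolls', 'meccano']
```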

..to be continued..

Creating useful classifications with taxonomies (part 1)

mikele — Thu, 25 Jul 2013 08:17:02 +0000

Taxonomies and other classification schemes are omnipresent in Information Architecture. In this post I’ve tried to gather a few ideas on the topic, with the aim of clarifying the issue a little, and maybe helping to construct more useful taxonomies. Comments and suggestions are welcome as usual!

It recently occurred to me, though, that there is a great deal of confusion with regard to what a taxonomy is, and how it should be designed, constructed, and managed. Often this is simply because people have different backgrounds and intents when dealing with taxonomies, so they end up overlooking a great deal of scientific work that already exists in this area.

What are taxonomies?

Let’s start by looking at a simple taxonomy. Here’s one that I could use in order to sort out the junk I have accumulated in my backyard:

- hardware
--- pc tower cases [2]
--- pc accessories  [4]
- toys
--- construction toys 
------ meccano [1]
--- dolls [3]
- kitchen stuff
--- plates [10]
--- old cutleries [14]

So what is a taxonomy? In general the aim of a taxonomy is to organise things into groups, according to some perceived similarities (e.g. structure, role, behaviour, purpose etc.). Not surprisingly this is what the Greek root ‘taxis’ means: putting things into order.

If you want a fancier definition, a taxonomy can be defined as a conceptual tool for classification. It’s a way to bring order to a domain of interest that can be composed of objects of any kind (e.g. physical or abstract, real or invented). A taxonomy normally plays the same role as an inventory or a list, for it describes what kinds of things are available in a certain context and thus lets us carry out some task more efficiently within that context. For example, finding objects of interest, or comparing objects with similar characteristics.

I just said that a taxonomy is similar to a list or inventory of things; actually, that’s not quite correct. A taxonomy is much more than a list of concepts: its key feature is that it organizes the concepts within a hierarchical structure. This is called the taxonomical tree.

root node
-- sub-concept 1
-- -- sub-concept 1-1 (leaf node)
-- sub-concept 2
-- -- sub-concept 2-1 (leaf node)

The taxonomical tree is composed of nodes and links. The links are particularly important here, as they offer a (more or less) explicit definition of the relationships among the categories that describe your ‘stuff’. In other words, a taxonomy acts a little bit like a map: it tells you what kinds of things exist, and also how they can be meaningfully organized into a coherent framework.

So for example in biology we could have a taxonomy that organizes cell entities based on a (spatial) whole-part relationship:

Cell (Eukaryotic)
--Membrane
--Cytoplasm
----Mitochondria
----Nucleus
------Chromosomes
------Nucleolus

Consider now the case of a music magazine; here it might be more appropriate to construct a taxonomy based on a (thing-kind) sub-genre relationship:

rock
--blues rock
--hard rock
--heavy metal
---- speed metal
---- progressive metal

Finally, we can think of a mountaineering club that keeps an organized list of the expeditions done by its members, by means of an instance-of (or ‘example’) relationship:

Mountain
--Mount Everest
--Mount Kilimanjaro
Canyon
--Samaria Gorge
--Grand Canyon

So, in general, there can be many variations of ‘taxonomical maps’: spatial maps, thing-kind ones, thing-example ones etc. And here’s the good news: the key to understanding how taxonomies work (and hence how to design them successfully) is to be able to identify and evaluate the implications of these variations.

The taxonomical relationship

I think it’s clear by now that the rationale for the hierarchical structure used by a taxonomy is not always entirely transparent. The meaning of the links that make up the main taxonomical tree (the taxonomical relationship) is often left implicit. In fact, unless we have some accompanying documentation that defines the intended meaning of the relationship between one node and its parent(s) and children, it is up to us to interpret its sense.

This is often not a problem. If you look at the examples above, it is likely that you’d immediately understand what the taxonomical relationship stands for: e.g. part-of, type-of, broader-topic-than, instance-of etc.

However if your taxonomy has been growing over time, the situation could be rather different. An increasing number of relationships may have been used to construct a single tree, making it difficult for new users to make sense of the taxonomy, or for expert ones to update it without generating conflicts.

It is good practice then (especially if the taxonomy aims at being reused) to use a single relationship consistently throughout the taxonomical tree; also, to identify explicitly what the meaning of the taxonomical link is. As we will see in the next part of this post, this will make your work much more extendable and reusable.
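One way to make the meaning of the link explicit is to store it with every edge, which also makes it trivial to check whether a tree mixes relationships. The following is only a sketch under that assumption: the edge format `(child, relationship, parent)`, the sample data and the function names are invented for this example.

```python
from collections import Counter

# Hypothetical edges: (child, relationship, parent).
EDGES = [
    ("heavy metal", "type-of", "rock"),
    ("speed metal", "type-of", "heavy metal"),
    ("Mount Everest", "instance-of", "Mountain"),
    ("Nucleus", "part-of", "Cell"),
]


def relationships_used(edges):
    """Count how many edges use each relationship."""
    return Counter(relationship for _, relationship, _ in edges)


def uses_single_relationship(edges):
    """True when the whole tree is built on one relationship (or is empty)."""
    return len(relationships_used(edges)) <= 1


print(sorted(relationships_used(EDGES).items()))
# [('instance-of', 1), ('part-of', 1), ('type-of', 2)]
print(uses_single_relationship(EDGES))      # False: three relationships are mixed
print(uses_single_relationship(EDGES[:2]))  # True: only type-of
```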

Messing around with D3.js and hierarchical data

mikele — Fri, 21 Jun 2013 13:23:59 +0000

These days there are a lot of browser-oriented visualization toolkits, such as d3.js or jit.js. They’re great and easy to use, but how well do they scale when used with medium-large or very large datasets?

The subject ontology is quite a large (~2500 entities) taxonomical classification developed at Nature Publishing Group in order to classify scientific publications. The taxonomy is publicly available on data.nature.com, and is encoded using the SKOS RDF vocabulary.

In order to evaluate the scalability of various javascript tree visualizations I extracted a JSON version of the subject taxonomy and tried to render it on a webpage, using some of the available viz approaches out of the box; here are the results (ps: I added the option of selecting how many levels of the tree can be visualized, just to get an idea of when a viz breaks).
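For illustration, here is a minimal Python sketch of what such an extraction step could look like; it is not the script actually used. It assumes the taxonomy is already available as (child, parent) pairs, that the links contain no cycles, and that the target is the nested `name` / `children` shape used by many D3 hierarchy examples. The sample terms, the synthetic "root" node and the `max_depth` parameter are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical (child, parent) pairs; parent None marks a top-level term.
PAIRS = [
    ("rock", None),
    ("blues rock", "rock"),
    ("hard rock", "rock"),
    ("heavy metal", "rock"),
    ("speed metal", "heavy metal"),
    ("progressive metal", "heavy metal"),
]


def build_tree(pairs, max_depth):
    """Nest the pairs as {"name", "children"} dicts, keeping `max_depth` levels.

    Top-level terms are level 1; max_depth is expected to be >= 1.
    A term with several parents is simply repeated under each of them.
    """
    children = defaultdict(list)
    for child, parent in pairs:
        children[parent].append(child)

    def node(name, level):
        item = {"name": name}
        # Only descend while we are above the requested depth limit.
        if level < max_depth and children[name]:
            item["children"] = [node(c, level + 1) for c in sorted(children[name])]
        return item

    return {"name": "root", "children": [node(t, 1) for t in sorted(children[None])]}


print(json.dumps(build_tree(PAIRS, 2), indent=2))
```

Raising `max_depth` one level at a time is a cheap way to find the point where a given visualization stops being usable.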

Some conclusions:

The subject taxonomy is actually a poly-hierarchy (i.e. one term can have more than one parent, so really it’s more like a directed graph). None of the libraries could handle that properly, but maybe that’s not really a limitation, because they are meant to support the visualization of trees (maybe I should play around more with force-directed graph layouts and the like..)
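Before feeding a poly-hierarchy to a tree layout it helps to know which terms have more than one parent. A small Python sketch of that check follows; the `LINKS` data and the function name are invented for this example and are not taken from the subject taxonomy.

```python
from collections import defaultdict

# Hypothetical (term, parent) links; "biophysics" has two parents.
LINKS = [
    ("biophysics", "biology"),
    ("biophysics", "physics"),
    ("genetics", "biology"),
    ("optics", "physics"),
]


def multi_parent_terms(links):
    """Map each term that has more than one parent to its sorted parents."""
    parents = defaultdict(set)
    for term, parent in links:
        parents[term].add(parent)
    return {term: sorted(ps) for term, ps in parents.items() if len(ps) > 1}


print(multi_parent_terms(LINKS))
# {'biophysics': ['biology', 'physics']}
```

Each term reported here either has to be duplicated under every parent or assigned to one of them before a strict tree layout can draw it.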

The only viz that could handle all of the terms in the taxonomy is D3’s collapsible tree. Still, you don’t want to keep all the branches open at the same time! Click on the image to see it for yourself.

An obvious approach to dealing with large quantities of data is to show them a little bit at a time. The Bar Hierarchy seems a pretty good way to do that: it’s informative and responsive. However it’d be nice to integrate it with other controls/visual cues that would tell you what level of depth you’re currently looking at, which siblings are available etc.

Partition tables also look pretty good at providing a visual summary of the categories available; however they tend to fail quickly when there are too many nodes, and the text is often not readable at all. In the example below I had to include only the first 3 levels of the taxonomy for it to be loaded properly!

Rotating tree. Essentially a tree plotted on a circle, very useful to provide a graphical overview of the data, but it tends to become unresponsive quickly.

Hierarchical pie chart. A pie chart that allows zooming in so as to reveal hierarchical relationships (often also called Zoomable Sunburst). Quite nice and responsive, even with a large amount of data. The absence of labels could be a limiting feature though; you get a nice overview of the datascape but can’t really understand the meaning of each element unless you mouse over it.

Other stuff out there that could do a better job?