Messing around with D3.js and hierarchical data


These days there are a lot of browser-oriented visualization toolkits, such as d3.js or jit.js. They're great and easy to use, but how well do they scale when used with medium-large or very large datasets?

The subject ontology is a fairly large (~2500 entities) taxonomy developed at Nature Publishing Group to classify scientific publications. The taxonomy is publicly available on data.nature.com and is encoded using the SKOS RDF vocabulary.

To evaluate how well various JavaScript tree visualizations scale, I extracted a JSON version of the subject taxonomy and tried to render it on a webpage, using some of the available viz approaches out of the box. Here are the results (PS: I added an option for selecting how many levels of the tree are visualized, just to get an idea of when a viz breaks).
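The "how many levels" option can be sketched in plain JavaScript as a pruning step applied to the JSON before it is handed to a visualization. This is only a minimal sketch, not the code used for the experiment: the `{ name, children }` node shape, the function names `pruneToDepth` and `countNodes`, and the sample data are all assumptions made for illustration.

```javascript
// Assumed node shape (not taken from the original experiment):
// { name: string, children?: [...] }, as in many D3 hierarchy examples.

// Return a copy of the tree that keeps only the first `maxDepth` levels.
// The root counts as level 1. The input tree is not modified.
function pruneToDepth(node, maxDepth, depth = 1) {
  const copy = { name: node.name };
  const hasChildren = Array.isArray(node.children) && node.children.length > 0;
  if (depth < maxDepth && hasChildren) {
    copy.children = node.children.map(child =>
      pruneToDepth(child, maxDepth, depth + 1)
    );
  }
  return copy;
}

// Count every node in a (possibly pruned) tree.
function countNodes(node) {
  return 1 + (node.children || []).reduce(
    (sum, child) => sum + countNodes(child),
    0
  );
}

// Tiny made-up taxonomy, for illustration only.
const sample = {
  name: "Subjects",
  children: [
    {
      name: "Biological sciences",
      children: [
        { name: "Genetics", children: [{ name: "Epigenetics" }] },
        { name: "Ecology" }
      ]
    },
    { name: "Physical sciences", children: [{ name: "Physics" }] }
  ]
};

// Show how quickly the node count grows as more levels are included.
for (let levels = 1; levels <= 4; levels++) {
  console.log(levels, countNodes(pruneToDepth(sample, levels)));
}
```

Counting the nodes at each depth gives a rough idea of how much data a layout is being asked to draw before it starts to struggle.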


Some conclusions:

  • The subject taxonomy is actually a poly-hierarchy (i.e. one term can have more than one parent, so really it's more like a directed graph). None of the libraries could handle that properly, but maybe that's not really a limitation, since they are meant to support the visualization of trees (maybe I should play around more with force-directed graph layouts and the like).

  • The only viz that could handle all of the terms in the taxonomy is D3's collapsible tree. Still, you don't want to keep all the branches open at the same time! Click on the image to see for yourself.

[Image: collapsible tree]

  • One obvious way to deal with large quantities of data is to show them a little at a time. The Bar Hierarchy seems a pretty good way to do that: it's informative and responsive. However, it'd be nice to integrate it with other controls or visual cues that tell you what level of depth you're currently looking at, which siblings are available, and so on.

[Image: bar hierarchy]

  • Partition tables also look pretty good at providing a visual summary of the categories available; however, they tend to fail quickly when there are too many nodes, and the text is often not readable at all. In the example below I had to include only the first 3 levels of the taxonomy for it to load properly!

[Image: treemap (D3)]

[Image: treemap]

  • Rotating tree. Essentially a tree plotted on a circle. It is very useful for providing a graphical overview of the data, but it tends to become unresponsive quickly.

[Image: rotating tree]

  • Hierarchical pie chart. A pie chart that allows zooming in to reveal hierarchical relationships (often also called a Zoomable Sunburst). Quite nice and responsive, even with a large amount of data. The absence of labels could be a limiting feature though: you get a nice overview of the datascape but can't really understand the meaning of each element unless you mouse over it.

[Image: hierarchical pie chart]
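On the poly-hierarchy point from the first conclusion: one common workaround for tree-only layouts is to unfold the graph into a tree by repeating a multi-parent term under each of its parents. The sketch below illustrates that idea only and is not what any of the libraries do internally. The flat record shape (`id`, `label`, `broader`, loosely mirroring `skos:broader`), the function names, and the sample terms are hypothetical.

```javascript
// Hypothetical flat records: each term lists the ids of its parents.
// Ids and labels are made up for illustration.
const terms = [
  { id: "root", label: "Subjects", broader: [] },
  { id: "bio", label: "Biological sciences", broader: ["root"] },
  { id: "phys", label: "Physical sciences", broader: ["root"] },
  { id: "biophys", label: "Biophysics", broader: ["bio", "phys"] }
];

// Terms with more than one parent are what make this a poly-hierarchy.
function multiParentTerms(records) {
  return records.filter(t => t.broader.length > 1).map(t => t.id);
}

// Unfold the graph into a plain { name, children } tree by repeating a
// multi-parent term under each of its parents.
function toTree(records, rootId) {
  const byId = new Map(records.map(t => [t.id, t]));

  // Invert the parent links: parent id -> list of child ids.
  const childrenOf = new Map();
  for (const t of records) {
    for (const parent of t.broader) {
      if (!childrenOf.has(parent)) childrenOf.set(parent, []);
      childrenOf.get(parent).push(t.id);
    }
  }

  // `path` holds the ancestors of the current node, so a cycle in the
  // data cannot cause infinite recursion.
  function build(id, path) {
    const node = { name: byId.get(id).label };
    const nextPath = new Set(path).add(id);
    const kids = (childrenOf.get(id) || []).filter(k => !nextPath.has(k));
    if (kids.length > 0) {
      node.children = kids.map(k => build(k, nextPath));
    }
    return node;
  }

  return build(rootId, new Set());
}

console.log(multiParentTerms(terms));
console.log(JSON.stringify(toTree(terms, "root"), null, 2));
```

The trade-off is that duplicated terms inflate the node count and hide the fact that two branches share a concept, which is part of why a graph layout may be a better fit for this kind of data.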

Is there other stuff out there that could do a better job?

Cite this blog post:


Michele Pasin. Messing around with D3.js and hierarchical data. Blog post on www.michelepasin.org. Published on June 21, 2013.
