tree – Parerga und Paralipomena
http://www.michelepasin.org/blog
At the core of all well-founded belief lies belief that is unfounded - Wittgenstein

Messing around with D3.js and hierarchical data
http://www.michelepasin.org/blog/2013/06/21/messing-around-wih-d3-js-and-hierarchical-data/
Fri, 21 Jun 2013 13:23:59 +0000

These days there are a lot of browser-oriented visualization toolkits, such as d3.js or jit.js. They’re great and easy to use, but how well do they scale when used with medium-large or very large datasets?

The subject ontology is a fairly large (~2500 entities) taxonomic classification developed at Nature Publishing Group to classify scientific publications. The taxonomy is publicly available on data.nature.com and is encoded using the SKOS RDF vocabulary.

To evaluate the scalability of various JavaScript tree visualizations, I extracted a JSON version of the subject taxonomy and tried to render it on a webpage using some of the available viz approaches out of the box; here are the results (ps: I added the option of selecting how many levels of the tree are visualized, just to get an idea of when a viz breaks).
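The level selector mentioned above boils down to pruning the JSON tree before handing it to a layout. Here is a minimal sketch in Python, assuming the d3-style nested format where each node is a dict with a `name` and an optional `children` list. The function names and the sample terms are made up for illustration; they are not taken from the actual taxonomy export.

```python
def prune_to_depth(node, max_levels):
    """Return a copy of `node` keeping at most `max_levels` levels (root = level 1)."""
    # Copy every attribute except the children, which are handled recursively.
    pruned = {key: value for key, value in node.items() if key != "children"}
    if max_levels > 1 and node.get("children"):
        pruned["children"] = [
            prune_to_depth(child, max_levels - 1) for child in node["children"]
        ]
    return pruned


def count_nodes(node):
    """Count the nodes in a nested tree, root included."""
    return 1 + sum(count_nodes(child) for child in node.get("children", []))


# Hypothetical sample data in the nested format d3 layouts usually expect.
taxonomy = {
    "name": "Subjects",
    "children": [
        {"name": "Biological sciences", "children": [
            {"name": "Genetics", "children": [{"name": "Epigenetics"}]},
        ]},
        {"name": "Physical sciences", "children": [{"name": "Physics"}]},
    ],
}

top_two = prune_to_depth(taxonomy, 2)
print(count_nodes(taxonomy), count_nodes(top_two))
```

Pruning on the server (or once in the browser, before rendering) keeps the number of DOM nodes proportional to the levels actually shown, which is usually what decides whether a viz stays responsive.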


Some conclusions:

  • The subject taxonomy is actually a poly-hierarchy (one term can have more than one parent, so really it’s more like a directed graph). None of the libraries could handle that properly, but maybe that’s not really a limitation because they are meant to support the visualization of trees (maybe I should play around more with force-directed graph layouts and the like...)
  • The only viz that could handle all of the terms in the taxonomy is D3’s collapsible tree. Still, you don’t want to keep all the branches open at the same time! Click on the image to see for yourself.
  • CollapsibleTree

  • One way to deal with large quantities of data is obviously to show them a little bit at a time. The Bar Hierarchy seems a pretty good way to do that: it’s informative and responsive. However, it’d be nice to integrate it with other controls or visual cues that tell you what level of depth you’re currently looking at, which siblings are available, etc.
  • BarHiearchy

  • Partition tables also look pretty good at providing a visual summary of the categories available; however, they tend to fail quickly when there are too many nodes, and the text is often not readable at all. In the example below I had to include only the first 3 levels of the taxonomy for it to load properly!
  • TreeMapD3

    TreeMap

  • Rotating tree. Essentially a tree plotted on a circle; very useful for providing a graphical overview of the data, but it tends to become unresponsive quickly.
  • RotatingTree

  • Hierarchical pie chart. A pie chart that allows zooming in so as to reveal hierarchical relationships (often also called Zoomable Sunburst). Quite nice and responsive, even with a large amount of data. The absence of labels could be a limiting feature though; you get a nice overview of the datascape but can’t really understand the meaning of each element unless you mouse over it.
  • PieTree
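Going back to the first conclusion: since tree layouts want exactly one parent per node, one workaround is to "unfold" the poly-hierarchy by copying a shared term under each of its parents. The following is only a sketch under assumptions: the (child, parent) pairs are invented (in practice they could come from the skos:broader links), the graph is assumed to have no cycles, and `unfold` is a hypothetical helper, not part of any of the libraries tested.

```python
from collections import defaultdict

# Hypothetical (child, parent) pairs, e.g. derived from skos:broader links.
broader = [
    ("Biophysics", "Biological sciences"),
    ("Biophysics", "Physical sciences"),
    ("Biological sciences", "Subjects"),
    ("Physical sciences", "Subjects"),
]

parents = defaultdict(list)
children = defaultdict(list)
for child, parent in broader:
    parents[child].append(parent)
    children[parent].append(child)

# Terms with more than one parent are what make this a poly-hierarchy.
multi_parent = sorted(term for term, ps in parents.items() if len(ps) > 1)


def unfold(term):
    """Build a d3-style nested dict; shared terms are copied under each parent."""
    node = {"name": term}
    kids = [unfold(child) for child in children.get(term, [])]
    if kids:
        node["children"] = kids
    return node


tree = unfold("Subjects")
print(multi_parent)
```

The price is duplication: a term with many parents (or with a large subtree of its own) gets repeated, so the unfolded tree can be noticeably bigger than the original graph.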

     

    Other stuff out there that could do a better job?

     

    Using Django-MPTT: lessons learned…
    http://www.michelepasin.org/blog/2009/09/15/using-django-mptt-lessons-learned/
    Tue, 15 Sep 2009 16:35:59 +0000

    Here we are again with Django and MPTT 0.3 (I already have other posts about it). After working with it for a bit I realized that things were breaking mysteriously, and only recently understood why, so I thought I’d share this pearl of wisdom. Essentially this has to do with the way tree elements must be created if you want the usual tree-navigation methods (e.g. get_descendants or get_ancestors) to work as expected.

    Suppose your Django model looks like this:

    from django.db import models

    import mptt

    class PossessionNew(models.Model):
        possname = models.CharField(max_length=50, unique=True)
        parent = models.ForeignKey('self', null=True, blank=True, related_name='children')

    mptt.register(PossessionNew,)

    Suppose now that you want to start instantiating the PossessionNew model (sorry about the name, it’s just taken out of the context of the application I was working on).

    In my case I was creating the tree from other data which needed some pre-processing in order to determine the hierarchical information, so I thought I’d first create all the instances, and then ‘link’ them by setting their .parent attribute as needed. This turned out to be the wrong way of doing it.
    In other words, what I did was this: first, create the instances, and second, create the relationships. E.g.:

    bash-3.2$ ./runshell.my
    Python 2.5.1 (r251:54863, Sep 11 2008, 14:17:35)
    [GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    (InteractiveConsole)
    >>> from poms.pomsapp.models import *
    >>> p1 = PossessionNew(possname="test11")
    >>> p2 = PossessionNew(possname="test22")
    >>> p3 = PossessionNew(possname="test33")
    >>> p1.save()
    >>> p2.save()
    >>> p3.save()
    >>> p3.parent = p2
    >>> p2.parent = p1
    >>> p3.save()
    >>> p2.save()

    Everything seemed to work fine and it also looked fine on the admin interface.
    However, things didn’t work when trying to use MPTT APIs. For example, after restarting the shell:

    >>> p2 = PossessionNew.objects.get(possname="test22")
    >>> p2.get_children()
    []  # weird!
    >>> p2.get_
    p2.get_ancestors               p2.get_descendants             p2.get_next_sibling            p2.get_previous_sibling
    p2.get_children                p2.get_next_by_created_at      p2.get_previous_by_created_at  p2.get_root
    p2.get_descendant_count        p2.get_next_by_updated_at      p2.get_previous_by_updated_at  p2.get_siblings
    >>> p2.parent
    <PossessionNew: test11>
    >>> p2.children.all()
    [<PossessionNew: test33>]
    >>> p2.get_ancestors()
    [<PossessionNew: test11>]
    >>> p2.get_descendants()
    [] # weird!

    As you can see, we have some erratic behavior here: get_descendants and other similar methods don’t produce the desired output.

    I soon realized that the error lies in the fact that when creating children manually – e.g. by setting the .parent attribute of an instance – the other fields needed for MPTT to manage the tree are not updated.
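To make this a bit more concrete: the "other fields" are the left/right edge numbers (`lft`, `rght`) that MPTT stores next to `tree_id` and `level`, and the tree methods read those numbers rather than the `parent` link. Below is a framework-free sketch of the idea in plain (modern) Python. It is not django-mptt’s code; `number_tree` and `descendants` are hypothetical helpers, and it assumes a single tree so `tree_id` is left out.

```python
def number_tree(children, root):
    """Assign (lft, rght) edge numbers to every node with a depth-first walk."""
    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        lft = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        # The right edge is assigned only after all descendants are numbered.
        bounds[node] = (lft, counter)
        counter += 1

    visit(root)
    return bounds


def descendants(bounds, node):
    """Descendants are the nodes whose edges fall strictly inside this node's edges."""
    lft, rght = bounds[node]
    return sorted(n for n, (l, r) in bounds.items() if lft < l and r < rght)


# The chain from the shell session: test11 > test22 > test33.
children = {"test11": ["test22"], "test22": ["test33"]}
fresh = number_tree(children, "test11")

# What you effectively have if each node was saved as its own root and only
# .parent was changed afterwards: the edge numbers were never recomputed.
stale = {"test11": (1, 2), "test22": (1, 2), "test33": (1, 2)}

print(fresh)
print(descendants(fresh, "test22"), descendants(stale, "test22"))
```

With the stale numbers the descendants lookup comes back empty, which is the same symptom as in the session above, while the plain `children` reverse relation still works because it only looks at the foreign key.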

    So, here’s the right way of doing this.
    When operating with the tree you must always use the insert_at and move_to methods that come with the MPTT library. So, for example:

    >>> p3 = PossessionNew(possname = "test333")
    >>> p1 = PossessionNew(possname = "test111")
    >>> p2 = PossessionNew(possname = "test222")
    >>> p1.save()
    >>> p2.save()
    >>> p3.save()
    >>> p2.move_to(p1)
    >>> p3.move_to(p2)

    [Update 17/11/10:] almost by chance, I finally understood what the problem was here: mptt has a problem with updating instances already stored in memory. What follows is still valid, but there are other ways around the problem too, e.g. check this link too.

    Now let’s check again whether the tree navigation works all right (I usually have to restart the shell in order to check this, otherwise the modifications are not loaded properly. I haven’t figured out yet why this happens…):

    >>> p2 = PossessionNew.objects.get(possname = "test222")
    >>> p2.get_descendants()
    [<PossessionNew: test333>]
    >>> p2.get_ancestors()
    [<PossessionNew: test111>]

    Now it all makes more sense, doesn’t it?
    Notice that in this case we used the move_to method. We could also have used insert_at (check the docs). Remember also that we’ve been using these methods with instances up to now (=instance methods). If necessary, you could also achieve the same results by means of the TreeManager custom manager (=custom manager methods).
    So, for example:

    >>> p1 = PossessionNew(possname = "test11")
    >>> PossessionNew.tree.insert_node(p1, None, commit=True)
    <PossessionNew: test11>
    >>> p2 = PossessionNew(possname = "test22")
    >>> PossessionNew.tree.insert_node(p2, p1, commit=True)
    <PossessionNew: test22>
    >>> p3 = PossessionNew(possname = "test33")
    >>> PossessionNew.tree.insert_node(p3, p2, commit=True)
    <PossessionNew: test33>

    That’s it. Here is a list of the model instance methods MPTT makes available:

    get_ancestors(ascending=False) — creates a QuerySet containing the ancestors of the model instance. These default to being in descending order (root ancestor first, immediate parent last); passing True for the ascending argument will reverse the ordering (immediate parent first, root ancestor last).

    get_children() — creates a QuerySet containing the immediate children of the model instance, in tree order. The benefit of using this method over the reverse relation provided by the ORM to the instance’s children is that a database query can be avoided in the case where the instance is a leaf node (it has no children).

    get_descendants(include_self=False) — creates a QuerySet containing descendants of the model instance, in tree order.
    If include_self is True, the QuerySet will also include the model instance itself.

    get_descendant_count() — returns the number of descendants the model instance has, based on its left and right tree node edge indicators. As such, this does not incur any database access.

    get_next_sibling() — returns the model instance’s next sibling in the tree, or None if it doesn’t have a next sibling.

    get_previous_sibling() — returns the model instance’s previous sibling in the tree, or None if it doesn’t have a previous sibling.

    get_root() — returns the root node of the model instance’s tree.

    get_siblings(include_self=False) — creates a QuerySet containing siblings of the model instance. Root nodes are considered to be siblings of other root nodes. If include_self is True, the QuerySet will also include the model instance itself.

    insert_at(target, position=’first-child’, commit=False) — positions the model instance (which must not yet have been inserted into the database) in the tree based on target and position (when appropriate). If commit is True, the model instance’s save() method will be called before the instance is returned.

    is_child_node() — returns True if the model instance is a child node, False otherwise.

    is_leaf_node() — returns True if the model instance is a leaf node (it has no children), False otherwise.

    is_root_node() — returns True if the model instance is a root node, False otherwise.

    move_to(target, position=’first-child’) — moves the model instance elsewhere in the tree based on target and position (when appropriate).
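Several of the methods listed above can be answered from the left/right edge numbers alone, which is why get_descendant_count needs no database access. Here is a rough sketch of that arithmetic in plain Python, using the numbers a nested-set tree would assign to the test111 > test222 > test333 chain. The functions only borrow the method names for readability; they are illustrations under the assumption of a single tree, not the library’s implementation.

```python
# (lft, rght) pairs for the three-node chain test111 > test222 > test333,
# numbered by a depth-first walk as in a nested-set tree.
nodes = {"test111": (1, 6), "test222": (2, 5), "test333": (3, 4)}


def get_descendant_count(name):
    """Each descendant uses up two edge numbers between lft and rght."""
    lft, rght = nodes[name]
    return (rght - lft - 1) // 2


def is_leaf_node(name):
    """A leaf has nothing between its two edges."""
    return get_descendant_count(name) == 0


def get_ancestors(name, ascending=False):
    """Ancestors are the nodes whose edges enclose this node's edges."""
    lft, rght = nodes[name]
    found = [n for n, (l, r) in nodes.items() if l < lft and r > rght]
    # Root first by default; ascending=True puts the immediate parent first.
    return sorted(found, key=lambda n: nodes[n][0], reverse=ascending)


print(get_descendant_count("test111"), is_leaf_node("test333"))
print(get_ancestors("test333"), get_ancestors("test333", ascending=True))
```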

     

    Django admin and MPTT #2
    http://www.michelepasin.org/blog/2009/08/18/django-admin-and-mptt-2/
    Tue, 18 Aug 2009 10:46:32 +0000

    This is a follow up to the previous post on managing and visualizing trees using django. I’ve been using MPTT quite a bit now and it’s great – also, I looked deeper into the admin integration (basically, the issue of being able to manage trees from within the admin).

    The major issue I had with the patch discussed in my previous post was the fact that it is mainly a javascript hack – everything is done in the browser – i.e. it looked wonderful but it didn’t really scale up. So if you had a lot of items in your tree (say thousands) the js-driven pagination (basically, hiding and showing things on demand) would crash – at least, it did that to me all the time!

    The solution has been to reuse the admin management section from the FeinCMS project – this is a great CMS created by a bunch of django-ers in Switzerland – it relies heavily on django’s admin so they had the same problem when having to provide a way for users to add structure to the pages in a website. With some help from google and Matthias (tx! he’s one of the guys behind FeinCMS) I got it all working. Here are the main steps:

    1. upgrade to django 1.1
    This might not be necessary, because everything is supposed to work with the previous release too – I tried it with django 1.0 but had to fix a couple of urls to make it work (basically media files which were not loaded properly). So, if you want a hassle-free installation just upgrade to django 1.1! (which is great btw)

    2. download feincms and mptt and add them to your installed apps in settings.py
    As simple as that.

    INSTALLED_APPS = (
        ....
        'mptt',
        'feincms',
    )
    

    3. specify a url-path for feincms media in settings.py, and also a location (this may vary in your live server, if you want apache to serve these files directly).

    FEINCMS_ADMIN_MEDIA = '/feincms_media/'
    FEINCMS_ADMIN_MEDIA_LOCATION = '/My/local/path/to/feincms/media/'
    

    4. add a url handler for feincms media files in urls.py. Again, these settings are ok on a development server; in production you might wanna do things differently :-)

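    from django.conf import settings
    from django.conf.urls.defaults import patterns  # Django 1.0/1.1-era import

    urlpatterns = patterns('',
        .....
        # Serve the feincms media files straight from the local checkout
        (r'^feincms_media/(?P<path>.*)$', 'django.views.static.serve',
             {'document_root': settings.FEINCMS_ADMIN_MEDIA_LOCATION, 'show_indexes': True}),
    )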
    

    5. register your hierarchical model with mptt, then remember to set the Meta option correctly.. this is needed for a correct display of the tree in the admin (obviously you need to run syncdb to create the table in the db!):

    from django.db import models
    import mptt
    
    class TreeNode(models.Model):
        name = models.CharField(max_length=50, unique=True)
        parent = models.ForeignKey('self', null=True, blank=True, related_name='children')

        def __unicode__(self):
            return self.name

        class Meta:
            # Order by tree, then by left edge, so rows come out in tree order
            ordering = ['tree_id', 'lft']


    mptt.register(TreeNode,)
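Why this particular Meta ordering? Sorting by `tree_id` and then `lft` returns the rows in depth-first order, so a flat list can be drawn as a tree just by indenting each row by its `level`. A small plain-Python sketch of that, with invented rows (the genre names and numbers are made up; `lft`, `tree_id` and `level` are the columns MPTT maintains):

```python
# Hypothetical rows as (name, tree_id, lft, level), in arbitrary database order.
rows = [
    ("Rock", 1, 2, 1),
    ("Baroque", 2, 2, 1),
    ("Music", 1, 1, 0),
    ("Punk", 1, 3, 2),
    ("Classical", 2, 1, 0),
    ("Jazz", 1, 6, 1),
]

# Same idea as ordering = ['tree_id', 'lft'] on the model's Meta.
ordered = sorted(rows, key=lambda row: (row[1], row[2]))

# Indent by level to get a readable outline out of the flat list.
lines = ["    " * level + name for name, _tree_id, _lft, level in ordered]
print("\n".join(lines))
```

Without that ordering the admin would list nodes in primary-key order, and children could show up far away from (or before) their parents.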
    

    6. create a model admin that inherits from feincms TreeEditor class:

    from django.contrib import admin
    from django.utils.translation import ugettext_lazy as _
    from django.conf import settings as django_settings
    from feincms.admin import editor
    from myproject.myapp.models import *
    
    class TreeNodeAdmin(editor.TreeEditor):
        pass
    
    admin.site.register(TreeNode, TreeNodeAdmin)
    

    End! That should be it! Let me know if I forgot something..
    Here’s a screenshot of the new tree-management admin page we created:

    Picture 1

    UPDATE 09/2009: how to add more actions to the tree bar.
    Just override the _actions_column method on the TreeNodeAdmin class, as follows:

    class TreeNodeAdmin(editor.TreeEditor):
        def _actions_column(self, page):
            actions = super(TreeNodeAdmin, self)._actions_column(page)
            # Prepend an "add child page" link.
            actions.insert(0, u'<a href="add/?parent=%s" title="%s">'
                              u'<img src="%simg/admin/icon_addlink.gif" alt="%s"></a>' %
                           (page.pk, _('Add child page'),
                            django_settings.ADMIN_MEDIA_PREFIX, _('Add child page')))
            # Prepend a "view on site" link.
            actions.insert(0, u'<a href="%s" title="%s">'
                              u'<img src="%simg/admin/selector-search.gif" alt="%s" /></a>' %
                           (page.get_absolute_url(), _('View on site'),
                            django_settings.ADMIN_MEDIA_PREFIX, _('View on site')))
            return actions

    admin.site.register(TreeNode, TreeNodeAdmin)
    

     

    Representing hierarchical data with Django and MPTT
    http://www.michelepasin.org/blog/2009/08/06/representing-hierarchical-data-with-django-and-mptt/
    Thu, 06 Aug 2009 19:04:22 +0000

    Picture 3

    Apparently, you’ve got two options for managing hierarchical data in Django: django-mptt and django-treebeard. I didn’t have time to test both of them carefully, so I just played a bit with the first one (and with great results!). [p.s. the comparison table above is not mine, but I found it quite useful. Click on the image to find out how it was created… ]

    I guess that the key feature I was looking for is the admin integration. Trees must be displayed and edited properly in the admin… unfortunately neither project provides that feature by default, but luckily there are attempts (#1 and #2) to fix this issue.

    In order to use MPTT with your models you just have to download it, add it to your ‘installed application’ settings and register the models you intend to use:

    # A minimal example usage of ``mptt.register`` is given below, where the
    #  model being set up for MPTT is suitable for use with the default
    # arguments which specify fields and the tree manager attribute::
    
       from django.db import models
    
       import mptt
    
       class Genre(models.Model):
           name = models.CharField(max_length=50, unique=True)
           parent = models.ForeignKey('self', null=True, blank=True, related_name='children')
    
       mptt.register(Genre, order_insertion_by=['name'])
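A note on the `order_insertion_by=['name']` argument: it asks MPTT to keep siblings sorted by that field whenever a node is inserted or reparented. The snippet below only illustrates the "keep siblings sorted on insert" idea with a plain Python list; `insert_child` is a hypothetical helper, and the real library also has to renumber the tree, which is not shown here.

```python
import bisect


def insert_child(siblings, name):
    """Insert `name` among `siblings`, keeping them sorted alphabetically."""
    bisect.insort(siblings, name)
    return siblings


genres = []
for name in ["Rock", "Classical", "Jazz"]:
    insert_child(genres, name)

print(genres)
```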
    

    Then, after installing the patches, create a tree-friendly admin by subclassing MpttModelAdmin (check out the docs for more info).

    Picture 4

    Here’s the result – not bad at all! I just had to install django-mptt and the patches needed for using the jquery nested-sortable library with the admin. I’ll be working more on this over the next few days, so I’ll probably be posting more stuff….

     
