ClioPatria is a "SWI-Prolog application that integrates SWI-Prolog's libraries for RDF and HTTP services into a ready-to-use (semantic) web server". It is actively developed by the folks at the VU University of Amsterdam and is freely available online.
While at a conference last week, I saw a pretty cool demo (DIVE) which, I later learned, is powered by the ClioPatria triplestore. So I thought I'd give it a try and, by doing so, write a follow-up to my recent post on installing OWLIM on Mac OS.
OSX: Mavericks 10.9.5
XCode: latest version available from Apple
HOMEBREW: ruby -e "$(curl -fsSkL raw.github.com/mxcl/homebrew/go)"
Prolog: build it from source using brew: brew install swi-prolog
ClioPatria: git clone https://github.com/ClioPatria/ClioPatria.git
After you have cloned the repository, all you need to do is start a new project using the ClioPatria configure script. In short, this is done by creating a new directory and, from inside it, telling ClioPatria to configure it as a project:
[michele.pasin]:~/Documents/ClioPatriaProjects/firstproject> ../path/to/ClioPatria/configure
A bunch of files are created, including a script run.pl which you can use later to run the server.
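The setup steps above can be wrapped in a small shell function. This is only a sketch: the function name new_cliopatria_project and both of its arguments are hypothetical, and it assumes the configure script at the top of the ClioPatria checkout is executable.

```shell
# Hypothetical helper: create a project directory and let ClioPatria configure it.
#   $1 = path to the ClioPatria checkout (must contain the configure script)
#   $2 = project directory to create
new_cliopatria_project() {
    # Resolve the checkout to an absolute path, since we change directory below.
    cliopatria_dir=$(cd "$1" 2>/dev/null && pwd) || {
        echo "ClioPatria directory not found: $1" >&2
        return 1
    }
    project_dir="$2"
    if [ ! -f "$cliopatria_dir/configure" ]; then
        echo "configure script not found in $cliopatria_dir" >&2
        return 1
    fi
    mkdir -p "$project_dir" || return 1
    # Run configure from inside the project directory; the subshell keeps
    # the caller's working directory unchanged.
    ( cd "$project_dir" && "$cliopatria_dir/configure" )
}
```

For example: new_cliopatria_project ../path/to/ClioPatria ~/Documents/ClioPatriaProjects/firstproject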
I tried running run.pl as per the documentation, but that didn't work:
[michele.pasin]@Tartaruga:~/Documents/ClioPatriaProjects/firstproject>./run.pl
./run.pl: line 3: :-: command not found
./run.pl: line 5: /Applications: is a directory
./run.pl: line 6: This: command not found
./run.pl: line 8: syntax error near unexpected token `('
./run.pl: line 8: ` % ./configure (Unix)'
According to a thread on Stack Overflow, the Prolog shebang line isn't interpreted correctly by OS X, meaning that Mac OS doesn't recognize that script as a Prolog program.
This can be easily solved by calling the Prolog interpreter (swipl) explicitly:
[michele.pasin]@Tartaruga:~/Documents/ClioPatriaProjects/firstproject>swipl run.pl
ERROR: /Applications/-Other-Apps/8-Languages-IDEs/ClioPatria/rdfql/sparql_runtime.pl:1246:14: Syntax error: Operator expected
% run.pl compiled 1.64 sec, 25,789 clauses
% Started ClioPatria server at port 3020
% You may access the server at http://tartaruga.local:3020/
% Loaded 0 graphs (0 triples) in 0.00 sec. (0% CPU = 0.00 sec.)
Welcome to SWI-Prolog (Multi-threaded, 64 bits, Version 6.6.6)
Copyright (c) 1990-2013 University of Amsterdam, VU Amsterdam
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions.
Please visit http://www.swi-prolog.org for details.
You should be able to access the server with your browser on port 3020 (note: the previous command caused a syntax error too, but luckily that isn't a showstopper).
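If you start the server often, the workaround can live in a tiny launcher. A minimal sketch, assuming only that swipl is on the PATH (as it is after the brew install above); the function name run_prolog_script is hypothetical.

```shell
# Hypothetical helper: always hand the script to swipl explicitly, so the
# result does not depend on how the OS treats the script's first line.
run_prolog_script() {
    script="$1"
    if [ ! -f "$script" ]; then
        echo "no such script: $script" >&2
        return 1
    fi
    if ! command -v swipl >/dev/null 2>&1; then
        echo "swipl not found on PATH" >&2
        return 127
    fi
    swipl "$script"
}
```

Usage, from inside the project directory: run_prolog_script run.pl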
Super-easy to install, with a clean and intuitive user interface. I subsequently added a couple of RDF datasets and it all went very smoothly.
One cool feature is that ClioPatria has a built-in package management system, which allows you to easily install extensions to the application. For example, the following commands quickly extend the UI with a couple of 'intelligent' SPARQL query interfaces (YASQE and Flint):
[michele.pasin]@Tartaruga:/Applications/ClioPatria>sudo git submodule update --init web/yasqe web/yasr
Password:
[michele.pasin]@Tartaruga:/Applications/ClioPatria>sudo git submodule update --init web/FlintSparqlEditor
As in my previous post, I tried loading the NPG Articles dataset available at nature.com's legacy linked data site data.nature.com. The dataset contains around 40M triples describing (at the metadata level) all that's been published by NPG and Scientific American from 1845 until now. The file size is ~6 GB, so it's not a huge dataset. Still, it's something big enough to pose a challenge to my MacBook Pro (8GB RAM).
I used the web UI ('load local file') to load the dataset, but I quickly ran into a 'not enough memory' error. I tried fiddling with the settings accessible via the web interface (Stack limit, Time limit), but that didn't seem to help much. So I restarted the server with more memory allocated to the Prolog process, using the -G6g option as in the command that follows; however, this wasn't enough: after around 20 minutes the whole thing crashed again with an out-of-memory error.
[michele.pasin]@Tartaruga:~/Documents/ClioPatriaProjects/firstproject>swipl -G6g run.pl
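A note for readers on newer SWI-Prolog releases: the -G option used above sets the global stack limit in the 6.x series, whereas, to the best of my knowledge, later releases replaced the per-stack options with a single --stack-limit option. The sketch below picks a flag from a version number; the function name stack_flag is hypothetical, and the version cut-off is a simplifying assumption (the change landed during the 7.7.x development series, so check swipl --help if you are on 7.x).

```shell
# Hypothetical helper: choose the memory-limit flag for a SWI-Prolog version.
#   $1 = version number, e.g. 6.6.6
#   $2 = size with unit suffix, e.g. 6g
# Assumption: versions below 8 take -G<size>; 8 and later take --stack-limit=<size>.
stack_flag() {
    version="$1"
    size="$2"
    major=${version%%.*}   # everything before the first dot
    if [ "$major" -ge 8 ]; then
        printf '%s\n' "--stack-limit=$size"
    else
        printf '%s\n' "-G$size"
    fi
}
```

For example: swipl "$(stack_flag 6.6.6 6g)" run.pl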
In the end I got in touch with the ClioPatria creators via the mailing list: in their (incredibly fast) reply they suggested loading the dataset manually from the server's Prolog console. You do that simply by calling the rdf_load predicate after starting the ClioPatria server (as shown above):
?- rdf_load('/Users/michele.pasin/Downloads/NPGcitationsGraph/articles.2012-07-16/articles.nq')
|    .
% Parsed "articles.nq" in 1149.71 sec; 0 triples
That worked: the dataset was loaded in around 20 minutes. Job done!
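Since the console message above reports "0 triples", a rough independent count of what is in the file can be handy for comparison with what the store shows afterwards. N-Quads is a line-based format (one statement per line), so counting the non-blank, non-comment lines gives a reasonable estimate. A minimal sketch; the function name count_statements is hypothetical.

```shell
# Hypothetical helper: estimate the number of statements in an N-Quads
# (or N-Triples) file by counting lines that are neither blank nor comments.
count_statements() {
    # -v inverts the match, -c prints the number of remaining lines.
    grep -c -v -E '^[[:space:]]*(#|$)' "$1"
}
```

For example: count_statements articles.nq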
However, when I tried to run some queries, the application became very slow and ultimately unresponsive (especially with broad queries, such as retrieving all named classes from the graph). I then restarted the triplestore and realized that, on startup, ClioPatria begins by reloading all previously created repositories, which, in the case of my 40M-triple repo, takes around 10-15 minutes.
After restarting the server, queries were a bit faster but in many cases still pretty slow on my 8GB RAM laptop.
I am sure there are many more things that could be optimized; however, I'm no Prolog expert, nor could I figure out where to start just based on the online documentation. So I kind of gave up on using it to work on large datasets on my MacBook for now.
On the other hand, I really liked ClioPatria's intuitive and simple UI, its ease of installation, and the fact that you can perform operations transparently and interactively via a Prolog console (assuming you know how to do that).
All in all, ClioPatria seems to me a really good option if you want to get up and running quickly, for example in order to prototype linked data applications or explore small to medium-sized RDF datasets (10M triples or so, I guess). For bigger datasets, you'd better equip your Mac with a few gigabytes of extra RAM!
Whitepaper with technical analysis - http://cliopatria.swi-prolog.org/help/whitepaper.html
Mailing list - http://mailman.few.vu.nl/mailman/listinfo/cliopatria-list
2014