sparql – Parerga und Paralipomena http://www.michelepasin.org/blog At the core of all well-founded belief lies belief that is unfounded - Wittgenstein

Installing GraphDB (aka OWLIM) triplestore on Mac OS http://www.michelepasin.org/blog/2014/10/16/getting-started-with-a-triplestore-on-mac-os-graphdb-aka-owlim/ Thu, 16 Oct 2014 19:05:38 +0000

GraphDB (formerly called OWLIM) is an RDF triplestore used – among others – by large organisations such as the BBC and the British Museum. I recently installed the LITE release of this graph database on my Mac, so what follows is a simple write-up of the steps that worked for me.

I haven't played much with the database yet, but all in all the installation was much simpler than expected (PS: this old recipe on Google Code was very helpful in steering me in the right direction with the whole Tomcat/Java setup).

1. Requirements

OSX: Mavericks 10.9.5
XCode: latest version available from Apple
HOMEBREW: ruby -e "$(curl -fsSkL raw.github.com/mxcl/homebrew/go)"
Tomcat7: brew install tomcat
JAVA: available from Apple

Finally – we obviously want to get a copy of OWLIM-Lite too: http://www.ontotext.com/owlim/downloads

2. Setting up

After you have downloaded and unpacked the archive, simply copy these two files:

owlim-lite/sesame_owlim/openrdf-sesame.war
owlim-lite/sesame_owlim/openrdf-workbench.war

...to the Tomcat webapps folder:

/usr/local/Cellar/tomcat/7.0.39/libexec/webapps/

Essentially that’s because OWLIM-Lite is packaged as a storage and inference layer for the Sesame RDF framework, which runs here as a component within the Tomcat server (note: there are other ways to run OWLIM, but this one seemed the quickest).
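If you end up redoing this on new Tomcat versions, the copy step is easy to script. Here's a minimal sketch in Python (the directory layout matches the unpacked OWLIM-Lite archive described above; the Tomcat path in the comment is the one from this install, so adjust the version segment to yours):

```python
import shutil
from pathlib import Path

def deploy_wars(owlim_dir, webapps_dir):
    """Copy the two Sesame .war files into Tomcat's webapps folder.

    Returns the list of destination paths that were written.
    """
    deployed = []
    for war in ("openrdf-sesame.war", "openrdf-workbench.war"):
        src = Path(owlim_dir) / "sesame_owlim" / war
        dst = Path(webapps_dir) / war
        shutil.copy(src, dst)  # Tomcat auto-deploys anything dropped in webapps/
        deployed.append(dst)
    return deployed

# Typical call for the setup described in this post:
# deploy_wars("owlim-lite", "/usr/local/Cellar/tomcat/7.0.39/libexec/webapps")
```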

3. Starting Tomcat

First I created a symbolic link in my ~/Library folder, so as to better manage new versions (as suggested here).

sudo ln -s /usr/local/Cellar/tomcat/7.0.39 ~/Library/Tomcat

Then in order to start/stop Tomcat it’s enough to use the catalina command:

[michele.pasin]@here:~/Library/Tomcat/bin>./catalina start
Using CATALINA_BASE:   /usr/local/Cellar/tomcat/7.0.39/libexec
Using CATALINA_HOME:   /usr/local/Cellar/tomcat/7.0.39/libexec
Using CATALINA_TMPDIR: /usr/local/Cellar/tomcat/7.0.39/libexec/temp
Using JRE_HOME:        /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Using CLASSPATH:       /usr/local/Cellar/tomcat/7.0.39/libexec/bin/bootstrap.jar:/usr/local/Cellar/tomcat/7.0.39/libexec/bin/tomcat-juli.jar

[michele.pasin]@here:~/Library/Tomcat/bin>./catalina stop
Using CATALINA_BASE:   /usr/local/Cellar/tomcat/7.0.39/libexec
Using CATALINA_HOME:   /usr/local/Cellar/tomcat/7.0.39/libexec
Using CATALINA_TMPDIR: /usr/local/Cellar/tomcat/7.0.39/libexec/temp
Using JRE_HOME:        /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Using CLASSPATH:       /usr/local/Cellar/tomcat/7.0.39/libexec/bin/bootstrap.jar:/usr/local/Cellar/tomcat/7.0.39/libexec/bin/tomcat-juli.jar

Tip: Tomcat runs by default on port 8080. That can be changed pretty easily by modifying a parameter in server.xml, found in {Tomcat installation folder}/libexec/conf/ (more details here).
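For reference, the parameter in question is the port attribute of the Connector element in server.xml; the stock file looks roughly like this (change 8080 to whatever port you prefer):

```xml
<!-- {Tomcat installation folder}/libexec/conf/server.xml -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
```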


4. Testing the Graph database

Start a browser and go to the Workbench Web application using a URL of this form: http://localhost:8080/openrdf-workbench/ (substituting localhost and the 8080 port number as appropriate). You should see something like this:

[Screenshot: the Sesame Workbench]

After selecting a server, click ‘New repository’.

Select ‘OWLIM-Lite’ from the drop-down and enter the repository ID and description. Then click ‘next’.

Fill in the fields as required and click ‘create’.

That’s it! A message should be displayed that gives details of the new repository and this should also appear in the repository list (click ‘repositories’ to see this).

5. Loading a big dataset

I set out to load the NPG Articles dataset available at nature.com's legacy linked data site, data.nature.com.

The dataset contains around 40M triples describing (at the metadata level) everything published by NPG and Scientific American from 1845 to the present. At ~6 GB the file is not huge, but still big enough to pose a challenge to my MacBook Pro (8 GB of RAM).

First, I increased the memory allocated to the Tomcat application to 5 GB. It was enough to create a setenv.sh file in the ${tomcat-folder}/bin/ folder, containing this line:

CATALINA_OPTS="$CATALINA_OPTS -server -Xms5g -Xmx5g"

More details on Tomcat’s and Java memory issues are available here.

Then I used OWLIM’s web interface to create a new graph repository and upload the dataset file into it (I previously downloaded a copy of the dataset to my computer so to work with local files only).

It took around 10 minutes for the application to upload the file into the triplestore, and another 2-3 minutes for OWLIM to process it. Much, much faster than I expected. The only minor issue was the lack of notification (in the UI) of what was going on. Not a big deal in my case, but with larger dataset uploads it could be a real downer.

Note: I used the web form to upload the dataset, but there are also ways to do that from the command line (which will probably result in even faster uploads).
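One command-line route is the Sesame HTTP protocol, which (if I recall it correctly) lets you POST an RDF file straight to the repository's /statements endpoint. A minimal sketch, assuming the repository name test1 from above and Turtle content; both are placeholders to match to your setup:

```python
from urllib.request import Request

def build_load_request(repository_url, rdf_bytes, mime_type="text/turtle"):
    """Build (but don't send) a bulk-load request for a Sesame repository.

    POSTing RDF to <repository>/statements appends it to the store.
    """
    return Request(
        repository_url.rstrip("/") + "/statements",
        data=rdf_bytes,
        headers={"Content-Type": mime_type},
        method="POST",
    )

req = build_load_request(
    "http://localhost:8080/openrdf-sesame/repositories/test1",
    b"<http://example.org/a> <http://example.org/b> <http://example.org/c> .",
)
# urllib.request.urlopen(req) would actually send it.
```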

6. Useful information

> SPARQL endpoints

All of your repositories also come with a handy SPARQL endpoint, available at a URL of this form: http://localhost:8080/openrdf-sesame/repositories/test1 (just change the last segment so that it matches your repository name).
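A quick way to use that endpoint programmatically is a plain HTTP GET with the SPARQL query passed as a URL parameter; here's a sketch (again assuming the test1 repository name):

```python
from urllib.parse import urlencode

def sparql_url(repository_url, query):
    """Return the GET URL that runs `query` against a Sesame repository."""
    return repository_url + "?" + urlencode({"query": query})

url = sparql_url(
    "http://localhost:8080/openrdf-sesame/repositories/test1",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 10",
)
# urllib.request.urlopen(url) would return the SPARQL results.
```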

> Official documentation

  • https://confluence.ontotext.com/display/GraphDB6

> Ontotext's Q&A forum

  • http://answers.ontotext.com

Textmate bundle for Turtle and Sparql http://www.michelepasin.org/blog/2013/08/13/textmate-bundle-for-turtle-and-sparql/ Tue, 13 Aug 2013 17:07:34 +0000 http://www.michelepasin.org/blog/?p=2392 I recently ran into the Textmate bundle for Turtle, an extension for the TextMate OS X editor aimed at facilitating working with RDF and SPARQL. If you happen to be using these technologies, I'd suggest you take a look at the following.

    The Resource Description Framework is a general-purpose language for representing information which is widely used on the web in order to encode metadata in a machine-interoperable format.

    Turtle, the terse RDF Triple Language, is a textual syntax for RDF which aims at human readability and compactness (among other things).
    This is what it looks like:

    
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @base <http://example.org/schemas/vehicles> .
    
    :MotorVehicle a rdfs:Class .
    
    :PassengerVehicle a rdfs:Class ;
       rdfs:subClassOf :MotorVehicle .
    
    :Person a rdfs:Class .
    
    xsd:integer a rdfs:Datatype .
    
    :registeredTo a rdf:Property ;
       rdfs:domain :MotorVehicle ;
       rdfs:range  :Person .
    
    :myLittleCar a :PassengerVehicle .
    

    The Turtle bundle in question, in a nutshell, provides a bunch of snippets and query mechanisms that make it easier to work with Turtle RDF and related technologies.
    More precisely, here’s the official features breakdown:

  • Language grammar for Turtle and SPARQL 1.1
  • Powerful (!) auto-completion (live-aggregated)
  • Documentation for classes and roles/properties at your fingertips (live-aggregated)
  • Interactive SPARQL query scratchpad
  • Some snippets (prefixes and document skeleton)
  • Solid syntax validation
  • Commands for instant graph visualization of a knowledge base (requires Graphviz and Raptor)
  • Conversion between all common RDF formats

    In order to query a SPARQL endpoint (e.g. DBpedia) just type this in and run it (apple-R):

    
    #QUERY <http://dbpedia.org/sparql>
    SELECT DISTINCT ?s ?label
    WHERE {
        ?s <http://dbpedia.org/property/season> ?o .
        ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
    }
    

    Obviously you can query any endpoint, e.g. data.nature.com:

    
    
    #QUERY <http://data.nature.com/sparql>
    
    PREFIX bibo:<http://purl.org/ontology/bibo/>
    PREFIX dc:<http://purl.org/dc/elements/1.1/>
    PREFIX dcterms:<http://purl.org/dc/terms/>
    PREFIX foaf:<http://xmlns.com/foaf/0.1/>
    PREFIX npg:<http://ns.nature.com/terms/>
    PREFIX npgg:<http://ns.nature.com/graphs/>
    PREFIX npgx:<http://ns.nature.com/extensions/>
    PREFIX owl:<http://www.w3.org/2002/07/owl#>
    PREFIX prism:<http://prismstandard.org/namespaces/basic/2.1/>
    PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
    PREFIX sc:<http://purl.org/science/owl/sciencecommons/>
    PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
    PREFIX void:<http://rdfs.org/ns/void#>
    PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
    
    
    SELECT *                            
    WHERE {                                                
        ?doi a npg:Article . 
        ?doi dc:title ?title .
        ?doi prism:publicationDate ?date
    } 
    limit 100 
    
    

    And this is just the tip of the iceberg: autocompletion, visualisations, etc. It may be the Textmate-Semantic Web Swiss army knife you've been looking for.

A few useful Linked Data resources http://www.michelepasin.org/blog/2011/03/17/a-few-useful-linked-data-resources/ Thu, 17 Mar 2011 11:32:00 +0000 http://www.michelepasin.org/blog/?p=1135 I've done a bit of semantic web work in the last couple of weeks, which gave me a chance to better explore the current web scenario around this topic. I'm working on some example applications myself, but in the meanwhile I thought I'd share a couple of quite useful links I ran into.

    Development Tools:

  • Quick and Dirty RDF browser. It does just what it says: you pass it an RDF file and it helps you make sense of it. For example, check out the RDF graph describing the city of Southampton on DBpedia: http://dbpedia.org/resource/Southampton. Minimal, fast and useful!
  • Namespace lookup service for RDF developers. The intention of this service is to simplify a common task in the work of RDF developers: remembering and looking up URI prefixes. You can look up prefixes from the search box on the homepage, or directly by typing URLs into your browser bar, such as http://prefix.cc/foaf or http://prefix.cc/foaf,dc,owl.ttl.
  • Knoodl. An online tool for creating, managing, and analyzing RDF/OWL descriptions. It has several features that support collaboration in all stages of these activities (e.g. it lets you quite easily create discussion forums around ontological modeling decisions). It's hosted in the Amazon EC2 cloud and can be used for free.
  • RDF Google Chrome extensions. A list of extensions for Google Chrome that make working with RDF much simpler, for example by detecting RDF annotations embedded in HTML.
  • Get the data. Ask and answer questions about getting, using and sharing data! A StackOverflow clone that crowd-sources the task of finding out whether the data you need are available, and where.

    Articles / Tutorials

  • Linked Data Guide for Newbies. It’s primarily aimed at “people who’re tasked with creating RDF and don’t have time to faff around.” It’s a brief and practical introduction to some of the concepts and technical issues behind Linked Data – simple and effective, although it obviously hides all the most difficult aspects.
  • What you need to know about RDF+XML. Again, another gentle and practical intro.
  • Linked Data: design issues. One of the original articles by Berners-Lee. It goes a little deeper into the theoretical issues involved with the Linked Data approach.
  • Linked Data: Evolving the Web into a Global Data Space. Large and thorough resource: this book is freely available online and contains all that you need to become a Linked Data expert – whatever that means!
  • Linked Data/RDF/SPARQL Documentation Challenge. A recent initiative aimed at pushing people to document the 'path to RDF' with as many languages and environments as possible. The idea is to move away from a kind of academic-circles-only culture and create something "closer to the Django introduction tutorial or the MongoDB quick start guide than an academic white paper". This blog post is definitely worth checking out imho, especially because of the wealth of responses it has elicited!
  • Introducing SPARQL: Querying the Semantic Web. An in-depth article at XML.com that introduces SPARQL – the query language and data access protocol for the Semantic Web.
  • A beginner’s guide to SPARQLing linked data. A more hands-on description of what SPARQL can do for you.
  • Linked Data: how to get your dataset in the diagram. So you’ve noticed the Linked Data bubbles growing bigger and bigger. Next step is – how to contribute and get in there? This article gives you all the info you need to know.
  • Semantic Overflow (answers.semanticweb.com). If you run out of ideas, this is the place to ask for help!
