semanticweb – Parerga und Paralipomena
http://www.michelepasin.org/blog
At the core of all well-founded belief lies belief that is unfounded - Wittgenstein

An interactive Turtle shell
http://www.michelepasin.org/blog/2014/10/12/interactive-turtle-shell/
Sun, 12 Oct 2014 09:21:41 +0000

Wouldn’t it be nice to have an interactive environment where you can quickly hack together an RDF model and then show it to your clients or colleagues in a more accessible format – i.e. a diagram?

I don’t know if there’s anything like that already, but the other day, while polishing up the OntosPy library, I took a couple of hours of fun coding and put together a module that lets you do just that.

The idea is simple: load an interactive environment where you can quickly sketch out a few ideas using the (very readable) Turtle RDF format.

Then export it to a different representation, e.g. a graphical one, so that it can be shown to people – or just keep working on it via a medium that offers different affordances.

So here it is:

[michele.pasin]@here:~/code/python>sketch.py 
Good morning. Ready to Turtle away. Type docs() for help.
In [1]: docs()

====Sketch v 0.2====

add()  ==> add statements to the graph
...........SHORTCUTS:
...........'class' = owl:Class
...........'sub' = rdfs:subClassOf
...........TURTLE SYNTAX:  http://www.w3.org/TR/turtle/

show() ==> shows the graph. Can take an OPTIONAL argument for the format.
...........e.g. one of ['xml', 'n3', 'turtle', 'nt', 'pretty-xml', 'dot']

clear()	 ==> clears the graph
...........all triples are removed

omnigraffle() ==> creates a dot file and opens it with omnigraffle
...........First you must set Omnigraffle as your system default app for dot files!

quit() ==> exit

====Have fun!====


In [2]: add()
Multi-line input. Enter ### when finished.
:person a class
:mike a :person
:person sub :agent
:organization sub :agent
:worksIn rdfs:domain :person
:worksIn rdfs:range :organization
:mike :worksIn :DamageInc
:DamageInc a :organization

In [3]: show()
@prefix : <http://this.sketch#> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix npg: <http://ns.nature.com/terms/> .
@prefix npgg: <http://ns.nature.com/graphs/> .
@prefix npgx: <http://ns.nature.com/extensions/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:mike a :person ;
    :worksIn :DamageInc .

:worksIn rdfs:domain :person ;
    rdfs:range :organization .

:DamageInc a :organization .

:organization rdfs:subClassOf :agent .

:person a owl:Class ;
    rdfs:subClassOf :agent .



In [4]: show("xml")
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
   xmlns="http://this.sketch#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>
  <rdf:Description rdf:about="http://this.sketch#mike">
    <rdf:type rdf:resource="http://this.sketch#person"/>
    <worksIn rdf:resource="http://this.sketch#DamageInc"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://this.sketch#organization">
    <rdfs:subClassOf rdf:resource="http://this.sketch#agent"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://this.sketch#DamageInc">
    <rdf:type rdf:resource="http://this.sketch#organization"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://this.sketch#person">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
    <rdfs:subClassOf rdf:resource="http://this.sketch#agent"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://this.sketch#worksIn">
    <rdfs:domain rdf:resource="http://this.sketch#person"/>
    <rdfs:range rdf:resource="http://this.sketch#organization"/>
  </rdf:Description>
</rdf:RDF>

In [5]: omnigraffle()
### saves a dot file and tries to open it with your default editor
### if you're on a mac and have omnigraffle - that could be the one!

In [6]: quit()

If you are on a Mac and have associated .dot files with the excellent OmniGraffle app, you’d see something like this:

[Screenshots: the sketched RDF graph rendered as a diagram in OmniGraffle]

That sped up my work quite a bit – especially in situations where you don’t care too much about precision but are more interested in quickly showing the merits of a modelling approach.
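In case you want to reproduce the dot-file export outside of this module, here is a rough sketch of how it could be done with rdflib. This is illustrative code only – not the actual sketch.py source – and it assumes a recent rdflib release that ships the rdflib.tools.rdf2dot module:

# Illustrative sketch: parse a Turtle snippet, write it out as Graphviz dot,
# then hand the file to the system default .dot application (OS X 'open').
import subprocess
import rdflib
from rdflib.tools import rdf2dot

TURTLE = """
@prefix : <http://this.sketch#> .
:mike a :person ;
    :worksIn :DamageInc .
"""

g = rdflib.Graph()
g.parse(data=TURTLE, format="turtle")

with open("sketch.dot", "w") as f:
    rdf2dot.rdf2dot(g, f)                 # serialize the graph in dot format

subprocess.call(["open", "sketch.dot"])   # delegate to the default .dot app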

Any comments or ideas on how to develop this further?

 

Textmate bundle for Turtle and Sparql
http://www.michelepasin.org/blog/2013/08/13/textmate-bundle-for-turtle-and-sparql/
Tue, 13 Aug 2013 17:07:34 +0000

I recently ran into the Textmate bundle for Turtle, an extension for the TextMate OS X editor aimed at making it easier to work with RDF and SPARQL. If you happen to be using these technologies, I’d suggest you take a look at what follows.

The Resource Description Framework (RDF) is a general-purpose language for representing information which is widely used on the web in order to encode metadata in a machine-interoperable format.

Turtle, the Terse RDF Triple Language, is a textual syntax for RDF that aims at human readability and compactness (among other things).
This is what it looks like:


@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# base/default namespace reconstructed – the original URIs were lost in conversion
@base <http://example.org/schemas/vehicles> .
@prefix : <http://example.org/schemas/vehicles#> .

:MotorVehicle a rdfs:Class.

:PassengerVehicle a rdfs:Class;
   rdfs:subClassOf :MotorVehicle.

:Person a rdfs:Class.

xsd:integer a rdfs:Datatype.

:registeredTo a rdf:Property;
   rdfs:domain :MotorVehicle;
   rdfs:range  :Person.

:myLittleCar a :PassengerVehicle .

The Textmate bundle in question, in a nutshell, provides a bunch of snippets and query mechanisms that make it easier to work with Turtle RDF and related technologies.
More precisely, here’s the official features breakdown:

  • Language grammar for Turtle and SPARQL 1.1
  • Powerful (!) auto-completion (live-aggregated)
  • Documentation for classes and roles/properties at your fingertips (live-aggregated)
  • Interactive SPARQL query scratchpad
  • Some snippets (prefixes and document skeleton)
  • Solid syntax validation
  • Commands for instant graph visualization of a knowledge base (requires Graphviz and Raptor)
  • Conversion between all common RDF formats
  • Example

    In order to query a SPARQL endpoint (e.g. DBpedia) just type this in and run it (apple-R):

    
    #QUERY <http://dbpedia.org/sparql>                    
    SELECT DISTINCT ?s ?label                             
    WHERE {                                               
        ?s <http://dbpedia.org/property/season> ?o .
        ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
    }  
    

    Obviously you can query any endpoint, e.g. data.nature.com:

    
    
    #QUERY <http://data.nature.com/sparql>
    
    PREFIX bibo:<http://purl.org/ontology/bibo/>
    PREFIX dc:<http://purl.org/dc/elements/1.1/>
    PREFIX dcterms:<http://purl.org/dc/terms/>
    PREFIX foaf:<http://xmlns.com/foaf/0.1/>
    PREFIX npg:<http://ns.nature.com/terms/>
    PREFIX npgg:<http://ns.nature.com/graphs/>
    PREFIX npgx:<http://ns.nature.com/extensions/>
    PREFIX owl:<http://www.w3.org/2002/07/owl#>
    PREFIX prism:<http://prismstandard.org/namespaces/basic/2.1/>
    PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
    PREFIX sc:<http://purl.org/science/owl/sciencecommons/>
    PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
    PREFIX void:<http://rdfs.org/ns/void#>
    PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
    
    
    SELECT *                            
    WHERE {                                                
        ?doi a npg:Article . 
        ?doi dc:title ?title .
        ?doi prism:publicationDate ?date
    } 
    limit 100 
    
    

    And this is just the tip of the iceberg: autocompletion, visualisations etc… it may be the Textmate–Semantic Web Swiss army knife you’ve been looking for.

     

     

     

     

    ESWC 2013 – report from the conference
    http://www.michelepasin.org/blog/2013/06/05/eswc-2013-report-from-the-conference/
    Wed, 05 Jun 2013 17:06:43 +0000

    Last week I attended the European Semantic Web Conference (ESWC’13) in Montpellier and had a really good time meeting old friends and catching up with the latest research in this area. In this post I’ll collect a few pointers to papers and ideas that caught my attention.

    For a high level summary of the talks, you can check out the pdf program, the workshops page or the tutorials page.

    In particular the semantic publishing workshop SEPublica13 was very relevant for my current work, as its stated purpose is to discuss and review “accessing and reusing the computable data that the literature represents and describes” – something that all digital publishers are thinking about these days.

    As for the rest of the conference, here’s a more lengthy summary of (some of) the presentations I managed to attend, organised by topic.

    Keynote: less semantics and more web

    The keynote from MIT’s David Karger was quite remarkable. In a talk titled “The Semantic Web for End Users” he challenged several widespread assumptions about the SW (maybe most intriguingly the ‘if it’s using RDF/OWL then it’s SW‘ principle). Karger argued for a less AI-oriented, more user-centric and web-centric view of semantic web research, according to which one of the key opportunities for SW practitioners is to “make it easier for end users to produce, share, and consume structured data“, irrespective of whether these are encoded in any of the RDF family of languages. Rather, SW tools should be measured in terms of how much they allow people to deal effectively with ‘applications whose schema is expected to change‘.
    In general, the semantic web (like the web) should not be making ‘new things possible’ but rather ‘old things simpler’.

    Semantic Science

    Gon, B., Porto, F., & Moura, A. M. C. On the semantic engineering of scientific hypotheses as linked data.

    The paper addresses the engineering of hypotheses as linked data, and builds upon the Linked Science Core vocabulary by extending it in order to allow the definition of scientific hypotheses as assumptions that constrain the interpretation of observed phenomena for computer simulation. A prototype application, built by eliciting and linking hypotheses in published research in Computational Hemodynamics (the study of the human cardiovascular system), is discussed to illustrate the notion of ‘conceptual traceability’ of research statements.

    Gil, Y., Ratnakar, V., & Hanson, P. C. Organic Data Publishing: A Novel Approach to Scientific Data Sharing.

    The paper introduces an approach called ‘organic data sharing‘ that 1) links dataset contributions directly to science questions, 2) reduces the burden of data sharing by enabling any scientist to contribute metadata, and 3) tracks and exposes credit for all contributors. An initial prototype is presented that is built as an extension of a semantic wiki, can import Linked Data, and can publish as Linked Data any new content created by users.

    Zhao, J., & Klyne, G. (2013). How Reliable is Your Workflow: Monitoring Decay in Scholarly Publications.

    The paper addresses the notion of workflow ‘decay’. Increasingly, scientific workflows are being treated as first-class artifacts for exchanging and transferring actual scholarly findings, either as part of scholarly articles or as stand-alone objects. However, scientific workflows are commonly subject to a decayed or reduced ability to be executed or repeated, largely due to the volatility of the external resources that are required for their execution. Based on this hypothesis, the authors present a minimal set of information to be associated with a workflow in order to reduce its decay and let it be effectively exchanged as a reproducible research object.

    Callahan, A., Cruz-toledo, J., Ansell, P., & Dumontier, M. (2013). Bio2RDF Release 2: Improved coverage, interoperability and provenance of Life Science Linked Data.

    Bio2RDF is an open-source project that provides linked data for the life sciences using Semantic Web technologies. Bio2RDF scripts (available on github) convert heterogeneously formatted data (e.g. flat-files, tab-delimited files, dataset specific formats, SQL, XML etc.) into a common format, RDF. The paper describes the new features of the latest Bio2RDF release, which provides a federated network of SPARQL endpoints over 19 datasets. Other new features include provenance information via PROV, mapping of dataset-specific vocabulary to the Semanticscience Integrated Ontology (SIO), context-sensitive SPARQL query formulation using SparQLed and a central registry of datasets in order to normalize generated IRIs.

    Semantic Publishing

    T. Kuhn, P. E. Barbano, M. L. Nagy, and M. Krauthammer, Broadening the Scope of Nanopublications.

    Traditionally, nanopublications are described as an approach to (1) subdivide scientific results into minimal pieces, (2) to represent these results — called assertions — in an RDF-based formal notation, (3) to attach RDF-based provenance information on this “atomic” level, and (4) to treat each of these tiny entities as a separate publication. The authors of this paper challenge assumption (2) as unrealistic, essentially due to the proven difficulties in acquiring structured, logic-based assertions from people, and propose a new system (nanobrowser) that allows authors and curators to attach English sentences to nanopublications, thus allowing for informal representations of scientific claims.

    Lord, P., & Marshall, L. (2013). Twenty-Five Shades of Greycite: Semantics for referencing and preservation.

    The paper describes two new systems: greycite and kblog-metadata. The former addresses the problem of bibliographic metadata without resorting to a single central authority, extracting this metadata directly from URI end-points. The latter provides more specialised support for generating appropriate metadata within the popular WordPress blogging platform. The underlying rationale for both systems, claim the authors, is that semantic metadata must be of value to all participants in the publishing process, most importantly the authors.

    Mavergames, C., Oliver, S., & Becker, L. (2013). Systematic Reviews as an interface to the web of (trial) data: Using PICO as an ontology for knowledge synthesis in evidence-based healthcare research. The Cochrane Collaboration.

    The paper describes a prototype application that makes use of linked data technologies to improve discovery of information stored in the Cochrane Database of Systematic Reviews, a resource in the domain of healthcare research (in particular the area of evidence-based medicine). The approach described relies on the PICO framework (Population, Intervention, Comparison, Outcome) as an ontology to aid in better discoverability, presentation, and synthesis of the knowledge available in the documents offered by the database. A prototype web application based on Drupal’s SW module is presented.

    Wiljes, C., Jahn, N., Lier, F., Paul-stueve, T., Vompras, J., Pietsch, C., & Cimiano, P. (2013). Towards Linked Research Data : An Institutional Approach.

    The paper describes an infrastructure that enables researchers to manage their publications and the underlying research data in an easy and efficient way within an academic institution, Bielefeld University and the associated Center of Excellence Cognitive Interaction Technology. The platform follows a Linked Data approach and uses Virtuoso to store data sources from inside the university alongside outside sources like DBpedia.

    NLP, knowledge extraction

    Iorio, A. Di, Nuzzolese, A. G., & Peroni, S. (2013). Towards the automatic identification of the nature of citations.

    The paper presents an algorithm, called CiTalO, to automatically infer the function of citations by means of Semantic Web technologies and NLP techniques. CiTalO combines techniques of ontology learning from natural language, sentiment analysis, word-sense disambiguation, and ontology mapping. These techniques are applied in a pipeline whose input is the textual context containing the citation and whose output is one or more properties of the CiTO ontology.

    Jael, L., Castro, G., Berlanga, R., Rebholz-schuhmann, D., & Garcia, A. (2013). Connections across scientific publications based on semantic annotations.

    The paper presents an experiment aimed at evaluating different concept-annotation solutions on full-text documents, to determine to what extent relatedness can be inferred from such annotations. Eleven full-text articles from the open-access subset of PubMed Central have been extracted and annotated semantically using MeSH, UMLS, and other ontologies. The authors show that the connections across articles derived from annotations automatically identified with entity-recognition tools, e.g. Whatizit, NCBO Annotator, and CMA, are similar to the connections exhibited by the PubMed MeSH terms, thus validating their approach.

    A. Gangemi, A Comparison of Knowledge Extraction Tools for the Semantic Web.

    This article reviews a number of Natural Language Processing tools (for various purposes, such as named-entity recognition or word-sense disambiguation) that have been configured for Semantic Web tasks including ontology learning, linked data population, entity resolution, NL querying of linked data and others. The tools have been compared using a sample taken from an online article of The New York Times and the results are available online. The tools reviewed are: AIDA, AlchemyAPI, Apache Stanbol, DBpedia Spotlight, CiceroLite, FOX, FRED, NERD, Open Calais, PoolParty Knowledge Discoverer, ReVerb, Semiosearch Wikifier, Wikimeta, Zemanta.

    E. Cabrio, S. Villata, F. Gandon, and I. S. Antipolis, A Support Framework for Argumentative Discussions Management in the Web.

    The paper presents an approach based on NLP for automatically extracting argumentative relationships from highly active wiki pages. The overall purpose is to support community managers in managing the discussions and in getting an overall view of the ongoing debates, so as to detect the winning arguments. Argumentative discussions are formalized using an extension of the SIOC Argumentation vocabulary.

    O. Medelyan, S. Manion, J. Broekstra, A. Divoli, A. Huang, and I. H. Witten, Constructing a Focused Taxonomy from a Document Collection

    The paper describes a new method for constructing custom taxonomies from document collections, called F-STEP. It involves identifying relevant concepts and entities in text; linking them to knowledge sources like Wikipedia, DBpedia, Freebase, and any supplied taxonomies from related domains; disambiguating conflicting concept mappings; and selecting semantic relations that best group them hierarchically. Using this approach the authors constructed a custom taxonomy with 10,000 concepts and 12,700 relations from 2,000 news articles. An evaluation with human judges showed high rates of precision (90%) and recall (75%).

    SW tech in real world systems

    P. Szekely, C. A. Knoblock, F. Yang, X. Zhu, E. E. Fink, R. Allen, and G. Goodlander, Connecting the Smithsonian American Art Museum to the Linked Data Cloud.

    This paper describes the process and lessons learned in publishing the data from the Smithsonian American Art Museum. The paper contains detailed descriptions of a) how relational data have been mapped to RDF (a system called Karma was used), b) how links to other linked data URIs have been created, and c) the process of curation to ensure that both the published information and its links to other sources within the LOD are accurate. The dataset uses an extended version of the Europeana Data Model, which is the metamodel used in the Europeana project to represent data from Europe’s cultural heritage institutions, plus other standards like PROV and Schema.org.

    L. M. Garshol and A. Borge, Hafslund Sesam – an archive on semantics.

    The paper describes an architecture based on RDF and Virtuoso, constructed to facilitate data integration and reuse within Hafslund, a Norwegian energy company. Documents are tagged with URIs from the triple store, and these URIs connect the document metadata with enterprise data extracted from backend systems. All source systems are integrated using a custom-built client-server solution based on SDShare – a specification for synchronizing RDF data using Atom feeds.

    Random notes

  • SparQLed is an open source app that gives you an interactive SPARQL editor with context-aware recommendations (via autocompletion and other tricks). Definitely worth taking a look at.
  • I missed the excellent Semantic Data Management Techniques in Graph Databases tutorial, but luckily the slides are available online. If you’re interested in graph databases, check them out: they include a detailed analysis and comparison of various graph databases including Neo4j, Hypergraph and many others.
  • David Karger pointed out a web app called If This Then That: rule-based reasoning on the web, without any fancy AI. Pretty cool!
  • identifiers.org is yet another service that aims at providing resolvable, persistent URIs used to identify data for the scientific community.

    Semantic Web Cheat Sheets
    http://www.michelepasin.org/blog/2012/11/27/semantic-web-cheat-sheets/
    Tue, 27 Nov 2012 11:45:38 +0000

    Here are a couple of reference sheets that can come in handy if you’re doing any semantic web related work. I found both of them online, and will add more to this list as they come along.

    N3 Language Cheatsheet

    I found this table in this blog post: http://aabs.wordpress.com/semantic-web/the-n3-cheat-sheet/. Some of the formatting was messed up in the original post, so I cleaned it up a little.

    Semantic Web Cheatsheet

    Another useful cheat sheet can be found here: http://ebiquity.umbc.edu/resource/html/id/94/. The card includes RDF/RDFS/OWL vocabulary, RDF/XML reserved terms, examples and SPARQL semantic web query language reference.

     

    More? Please let us know!

     

    Hack4Europe! – Europeana hackathon roadshow, June 2011
    http://www.michelepasin.org/blog/2011/05/23/hack4europe-europeana-hackathon-roadshow-june-2011/
    Mon, 23 May 2011 21:36:14 +0000

    Europeana is a multilingual digital collection containing more than 15 million resources that lets you explore Europe’s history from ancient times to the modern day. The Europeana API services are web services allowing search and display of Europeana collections in your website and applications. The folks at Europeana have been actively promoting experimentation with their APIs by organizing ‘hackathons’ – workshops for cultural informatics hackers where new ideas are discussed and implemented.

    Some examples of the outputs of the previous hackathon can be found here. Hack4Europe is the most recent of these dissemination activities:

    Hack4Europe! is a series of hack days organised by the Europeana Foundation and its partners Collections Trust, Museu Picasso, Poznan Supercomputing and Networking Center and Swedish National Heritage Board. The hackathon roadshow will be held simultaneously in 4 locations (London, Barcelona, Poznan and Stockholm) in the week 6 – 12 June and will provide an exciting environment to explore the potential of open cultural data for social and economic growth in Europe.

    Each hackathon will bring together up to 30 developers from the hosting country and the surrounding area. They will have access to the diverse and rich Europeana collections containing over 18 million records, Europeana Search API (incl. a test key and technical documentation) and Europeana Linked Open Data Pilot datasets which currently comprise about 3 million Europeana records available under a CC0 license.

    There are four hackathons coming up, so if you’re interested make sure you sign up quickly:

  • Hack4Europe! UK
    9 June 2011, London, hosted by Collections Trust
  • Hack4Europe! Spain
    8 – 9 June 2011, Barcelona, hosted by Museu Picasso
  • Hack4Europe! Poland
    7 – 8 June 2011, Poznan, hosted by Poznan Supercomputing and Networking Center and Kórnik Library of the Polish Academy of Sciences
  • Hack4Europe! Sweden
    10 – 11 June 2011, Stockholm, hosted by Swedish National Heritage Board

    A few useful Linked Data resources
    http://www.michelepasin.org/blog/2011/03/17/a-few-useful-linked-data-resources/
    Thu, 17 Mar 2011 11:32:00 +0000

    I’ve done a bit of semantic web work in the last couple of weeks, which gave me a chance to better explore the current web scenario around this topic. I’m working on some example applications myself, but in the meanwhile I thought I’d share here a couple of quite useful links I ran into.

    Development Tools:

  • Quick and Dirty RDF browser. It does just what it says: you pass it an RDF file and it helps you make sense of it. For example, check out the RDF graph describing the city of Southampton on DBpedia: http://dbpedia.org/resource/Southampton. Minimal, fast and useful!
  • Namespace lookup service for RDF developers. The intention of this service is to simplify a common task in the work of RDF developers: remembering and looking up URI prefixes. You can look up prefixes from the search box on the homepage, or directly by typing URLs into your browser bar, such as http://prefix.cc/foaf or http://prefix.cc/foaf,dc,owl.ttl.
  • Knoodl. Knoodl is an online tool for creating, managing, and analyzing RDF/OWL descriptions. It has several features that support collaboration in all stages of these activities (e.g. it lets you quite easily create discussion forums around ontological modeling decisions). It’s hosted in the Amazon EC2 cloud and can be used for free.
  • RDF Google Chrome extensions. Just a list of extensions for Google Chrome that make working with RDF much simpler, for example by detecting RDF annotations embedded in HTML.
  • Get the data. Ask and answer questions about getting, using and sharing data! A StackOverflow clone that crowd-sources the task of finding out whether the data you need are available, and where.

    Articles / Tutorials

  • Linked Data Guide for Newbies. It’s primarily aimed at “people who’re tasked with creating RDF and don’t have time to faff around.” It’s a brief and practical introduction to some of the concepts and technical issues behind Linked Data – simple and effective, although it obviously hides all the most difficult aspects.
  • What you need to know about RDF+XML. Again, another gentle and practical intro.
  • Linked Data: design issues. One of the original articles by Berners-Lee. It goes a little deeper into the theoretical issues involved with the Linked Data approach.
  • Linked Data: Evolving the Web into a Global Data Space. Large and thorough resource: this book is freely available online and contains all that you need to become a Linked Data expert – whatever that means!
  • Linked Data/RDF/SPARQL Documentation Challenge. A recent initiative aimed at pushing people to document the ‘path to rdf’ with as many languages and environments as possible. The idea is to move away from some kind of academic-circles-only culture and create something “closer to the Django introduction tutorial or the MongoDB quick start guide than an academic white paper“. This blog post is definitely worth checking out imho, especially because of the wealth of responses it has elicited!
  • Introducing SPARQL: Querying the Semantic Web. An in-depth article at XML.com that introduces SPARQL – the query language and data access protocol for the Semantic Web.
  • A beginner’s guide to SPARQLing linked data. A more hands-on description of what SPARQL can do for you.
  • Linked Data: how to get your dataset in the diagram. So you’ve noticed the Linked Data bubbles growing bigger and bigger. Next step is – how to contribute and get in there? This article gives you all the info you need to know.
  • Semantic Overflow Answers.semanticweb.com. If you run out of ideas, this is the place where to ask for help!

    Survey of Pythonic tools for RDF and Linked Data programming
    http://www.michelepasin.org/blog/2011/02/24/survey-of-pythonic-tools-for-rdf-and-linked-data-programming/
    Thu, 24 Feb 2011 15:21:27 +0000

    In this post I’m reporting on a recent survey I made in the context of a Linked Data project I’m working on, SAILS. The Resource Description Framework (RDF) is a data model and language which is quickly gaining momentum in the open-data and data-integration worlds. In SAILS we’re developing a prototype for rdf-data manipulation and querying, but since the final application (of which the rdf-component is part) will be written in Python and Django, in what follows I have tried to gather information about all the existing libraries and frameworks for doing rdf-programming using python.

    1. Python libraries for working with Rdf

    RdfLib http://www.rdflib.net/

    RdfLib (download) is a pretty solid and extensive rdf-programming kit for python. It contains parsers and serializers for RDF/XML, N3, NTriples, Turtle, TriX and RDFa. The library presents a Graph interface which can be backed by any one of a number of store implementations, including, memory, MySQL, Redland, SQLite, Sleepycat, ZODB and SQLObject.

    The latest release is RdfLib 3.0, although I have the feeling that many are still using the previous release, 2.4. One big difference between the two is that in 3.0 some libraries have been separated into another package (called rdfextras); among these libraries there’s also the one you need for processing sparql queries (the rdf query language), so it’s likely that you want to install that too.
    A short overview of the difference between these two recent releases of RdfLib can be found here. The APIs documentation for RdfLib 2.4 is available here, while the one for RdfLib 3.0 can be found here. Finally, there are also some other (a bit older, but possibly useful) docs on the wiki.
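    To give a rough feel for the API, here is a minimal usage sketch (the resource names and namespace below are purely illustrative, not taken from any of the projects discussed here):

    from rdflib import Graph, Namespace, Literal, RDF, RDFS

    EX = Namespace("http://example.org/")

    g = Graph()
    g.add((EX.mike, RDF.type, EX.Person))            # ex:mike a ex:Person
    g.add((EX.Person, RDFS.subClassOf, EX.Agent))    # ex:Person rdfs:subClassOf ex:Agent
    g.add((EX.mike, RDFS.label, Literal("Mike")))

    for s, p, o in g:                                # iterate over all triples
        print("%s %s %s" % (s, p, o))

    print(g.serialize(format="turtle"))              # dump the whole graph as Turtle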

    Next thing, you might want to check out these tutorials:

  • Getting data from the Semantic Web: a nice example of how to use RdfLib and python in order to get data from DBPedia, the Semantic Web version of Wikipedia.
  • How can I use the Ordnance Survey Linked Data: shows how to install RdfLib and query the linked data offered by Ordnance Survey.
  • A quick and dirty guide to YOUR first time with RDF: another example of querying Uk government data found on data.gov.uk using RdfLib and Berkely/Sleepycat DB.
    RdfAlchemy http://www.openvest.com/trac/wiki/RDFAlchemy

    The goal of RDFAlchemy (install | apidocs | usergroup) is to allow anyone who uses Python to have object-style API access to an RDF triplestore. In a nutshell, in the same way that SQLAlchemy is an ORM (Object Relational Mapper) for relational database users, RDFAlchemy is an ORM (Object RDF Mapper) for semantic web users.

    RdfAlchemy can also work in conjunction with other datastores, including rdflib, Sesame, and Jena. Support for SPARQL is present, although it seems less stable than the rest of the library.

    Fuxi http://code.google.com/p/fuxi/

    FuXi is a Python-based, bi-directional logical reasoning system for the semantic web. It requires rdflib 2.4.1 or 2.4.2 and it is not compatible with rdflib 3. FuXi aims to be the ‘engine for contemporary expert systems based on the Semantic Web technologies’. The documentation can be found here; it might be useful also to look at the user-manual and the discussion group.

    In general, it looks as if Fuxi can offer a complete solution for knowledge representation and reasoning over the semantic web; it is quite sophisticated and well documented (partly via several academic articles). The downside is that, for the purpose of hacking together a linked data application, Fuxi is probably just too complex and difficult to learn.

  • About Inferencing: a very short introduction to what Fuxi’s inferencing capabilities can do in the context of an rdf application.
    ORDF ordf.org

    ORDF (download | docs) is the Open Knowledge Foundation‘s library of support infrastructure for RDF. It is based on RDFLib and contains an object-description mapper, support for multiple back-end indices, message passing, revision history and provenance, a namespace library and a variety of helper functions and modules to ease integration with the Pylons framework.

    The current version of this library is 0.35. You can have a peek at some of its key functionalities by checking out the ‘Object Description Mapper‘ – an equivalent to what an Object-Relational Mapper would give you in the context of a relational database. The library seems to be pretty solid; for an example of a system built on top of ORDF you can see Bibliographica, an online open catalogue of cultural works.

  • Why using RDF? The Design Considerations section in the ORDF documentation discusses the reasons that led to the development of this library in a clear and practical fashion.
    Django-rdf http://code.google.com/p/django-rdf/

    Django-RDF (download | faq | discussiongroup) is an RDF engine implemented in a generic, reusable Django app, providing complete RDF support to Django projects without requiring any modifications to existing framework or app source code. The philosophy is simple: do your web development using Django just like you’re used to, then turn the knob and – with no additional effort – expose your project on the semantic web.

    Django-RDF can expose models from any other app as RDF data. This makes it easy to write new views that return RDF/XML data, and/or query existing models in terms of RDFS or OWL classes and properties using (a variant of) the SPARQL query language. SPARQL in, RDF/XML out – two basic semantic web necessities. Django-RDF also implements an RDF store using its internal models such as Concept, Predicate, Resource, Statement, Literal, Ontology, Namespace, etc. The SPARQL query engine returns query sets that can freely mix data in the RDF store with data from existing Django models.

    The major downside of this library is that it doesn’t seem to be maintained anymore; the last release is from 2008, and there seem to be various conflicts with recent versions of Django. A real shame!

    Djubby http://code.google.com/p/djubby/

    Djubby (download | docs) is a Linked Data frontend for SPARQL endpoints for the Django Web framework, adding a Linked Data interface to any existing SPARQL-capable triple stores.

    Djubby is quite inspired by Richard Cyganiak’s Pubby (written in Java): it provides a Linked Data interface to local or remote SPARQL protocol servers, it provides dereferenceable URIs by rewriting URIs found in the SPARQL-exposed dataset into the djubby server’s namespace, and it provides a simple HTML interface showing the data available about each resource, taking care of handling 303 redirects and content negotiation.

    Redland http://librdf.org/

    Redland (download | docs | discussiongroup) is an RDF library written in C, including several high-level language APIs providing RDF manipulation and storage. Redland also makes available a Python interface (intro | apidocs) that can be used to manipulate RDF triples.

    This library seems to be quite complete and is actively maintained; the only potential downside is the installation process. In order to use the Python bindings you need to install the C library too (which in turn depends on other C libraries), so (depending on your programming experience and the operating system used) just getting up and running might become a challenge.

    SuRF http://packages.python.org/SuRF/

    SuRF (install | docs) is an Object–RDF Mapper based on the RDFLIB python library. It exposes the RDF triple sets as sets of resources and seamlessly integrates them into the Object Oriented paradigm of python, in a similar manner to what ActiveRDF does for Ruby.

    Other smaller (but possibly useful) python libraries for rdf:

  • Sparql Interface to python: a minimalistic solution for querying sparql endpoints using python (download | apidocs). UPDATE: Ivan Herman pointed out that this library has been discontinued and merged with the ‘SPARQL Endpoint interface to Python’ below.
  • SPARQL Endpoint interface to Python: another little utility for talking to a SPARQL endpoint, including having SELECT results mapped to rdflib terms or returned in JSON format (download); a minimal usage sketch follows this list.
  • PySparql: again, a minimal library that does SELECT and ASK queries on an endpoint which implements the HTTP (GET or POST) bindings of the SPARQL Protocol (code page)
  • Sparta: Sparta is a simple, resource-centric API for RDF graphs, built on top of RDFLIB.
  • Oort: another Python toolkit for accessing RDF graphs as plain objects, based on RDFLIB. The project homepage hasn’t been updated for a while, although there is trace of recent activity on its google project page.
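    As promised above, here is a small sketch of what querying a SPARQL endpoint with the SPARQL Endpoint interface to Python (the SPARQLWrapper package) typically looks like – the query itself is just an example:

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT DISTINCT ?s ?label
        WHERE { ?s rdfs:label ?label } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)              # get the SELECT results back as JSON

    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print("%s -> %s" % (row["s"]["value"], row["label"]["value"]))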

    2. RDF Triplestores that are python-friendly

    An important component of a linked-data application is the triplestore (that is, an RDF database): many commercial and non-commercial triplestores are available, but only a few offer out-of-the-box python interfaces. Here’s a list of them:

    Allegro Graph http://www.franz.com/agraph/allegrograph/

    AllegroGraph RDFStore is a high-performance, persistent RDF graph database. AllegroGraph uses disk-based storage, enabling it to scale to billions of triples while maintaining superior performance. Unfortunately, the official version of AllegroGraph is not free, but it is possible to get a free version of it (it limits the DB to 50 million triples, so although useful for testing or development it doesn’t seem a good solution for a production environment).

    The Allegro Graph Python API (download | docs | reference) offers convenient and efficient access to an AllegroGraph server from a Python-based application. This API provides methods for creating, querying and maintaining RDF data, and for managing the stored triples.

  • A hands-on overview of what it’s like to work with AllegroGraph and Python can be found here: Getting started with AllegroGraph.
    Open Link Virtuoso http://virtuoso.openlinksw.com/

    Virtuoso Universal Server is a middleware and database engine hybrid that combines the functionality of a traditional RDBMS, ORDBMS, virtual database, RDF, XML, free-text, web application server and file server functionality in a single system. Rather than have dedicated servers for each of the aforementioned functionality realms, Virtuoso is a “universal server”; it enables a single multithreaded server process that implements multiple protocols. The open source edition of Virtuoso Universal Server is also known as OpenLink Virtuoso.

    Virtuoso from Python is intended to be a collection of modules for interacting with OpenLink Virtuoso from python. The goal is to provide drivers for `SQLAlchemy` and `RDFLib`. The package is installable from the Python Package Index and source code for development is available in a mercurial repository on BitBucket.

  • A possibly useful example of using Virtuoso from python: SPARQL Guide for Python Developer.
    Sesame http://www.openrdf.org/

    Sesame is an open-source framework for querying and analyzing RDF data (download | documentation). Sesame supports two query languages: SeRQL and Sparql. Sesame’s API differs from comparable solutions in that it offers a (stackable) interface through which functionality can be added, and the storage engine is abstracted from the query interface (many other Triplestores can in fact be used through the Sesame API).

    It looks as if the best way to interact with Sesame is by using Java; however there is also a pythonic API called pySesame. This is essentially a python wrapper for Sesame’s REST HTTP API, so the range of operations supported (log in, log out, request a list of available repositories, evaluate a SeRQL-select, RQL or RDQL query, extract/upload/remove RDF from a repository) is somewhat limited (for example, there does not seem to be any native SPARQL support).

  • A nice introduction to using Sesame with Python (without pySesame though) can be found in this article: Getting Started with RDF and SPARQL Using Sesame and Python.
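    For illustration, here is a rough sketch of talking to a Sesame repository directly over its REST HTTP API, without pySesame; the server URL and repository name below are made up, and the snippet assumes the requests library is installed:

    import requests

    # Hypothetical local Sesame server and repository name
    ENDPOINT = "http://localhost:8080/openrdf-sesame/repositories/myrepo"
    QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

    resp = requests.get(
        ENDPOINT,
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
    )
    for row in resp.json()["results"]["bindings"]:
        print("%s %s %s" % (row["s"]["value"], row["p"]["value"], row["o"]["value"]))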
    Talis platform http://www.talis.com/platform/

    The Talis Platform (faq | docs) is an environment for building next generation applications and services based on Semantic Web technologies. It is a hosted system which provides an efficient, robust storage infrastructure. Both arbitrary documents and RDF-based semantic content are supported, with sophisticated query, indexing and search features. Data uploaded on the Talis platform are organized into stores: a store is a grouping of related data and metadata. For convenience each store is assigned one or more owners, who are the people who have rights to configure the access controls over that data and metadata. Each store provides a uniform REST interface to the data and metadata it manages.

    Stores don’t come free of charge, but through the Talis Connected Commons scheme it is possible to have quite large amounts of store space for free. The scheme is intended to support a wide range of different forms of data publishing: for example, scientific researchers seeking to share their research data; dissemination of public domain data from a variety of different charitable, public sector or volunteer organizations; open data enthusiasts compiling data sets to be shared with the web community.

    Good news for pythonistas too: pynappl is a simple client library for the Talis Platform. It relies on rdflib 3.0 and draws inspiration from other similar client libraries. Currently it is focussed mainly on managing data loading and manipulation of Talis Platform stores (this blog post says more about it).

  • Before trying out the Talis platform you might find useful this blog post: Publishing Linked Data on the Talis Platform.
    4store http://4store.org/

    4store (download | features | docs) is a database storage and query engine that holds RDF data. It has been used by Garlik as their primary RDF platform for three years, and has proved itself to be robust and secure.
    4store’s main strengths are its performance, scalability and stability. It does not provide many features over and above RDF storage and SPARQL queries, but if you are looking for a scalable, secure, fast and efficient RDF store, then 4store should be on your shortlist.

    4store offers a number of client libraries, among them two for Python: first, HTTP4Store, a client for the 4store httpd service allowing for easy handling of sparql results and for adding, appending and deleting graphs; second, py4s, although this seems to be a much more experimental library (geared towards multi-process queries).
    Furthermore, there is also an application for the Django web framework called django-4store that makes it easier to query and load rdf data into 4store when running Django. The application offers some support for constructing sparql-based Django views.

  • This blog post shows how to install 4store: Getting Started with RDF and SPARQL Using 4store and RDF.rb.

    End of the survey… have I missed anything? Please let me know if I did – I’ll try to keep adding stuff to this list as I move on with my project work!

     

    Python links (and more) 7/2/11
    http://www.michelepasin.org/blog/2011/02/03/python-links-and-more-7211/
    Thu, 03 Feb 2011 15:23:21 +0000

    This post contains just a collection of various interesting things I ran into in the last couple of weeks… they’re organized into three categories: pythonic links, events and conferences, and new online tools. Hope you’ll find something of interest!

    Pythonic stuff:

  • Epydoc
    Epydoc is a handy tool for generating API documentation for Python modules, based on their docstrings. For an example of epydoc’s output, see the API documentation for epydoc itself (html, pdf).
  • PyEnchant
    PyEnchant is a spellchecking library for Python, based on the excellent Enchant library.
  • Dexml
    The dexml module takes the mapping between XML tags and Python objects and lets you capture that as cleanly as possible. Loosely inspired by Django’s ORM, you write simple class definitions to define the expected structure of your XML document.
  • SpecGen
    SpecGen v5, ontology specification generator tool. It’s written in Python using Redland RDF library and licensed under the MIT license.
  • PyCloud
    Leverage the power of the cloud with only 3 lines of python code. Run long processes on the cloud directly from your shell!
  • commandlinefu.com
    This is not really pythonic – but nonetheless useful to pythonistas: a community-based repository of useful unix shell scripts!
    Events and Conferences:

  • Digital Resources in the Humanities and Arts Conference 2011
    University of Nottingham Ningbo, China. The DRHA 2011 conference theme this year is “Connected Communities: global or local2local?”
  • Narrative and Hypertext Workshop at the ACM Hypertext 2011 conference in Eindhoven.
  • Culture Hack Day, London, January 2011
    This event aimed at bringing cultural organisations together with software developers and creative technologists to make interesting new things.
  • History Hack Day, London, January 2011
    A bunch of hackers with a passion for history getting together and doing experimental stuff
  • Conference.archimuse.com
    The ‘online space for cultural informatics‘: lots of useful info here, about publications, jobs, people etc.
  • Agora project: Scholarly Open Access Research in European Philosophy
    Project looking at building an infrastructure for the semantic interlinking of European philosophy datasets
    Online tools:

  • FactForge
    A web application aiming at showcasing a ‘practical approach for reasoning with the web of linked data’.
  • Semantic Overflow
    A clone of Stack Overflow (collaboratively edited question and answer site for programmers) for questions ‘about semantic web techniques and technologies’.
  • Google Refine
    A tool for “working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases”.
  • Google Scribe
    A text editor with embedded autocomplete suggestions as you type
  • Books Ngram Viewer
    Tool that displays statistical information regarding the use of user-selected sentences in a corpus of books (e.g., “British English”, “English Fiction”, “French”) over the selected years
    How semantic is the semantic web?
    http://www.michelepasin.org/blog/2008/01/13/how-semantic-is-the-semantic-web/
    Sun, 13 Jan 2008 17:09:54 +0000

    I just read this article thanks to a colleague: I share pretty much everything it says about the SW, so I thought it wouldn’t be too bad to pass it on to the next reader. Basically, it is about some very fundamental issues: what do we mean by semantics? Does a computer have semantics? If not, what’s the point of the name ‘Semantic Web’? I think it’s quite un-controversial that the choice of the name ‘semantic’ web is controversial.

    I guess that many of the people originally supporting the SW vision didn’t really have the time to worry about this sort of question, as they had a different background, or maybe were just so excited about the grandiose idea of an intelligent world wide web interconnected at the data level. Quite understandable, but as the idea is now reaching out to the larger public and maybe connecting to the more bottom-up Web 2.0 movement, I think it’d be great to re-think the foundations of the initial vision, with some rigorous clarification of the terms we use. The article by Chiara Carlino reaches an interesting conclusion:

    So-called semantic web technologies provide the machine with data, like chinese symbols, and with a detailed set of instructions for handling them, in the form of ontologies. The computer simply follows the instructions, as the person in the chinese room does, and returns us useful informations, avoiding us the task of processing a big set of data on our own. These technologies have in fact nothing to do with semantics, because they never refer to anything in the real world: they never have any meaning, except in the mind of those expressing their knowledge in a machine-readable language, the mind of those preparing chinese symbols for the person in the chinese room. The person in the room – the machine – never ever gets this meaning. Such technologies, eventually, deal not much with semantics, but with knowledge, and its automatic processing through informatics. It seems therefore misleading and unfitting to keep on pointing with the word semantic a not semantic at all technology. It looks quite necessary to find out a new term, capable of hitting the core of this technology without giving rise to misunderstandings.

    The article was also posted on the w3c SW mailing list some time ago, and generated an interesting discussion. But then, if we have to throw away the overrated ‘semantic web’ term, what should we call it instead? Without any doubt, this research strand has generated lots of interesting results, both theoretical and practical. Mmm maybe, mainly practical – see the many prototypes, ontologies and standards for manipulating ‘knowledge’. So, continues the author, what people are doing is not really dealing with ‘semantics’, but building very complex systems and infrastructures for dealing with ‘knowledge structures’:

    There is a word who seems to serve this purpose, and that is epistematics. Its root – epistéme – points out its strict connection with knowledge; nonetheless, it is not a theoric study, not an epistemology: it is rather an automatic processing of knowledge. The term informatic has been created to point out the automatic processing of informations: similarly the term epistematic is pretty fitting in pointing out the automatic processing of knowledge that the technologies we are speaking about make possible. The terms also reminds informatics, and this is pretty fitting as well, as this processing happens thanks to informatics. Eventually, the current – though not much used – meaning of epistematic is perfectly coherent with the technologies we’d like to point out with it: epistematic, in fact, means deductive, and one of the most advanced features of these technologies is exactly the chance to process knowledge deductively, using automatic reasoners who build into software the deductive rules of formal logics. The formerly so-called semantic web looks now like a new science, not bounded (and narrowed) anymore to the world of web, as the semantic web term suggested: epistematics is a real evolution of informatics, evolving from raw informations processing to structured knowledge processing. Epistematic technologies are those technologies allowing the automatic processing, performed through informatic instruments, of knowledge, expressed in a machine- accessible language, so that the machine can process it, according to a subset of first order logic rules, and thus extract new knowledge.

    I like the term epistematics – and even more I like the fact that the ‘web’ is just a possible extension to it, not a core part of its meaning. Semantic technologies, based on various groundbreaking works the AI pioneers did some twenty or thirty years ago (mainly in knowledge representation), were in use well before the web. Now, is the advent of the web making such a big difference to them? They used to write knowledge-based systems in KIF – now they do them in OWL – we change the language, but aren’t the functionalities we are looking for the same? They used to harvest big companies’ databases and intranets to build up a knowledge base – now we also harvest the web – is that enough to claim the emergence of a new science, with new problems and methods? Or is it maybe just a different application of a well-known technology?

    I must confess, the more I think about such issues, the more I feel they’re difficult and intricate. For sure the web is evolving fast – and the amount of available structured information is evolving fast too. Making sense of all this requires a huge amount of clarity of thought. And presumably, this clarity of thought will eventually lead to some clarity of expression. Wittgenstein wasn’t the first to claim it, but for sure he did it well: language plays tricks on us. Better, in his words:

    Philosophy is a battle against the bewitchment of our intelligence by means of language.

     

     

    DBpedia rocks
    http://www.michelepasin.org/blog/2007/09/12/dbpedia-rocks/
    Wed, 12 Sep 2007 09:39:24 +0000

    It’s not the only semweb repository out there, but it’s surely the most interesting. The whole of Wikipedia has been translated into RDF and made queryable through SPARQL… lots of potential mashups waiting to be discovered! At the moment I’m looking at integrating the philosophy KB I’ve created with information from there… but I hope there’ll be time to experiment too…

     
