The Nature ontologies portal is a new section of the nature.com site that describes our involvement with semantic technologies and makes several models and datasets available to the wider public as RDF linked data.
We launched the portal nearly a month ago, with the aim of sharing our experiences with semantic technologies and, more generally, contributing our data models and datasets to the wider linked data community.
This April 2015 release doubles the number and size of our published data models. The models now cover more completely the various things that our world contains, from publication things – articles, figures, etc. – to classification things – article-types, subjects, etc. – and additional things used to manage our content publishing operation – assets, events, etc. Also included are a release page for the latest data release and a separate page for archival data releases.
Is this the first time you’ve heard about the semantic web and ontologies?
Then you should know that, even though XML remains the main technology used internally at Macmillan Science and Education to represent and store the things we publish, the metadata about these documents (e.g. publication details, subject categories, etc.) is normally also encoded using a more abstract, graph-oriented information model.
This model is called RDF (Resource Description Framework) and has two key characteristics:
– it encodes all information in the form of triples, i.e. <subject> <predicate> <object>
– it was built with the web in mind: broadly speaking, each of the items in a triple can be accessed via the internet, i.e. it is a URI (a generalised notion of a URL).
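To make the triple idea concrete, here is a minimal sketch in plain Python. It is not how our data is actually stored or served: the `example.org` URIs, the `Triple` type and the `objects` helper are all made up for illustration, and a real project would normally use a dedicated RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical URIs, invented for this example only.
ARTICLE = "http://example.org/articles/a1"
TITLE = "http://example.org/terms/title"
HAS_SUBJECT = "http://example.org/terms/hasSubject"
GENETICS = "http://example.org/subjects/genetics"

# A graph is simply a set of triples.
graph = {
    # The object of a triple may also be a plain literal value, such as a title.
    Triple(ARTICLE, TITLE, "An example article"),
    # Here the object is itself a URI that other triples can describe further.
    Triple(ARTICLE, HAS_SUBJECT, GENETICS),
}


def objects(graph, subject, predicate):
    """Return, sorted, every object stated for the given subject and predicate."""
    return sorted(
        t.obj for t in graph if t.subject == subject and t.predicate == predicate
    )


print(objects(graph, ARTICLE, HAS_SUBJECT))
```

Because every statement has the same three-part shape, asking a question of the data comes down to matching on one or more of the three positions.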
So why use RDF?
The RDF model makes it easier to maintain a shared yet scalable schema (aka an ‘ontology’) of the data types in use within our organisation. It works a bit like a common language: as more and more of our data stores speak it, it becomes easier to join things up whenever needed.
At the same time – since the RDF model is native to the web – it facilitates the ‘semantic’ integration of our data with that of the increasing number of other organisations that publish their data using compatible models.
For example, the BBC, Elsevier and, more recently, Springer are among the many organisations that contribute to the Linked Data Cloud.
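The ‘joining up’ described above can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the two stores, the `example.org` URIs and the `subject_labels` helper are hypothetical, and triples are modelled as bare tuples rather than with a real RDF library.

```python
# Hypothetical URIs, invented for this example only.
ARTICLE = "http://example.org/articles/a1"
HAS_SUBJECT = "http://example.org/terms/hasSubject"
GENETICS = "http://example.org/subjects/genetics"
LABEL = "http://example.org/terms/label"

# Two independent stores, each a set of (subject, predicate, object) triples.
publishing_store = {(ARTICLE, HAS_SUBJECT, GENETICS)}
subjects_store = {(GENETICS, LABEL, "Genetics")}

# Both stores use the same URI for the same thing, so merging is a set union.
merged = publishing_store | subjects_store


def subject_labels(graph, article):
    """Follow article -> subject -> label across the graph."""
    labels = []
    for s, p, o in graph:
        if s == article and p == HAS_SUBJECT:
            labels.extend(o2 for s2, p2, o2 in graph if s2 == o and p2 == LABEL)
    return sorted(labels)


print(subject_labels(merged, ARTICLE))
```

Neither store can answer the question on its own; the shared identifier for the subject is what links the two once they are merged.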
We’ll continue improving these ontologies and releasing new ones as they are created. But probably most interesting for many people is that we’re working on a new release of the whole NPG articles dataset (~1M articles).
So stay tuned for more!