Semantic Web – Parerga und Paralipomena
http://www.michelepasin.org/blog
“At the core of all well-founded belief lies belief that is unfounded” – Wittgenstein

SN SciGraph Latest Release: Patents, Clinical Trials and many new features
http://www.michelepasin.org/blog/2019/03/22/sn-scigraph-latest-release-patents-clinical-trials-and-many-new-features/ (Fri, 22 Mar 2019)

We are pleased to announce the third release of SN SciGraph Linked Open Data. SN SciGraph is Springer Nature’s Linked Data platform that collates information from across the research landscape, i.e. the things, documents, people, places and relations of importance to the science and scholarly domain.

This release includes a complete refactoring of the SN SciGraph data model. Following up on user feedback, we have simplified it using Schema.org and JSON-LD, to make the data easier to understand and consume, even for non-linked-data specialists.

The release also includes two brand new datasets – Patents and Clinical Trials linked to Springer Nature publications – made available by our partner Digital Science, and in particular the Dimensions team.

Highlights:

  • New Datasets. Data about clinical trials and patents connected to Springer Nature publications have been added. This data is sourced from Dimensions.ai.
  • New Ontology. Schema.org is now the main model used to represent SN SciGraph data.
  • References data. Publications data now include references as well (= outgoing citations).
  • Simpler Identifiers. URIs for SciGraph objects have been dramatically simplified, reusing common identifiers whenever possible. In particular, all articles and chapters use the URI format prefix (‘pub.’) + DOI (e.g. pub.10.1007/s11199-007-9209-1; see the sketch after this list).
  • JSON-LD. JSON-LD is now the primary serialization format used by SN SciGraph.
  • Downloads. Data dumps are now managed externally on FigShare and are referenceable via DOIs.
  • Continuous updates. New publications data is released on a daily basis. All the other datasets are refreshed on a monthly basis.
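
As a quick illustration of the new identifiers and serialization, here is a minimal sketch of fetching a publication record as JSON-LD. Note that the full URL shape and the content-negotiation behaviour are assumptions on my part (based on how the Explorer has worked so far), so treat it as indicative:

import requests

# DOI-based SciGraph identifier from the example above; the full URL shape
# and the JSON-LD content negotiation are assumptions, not documented behaviour
uri = "http://scigraph.springernature.com/pub.10.1007/s11199-007-9209-1"
resp = requests.get(uri, headers={"Accept": "application/ld+json"})
resp.raise_for_status()
record = resp.json()
print(record.get("@type"), record.get("@id"))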

 

Note: crossposted on https://researchdata.springernature.com

 


Ontospy 1.9.8 released
http://www.michelepasin.org/blog/2019/01/03/ontospy-1-9-8-released/ (Thu, 03 Jan 2019)

Ontospy version 1.9.8 has just been released, and it contains tons of improvements and new features. Ontospy is a lightweight open-source Python library and command line tool for working with vocabularies encoded in the RDF family of languages.

Over the past month I’ve been working on a new version of Ontospy, which is now available for download on PyPI.

 

What’s new in this version

  • The library for generating ontology documentation (as HTML or Markdown) is now included within the main Ontospy distribution. Previously it was distributed separately under the name ontodocs, but keeping the two projects in sync was becoming too time-consuming, so I’ve decided to merge them. Note: one can still choose whether or not to include this extra library when installing.
  • You can print out the raw RDF data being returned, via a command line argument.
  • One can decide whether or not to include ‘inferred’ schema definitions extracted from an RDF payload. The inferences are pretty basic for now (e.g. the object of rdf:type statements is taken to be a type), but this allows you, for example, to quickly dereference a DBpedia URI and pull out all the types/predicates being used (see the sketch after this list).
  • The online documentation is now hosted on GitHub Pages and available within the /docs folder of the project.
  • Improved support for JSON-LD and a new utility for quickly sending JSON-LD data to the online playground tool.
  • Several other bug fixes and improvements, in particular to the interactive ontology exploration mode (shell command) and the visualization library (new visualizations are available, albeit still in alpha state).
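
For instance, here is a rough sketch of using the Python API to load a vocabulary and list its contents. Attribute and method names have changed a bit across Ontospy versions, so treat this as indicative rather than definitive:

import ontospy

# fetch and parse a vocabulary (FOAF here); works with local files or URIs
model = ontospy.Ontospy("http://xmlns.com/foaf/0.1/")

# 'all_classes' / 'bestLabel' as per recent versions; names may vary
for c in model.all_classes:
    print(c.uri, "-", c.bestLabel())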

    Exploring scholarly publications using DBPedia concepts: an experiment
    http://www.michelepasin.org/blog/2018/11/23/exploring-scholarly-publications-via-dbpedia/ (Fri, 23 Nov 2018)

    This post is about a recent prototype I developed, which allows you to explore a sample collection of Springer Nature publications using subject tags automatically extracted from DBPedia.

    DBpedia is a crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects. This structured information resembles an open knowledge graph (OKG) which is available for everyone on the Web.

    Datasets

    The dataset I used is the result of a collaboration with Beyza Yaman, a researcher working with the DBpedia team in Leipzig, who used the SciGraph datasets as input to the DBPedia-Spotlight entity-mining tool.

    By using DBPedia-Spotlight, we automatically associated DBpedia subject terms with a subset of the abstracts available in the SciGraph dataset (around 90k abstracts from 2017 publications).

    The prototype allows you to search Springer Nature publications using these subject terms.

    Also, DBpedia subjects include definitions and semantic relationships (which we are currently not using, but one can imagine how they could be raw material for generating more thematic ‘pathways’).

    Results: serendipitous discovery of scientific publications

    The results are pretty encouraging: even though the extracted concepts are sometimes only marginally relevant (or not relevant at all), the breadth and depth of the DBpedia classification makes the interactive exploration quite interesting and serendipitous.

    You can judge for yourself: the tool is available here: http://hacks2019.michelepasin.org/dbpedialinks

    The purpose of this prototype is to evaluate the quality of the tagging and generate ideas for future applications. So any kind of feedback or ideas are very welcome!

    We are working with Beyza to write up the results of this investigation as a research paper. The data and software are already freely available on GitHub.

    A couple of screenshots:

    E.g. see the topic ‘artificial intelligence’:

    [Screenshot: results for the topic ‘artificial intelligence’]

    One can add more subjects to a search in order to ‘zoom in’ on a result set, e.g. by adding ‘China’ to the search:

    [Screenshot: the same search refined by adding ‘China’]

    Implementation details
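
    To give a flavour of the core tagging step, here is a minimal sketch of annotating a piece of text with the public DBpedia Spotlight REST API. This is an illustration of the general approach, not necessarily the exact setup we used for the batch pipeline:

    import requests

    # public DBpedia Spotlight endpoint; our batch pipeline may differ
    SPOTLIGHT = "https://api.dbpedia-spotlight.org/en/annotate"

    abstract = "Deep learning methods are increasingly applied in genomics."
    resp = requests.get(
        SPOTLIGHT,
        params={"text": abstract, "confidence": 0.5},
        headers={"Accept": "application/json"},
    )
    resp.raise_for_status()

    # each resource carries the matched surface form and its DBpedia URI
    for r in resp.json().get("Resources", []):
        print(r["@surfaceForm"], "->", r["@URI"])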

     

     

    SN SciGraph: latest website release makes it easier to discover related content
    http://www.michelepasin.org/blog/2018/08/01/sn-scigraph-latest-website-release-make-it-easier-to-discover-related-content/ (Wed, 01 Aug 2018)

    The latest release of the SN SciGraph Explorer website includes a number of new features that make it easier to navigate the scholarly knowledge graph and discover items of interest.

    Graphs are essentially composed of two kinds of objects: nodes and edges. Nodes are like the stations on a train map, while edges are the links that connect the different stations.

    Of course one wants to be able to move from station to station in any direction! Similarly in a graph one wants to be able to jump back and forth from node to node using any of the links provided. That’s the beauty of it!

    Although the underlying data allowed for this, the SN SciGraph Explorer website wasn’t fully supporting this kind of navigation. So we’ve now started to add a number of ‘related objects’ sections that reveal these pathways more clearly.

    For example, now it’s much easier to get to the organizations and grants an article relates to:

    [Screenshot: the organizations and grants related to an article]

    Or, for a book edition, to see its chapters and related organizations:

    [Screenshot: a book edition with its chapters and related organizations]

    And much more..  Take a look at the site yourself to find out.

    Finally, we improved the linked data visualization included in every page by adding distinctive icons for each object type, to make it easier to understand the immediate network of an object at a glance. E.g. see this grant:

    [Screenshot: the linked data diagram for a grant]

    SN SciGraph is primarily about opening up new opportunities for open data and metadata enthusiasts who want to do more things with our content, so we hope that these additions will make discovering data items easier and more fun.

    Any comments? We’d love to hear from you. Otherwise, thanks for reading and stay tuned for more updates.

    PS: this post was published on the SN Research Data space too.

     

    PySciGraph: simple API for accessing SN SciGraph content
    http://www.michelepasin.org/blog/2018/06/07/pyscigraph-simple-api-for-accessing-springer-nature-scigraph-content/ (Thu, 07 Jun 2018)

    PySciGraph is a small open source Python library that makes it easier to access data from Springer Nature SciGraph. It is available on PyPI and GitHub. I created it mainly because I wanted to be able to quickly check from the command line whether an object exists in SN SciGraph, and what metadata it returns. But of course this could be developed further, e.g. to allow navigating the graph by following links from one object to another.

    What is SN SciGraph? SciGraph is the Springer Nature Linked Data platform that collates information from across the research landscape, i.e. the things, documents, people, places and relations of importance to the science and scholarly domain. Metadata for millions of entities is available to explore, as well as for downloading to reuse within your own applications, under CC-BY and CC-BY-NC licenses (you can follow the SN SciGraph blog posts here).


    Here’s an example of how the library can be used from the command line:

    # check if an object is on SciGraph via its URI
    $ pyscigraph --uri http://www.grid.ac/institutes/grid.443610.4
    Parsing 12 triples..
    URI:  http://www.grid.ac/institutes/grid.443610.4
    DOI:  N/A
    Label:  Hakodate University
    Title:  N/A
    Types:  foaf:Organization grid:Education
    
    # check if a publication is on SciGraph via its DOI
    $ pyscigraph --doi 10.1038/171737a0
    Parsing 251 triples..
    URI:  http://scigraph.springernature.com/things/articles/f5ac1e9c7a520ca2a34cb13af4809bdd
    DOI:  10.1038/171737a0
    Label:  Article: Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid
    Title:  Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid
    Types:  sg:Article
    
    # retrieve all metadata via an RDF serialization
    $ pyscigraph --doi 10.1038/171737a0 --rdf n3
    Parsing 251 triples..
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix grid: <http://www.grid.ac/ontology/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix sg: <http://scigraph.springernature.com/ontologies/core/> .
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix vann: <http://purl.org/vocab/vann/> .
    @prefix vivo: <http://vivoweb.org/ontology/core#> .
    @prefix void: <http://rdfs.org/ns/void#> .
    @prefix xml: <http://www.w3.org/XML/1998/namespace> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    
    <http://scigraph.springernature.com/things/articles/f5ac1e9c7a520ca2a34cb13af4809bdd> a sg:Article ;
        rdfs:label "Article: Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid" ;
        sg:coverDate "1953-04-25"^^xsd:date ;
        sg:coverYear "1953-01-01"^^xsd:gYear ;
        sg:coverYearMonth "1953-04-01"^^xsd:gYearMonth ;
        sg:ddsIdJournalBrand "41586" ;
        sg:doi "10.1038/171737a0" ;
        sg:doiLink <http://dx.doi.org/10.1038/171737a0> ;
        sg:hasArticleType <http://scigraph.springernature.com/things/article-types/af> ;
        sg:hasContributingOrganization <http://www.grid.ac/institutes/grid.5335.0> ;
        sg:hasContribution <http://scigraph.springernature.com/things/contributions/7325bd1cadf3a1cc253c611682bc62fd>,
            <http://scigraph.springernature.com/things/contributions/989a6a2607c882ffd99341144836d1fc> ;
        sg:hasFieldOfResearchCode <http://purl.org/au-research/vocabulary/anzsrc-for/2008/03>,
            <http://purl.org/au-research/vocabulary/anzsrc-for/2008/0306> ;
        sg:hasJournal <http://scigraph.springernature.com/things/journals/5ea8996a5bb089dd0562d3bfe24eaad9>,
            <http://scigraph.springernature.com/things/journals/723ba46cf7980ad6089b3da0ba4b0b47> ;
        sg:hasJournalBrand <http://scigraph.springernature.com/things/journal-brands/012496b06989edb434c6b8e1d0b0a7db> ;
        sg:issnElectronic "1476-4687" ;
        sg:issnPrint "0028-0836" ;
        sg:issue "4356" ;
        sg:license <http://scigraph.springernature.com/explorer/license/> ;
        sg:npgId "171737a0" ;
        sg:pageEnd "738" ;
        sg:pageStart "737" ;
        sg:publicationDate "1953-04-25"^^xsd:date ;
        sg:publicationYear "1953-01-01"^^xsd:gYear ;
        sg:publicationYearMonth "1953-04-01"^^xsd:gYearMonth ;
        sg:scigraphId "f5ac1e9c7a520ca2a34cb13af4809bdd" ;
        sg:title "Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid" ;
        sg:volume "171" .
    

    The current release (0.4) offers just basic functionality, but I’m planning to do more work on this over the coming months.
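
    Under the hood, the basic operation boils down to dereferencing a URI and parsing whatever RDF comes back. Roughly, something like this rdflib sketch (an illustration, not PySciGraph’s actual code):

    import rdflib

    # dereference a GRID/SciGraph URI; assumes the server content-negotiates RDF
    uri = "http://www.grid.ac/institutes/grid.443610.4"
    g = rdflib.Graph()
    g.parse(uri)  # rdflib picks the parser based on the response headers

    print("Parsing %d triples.." % len(g))
    for label in g.objects(rdflib.URIRef(uri), rdflib.RDFS.label):
        print("Label: ", label)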

    Any ideas? Comments? Please open an issue on GitHub!

     

     

    SN SciGraph is part of the Linked Open Data Cloud 2018
    http://www.michelepasin.org/blog/2018/05/23/sn-scigraph-is-part-of-the-linked-open-data-cloud-2018/ (Wed, 23 May 2018)

    The latest Linked Open Data (LOD) Cloud has recently been made available by the Insight Centre for Data Analytics. The LOD cloud is a visual representation of the datasets (and the links among them) that have been published according to the Linked Data principles – a web-friendly methodology for data sharing that encourages open schemas and data reuse.

    [Image: the 2018 Linked Open Data cloud]

     

    We’re very glad to say that SN SciGraph is now part of it! (PS: this is its JSON record.) If you look at the picture above, the two red lines departing from our ‘bubble’ indicate that the two main datasets we are linking to are CrossRef and DBpedia.

    Note that this visualisation unfortunately doesn’t do justice to the fact that SN SciGraph is one of the largest datasets out there (1 billion+ triples and counting). In previous versions, the bubble’s size would reflect how large a dataset is.. but hopefully that’ll change in the future!

    The cloud currently contains 1,184 datasets with 15,993 links (as of April 2018) and it’s divided into 9 sub-clouds based on their domain.

    [Image: the ‘Publications’ sub-cloud]

    SciGraph is part of the ‘Publications’ sub-cloud (depicted above) alongside other important linked data publishers such as the British Library, the German National Library, the Open Library, OCLC and many others.

    It’s impressive to see the growing number of datasets being released using this approach! We’ve been told that later this year more discovery tools will be made available that allow searching for data publishers, making it easier for people and projects to collaborate.

    Useful links:

     

     

    SciGraph publishes 1 billion facts as Linked Open Data
    http://www.michelepasin.org/blog/2017/11/14/scigraph-publishes-1-billion-facts-as-linked-open-data/ (Tue, 14 Nov 2017)

    Last Thursday we reached a major milestone for the SciGraph project: nearly 1 billion facts (= RDF statements) have been released as Linked Open Data, most of them under a CC-BY license!

    This data release follows and improves on the previous data release (February 2017) which included metadata for all journal articles published in the last 5 years.

     

     

     

     

    What’s in this release:

    • Datasets downloads. Almost 1 billion triples (23.2 GB compressed, or 205.2 GB uncompressed) comprising our SciGraph ontology, SKOS taxonomies and instance data covering the complete archive of Springer Nature publications, i.e. books and journals (1801-2017), conferences, affiliations, funders, research projects and grants. The data is current to end of 2017Q3.
    • Data Explorer. The data explorer allows users to visualize each single node in the graph and to move to other related nodes interactively. Furthermore, the Explorer allows users to get rich data descriptions for SciGraph things by traversing the knowledge graph and using content negotiation on SciGraph URLs. In other words, the Explorer is like a Linked Data API for developers: the RDF data is dereferenceable (Turtle, N-Triples, RDF/XML) and both HTTP and HTTPS protocols are supported (see the sketch after this list).
    • Dual Licence. The majority of SciGraph data is being released under a Creative Commons Attribution (CC BY) 4.0 International License, with a small portion of the data (specifically abstracts and grants) separately licensed under a Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 International License.
    • Model Mappings. To align the SciGraph ontology with other well-known vocabularies we include several mappings, making extensive use of two external datasets: ANZSRC (Australian and New Zealand Standard Research Classification) Fields of Research codes, and GRID (Global Research Identifier Database) identifiers.
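
    To give a concrete idea of the content negotiation mentioned above, here is a minimal sketch that asks the Explorer for the Turtle representation of a SciGraph article (reusing the article URI from the PySciGraph example above); treat the exact behaviour as indicative:

    import requests

    # request Turtle for a SciGraph thing via content negotiation
    uri = ("http://scigraph.springernature.com/things/articles/"
           "f5ac1e9c7a520ca2a34cb13af4809bdd")
    resp = requests.get(uri, headers={"Accept": "text/turtle"})
    resp.raise_for_status()
    print(resp.text[:500])  # first few hundred characters of the Turtle doc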

    Who is this for?

    In general, this is for people who are interested in reusing our metadata, e.g. for data analysis tasks, for developing applications that benefit from linking to Springer Nature content, etc. For example:

    • Researchers and (linked) open data enthusiasts, i.e. see the Linked Data Cloud.
    • Metadata and information specialists, e.g. librarians.
    • Developers and Data Scientists.

    Furthermore, we are in contact with various organisations who are interested in reusing large parts of our datasets, e.g. Wikidata, DBpedia and EMBL-EBI.

    Questions?

    Any questions or feedback? Leave a comment or email knowledge-graph@springernature.com.

    We’d love to hear from you! Also, you can follow the #scigraph tag on Twitter for the latest news.

     

    Exploring SciGraph data using JSON-LD, Elastic Search and Kibana
    http://www.michelepasin.org/blog/2017/04/06/exploring-scigraph-data-using-elastic-search-and-kibana/ (Thu, 06 Apr 2017)

    Hello there, data lovers! In this post you can find some information on how to download and make sense of the scholarly dataset recently made available by the Springer Nature SciGraph project, using the freely available Elasticsearch suite of software.

    A few weeks ago the SciGraph dataset was released (full disclosure: I’m part of the team who did that!). This is a high quality dataset containing metadata and abstracts about scientific articles published by Springer Nature, research grants related to them plus other classifications of this content.


    This release of the dataset includes the last 5 years of content – that’s already an impressive 32 gigs of data you can get your hands on. So in this post I’m going to show how to do that, in particular by transforming the data from the RDF graph format it comes in into a JSON format more suited to application development and analytics.

    We will be using two free-to-download products, GraphDB and Elasticsearch, so you’ll have to install them if you haven’t got them already. But no worries, that’s pretty straightforward, as you’ll see below.

    1. Hello SciGraph Linked Data

    First things first, we want to get hold of the SciGraph RDF datasets of course. That’s pretty easy, just head over to the SciGraph downloads page and get the following datasets:

    • Ontologies: the main schema behind SciGraph.
    • Articles – 2016: all the core articles metadata for one year.
    • Grants: grants metadata related to those articles.
    • Journals: the full Springer Nature journal catalogue.
    • Subjects: classification of research areas developed by Springer Nature.

    That’s pretty much everything – note that we’re getting only one year’s worth of articles, as that’s enough for the purposes of this exercise (~300k articles from 2016).

    Next up, we want to get a couple of other datasets SciGraph depends on (cf. the model mappings mentioned above):

    • ANZSRC Fields of Research codes
    • GRID identifiers

    That’s it! Time for a cup of coffee.

    2. Python to the rescue

    We will be doing a bit of data manipulation in the next sections, and Python is a great language for that sort of thing. Here’s what we need to get going:

    1. Python. Make sure you have Python installed and also Pip, the Python package manager (any Python version above 2.7 should be ok).
    2. GitHub project. I’ve created a few scripts for this tutorial, so head over to the hello-scigraph project on GitHub and download it to your computer. Note: the project contains all the Python scripts needed to complete this tutorial, but of course you should feel free to modify them or write from scratch if you fancy it!
    3. Libraries. Install all the dependencies for the hello-scigraph project to run. You can do that by cd-ing into the project folder and running pip install -r requirements.txt (ideally within a virtual environment, but that’s up to you).

    3. Loading the data into GraphDB

    So, by now you should have 8 different files containing data (after step 1 above). Make sure they’re all in the same folder and that all of them have been unzipped (if needed), then head over to the GraphDB website and download the free version of the triplestore (you may have to sign up first).

    The online documentation for GraphDB is pretty good, so it should be easy to get it up and running. In essence, you have to do the following steps:

    1. Launch the application: for me, on a mac, I just had to double click the GraphDB icon – nice!
    2. Create a new repository: this is the equivalent of a database within the triplestore. Call this repo “scigraph-2016” so that we’re all synced for the following steps.

    Next thing, we want a script to load our RDF files into this empty repository. So cd into the directory containing the GitHub project (from step 2) and run the following command:

    python -m hello-scigraph.loadGraphDB ~/scigraph-downloads/

    The “loadGraphDB” script goes through all RDF files in the “scigraph-downloads” directory and loads them into the scigraph-2016 repository (note: you must replace “scigraph-downloads” with the actual path to the folder you downloaded content in step 1 above).

    So, to recap: this script is now loading more than 35 million triples into your local graph database. Don’t be surprised if it takes some time (in particular the ‘articles-2016’ dataset, by far the biggest), so it’s a good moment to take a break or do something else.

    Once the process is finished, you should be able to explore your data via the GraphDB workbench. It’ll look something like this:

    [Screenshot: the GraphDB workbench class hierarchy]
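
    You can also query the repository programmatically via its SPARQL endpoint. Here is a quick sanity-check sketch, assuming GraphDB’s default setup (port 7200, repository name as above):

    import requests

    # GraphDB exposes each repository at /repositories/<name> (RDF4J protocol)
    ENDPOINT = "http://localhost:7200/repositories/scigraph-2016"

    query = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"
    resp = requests.get(
        ENDPOINT,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
    )
    resp.raise_for_status()
    print(resp.json()["results"]["bindings"][0]["triples"]["value"])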

    4. Creating an Elasticsearch index

    We’re almost there. Let’s head over to the Elasticsearch website and download it. Elasticsearch is a powerful, distributed, JSON-based search and analytics engine so we’ll be using it to build an analytics dashboard for the SciGraph data.

    Make sure Elastic is running (run bin/elasticsearch, or bin\elasticsearch.bat on Windows), then cd into the hello-scigraph Python project (from step 2) in order to run the following script:

    python -m hello-scigraph.loadElastic

    If you take a look at the source code, you’ll see that the script does the following:

    1. Articles loading: extracts article references from GraphDB in batches of 200.
    2. Articles metadata extraction: for each article, we pull out all relevant metadata (e.g. title, DOI, authors) plus related information (e.g. author GRID organizations, geo locations, funding info etc..).
    3. Articles metadata simplification: some intermediate nodes coming from the original RDF graph are dropped and replaced with a flatter structure, which uses a temporary dummy schema (prefix es: <http://elastic-index.scigraph.com/>). It doesn’t matter what we call that schema; what’s important is that we simplify the data we put into the Elasticsearch index. That’s because while the graph layer is supposed to facilitate data integration, and hence benefits from a rich semantic representation of information, the search layer is more geared towards performance and retrieval, hence a leaner information structure can dramatically speed things up there.
    4. JSON-LD transformation: the simplified RDF data structure is serialized as JSON-LD – one of the many serializations available for RDF. JSON-LD is of course valid JSON, meaning that we can put it into Elastic right away. This is a bit of a shortcut actually: for more fine-grained control over what the JSON looks like, it’s probably better to transform the data into JSON using some ad-hoc mechanism. But for the purpose of this tutorial it’s more than enough (see the sketch after this list).
    5. Elastic index creation. Finally, we can load the data into an Elastic index called – guess what – “hello-scigraph”.
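
    To make steps 4 and 5 more concrete, here is a stripped-down sketch of serializing an rdflib graph as JSON-LD and pushing the result into Elastic. The real loadElastic script does quite a bit more; also note that older rdflib versions need the separate rdflib-jsonld plugin, and the elasticsearch client call signatures vary between versions:

    import json
    import rdflib
    from elasticsearch import Elasticsearch

    # a tiny graph standing in for one simplified article record
    g = rdflib.Graph()
    g.parse(data="""@prefix es: <http://elastic-index.scigraph.com/> .
    <http://example.org/articles/1> es:title "A sample article" .""",
            format="turtle")

    # serialize to JSON-LD (older rdflib versions need rdflib-jsonld installed)
    doc = json.loads(g.serialize(format="json-ld"))

    # index into Elastic ('body' works with 7.x clients; newer ones differ)
    es = Elasticsearch("http://localhost:9200")
    es.index(index="hello-scigraph", body={"graph": doc})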

    Two more things to point out:

    • Long queries. The Python script enforces a 60 seconds time-out on the GraphDB queries, so in case things go wrong with some articles data the script should keep running.
    • Memory issues. The script stops for 10 seconds after each batch of 200 articles (time.sleep(10)). I had to do this to prevent GraphDB on my laptop from running out of memory. Time to catch some breath!

    That’s it! Time for another break now. A pretty long one actually – loading all the data took around 10 hours on my (rather average-spec’ed) laptop, so you may want to do that overnight or get hold of a faster machine/server.

    Eventually, once the loading script is finished, you can issue this command from the command line to see how much data you’ve loaded into the Elastic index  “hello-scigraph”. Bravo!

    curl -XGET 'localhost:9200/_cat/indices/'

    5. Analyzing the data with Kibana

    Loading the data into Elastic already opens up a number of possibilities – check out the search APIs for some ideas – however there’s an even quicker way to analyze the data: Kibana. Kibana is another free product in the Elastic suite, which provides an extensible user interface for configuring and managing all aspects of the Elastic Stack.

    So let’s get started with Kibana: download it and set it up using the online instructions, then point your browser at http://localhost:5601.

    You’ll get to the Kibana dashboard which shows the index we just created. Here you can perform any kind of searches and see the raw data as JSON.

    What’s even more interesting is the visualization tab. Results of searches can be rendered as line charts, pie charts etc., and more dimensions can be added via ‘buckets’. See below for some quick examples – really, the possibilities are endless!
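
    For instance, instead of going through the UI you can hit the search API directly. Here is a sketch of a terms aggregation; note that the field name is hypothetical – inspect your index mapping for the actual fields produced by the loading script:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # 'publicationYear' is a made-up field name – check your index mapping
    res = es.search(index="hello-scigraph", body={
        "size": 0,
        "aggs": {"by_year": {"terms": {"field": "publicationYear"}}},
    })
    for bucket in res["aggregations"]["by_year"]["buckets"]:
        print(bucket["key"], bucket["doc_count"])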

    Conclusion

    This post should have given you enough to realise that:

    1. The SciGraph dataset contains an impressive amount of high-quality scholarly publication metadata, which can be used for things like literature search, research statistics etc..
    2. Even if you’re not familiar with Linked Data and the RDF family of languages, it’s not hard to get going with a triplestore and then transform the data into a more widely used format like JSON.
    3. Finally, Elasticsearch and especially Kibana are fantastic tools for data analysis and exploration! Needless to say, in this post I’ve just scratched the surface of what could be done with them.

    Hope this was fun, any questions or comments, you know the drill :-)

    OntoSpy v.1.7.4
    http://www.michelepasin.org/blog/2017/02/27/ontospy-v-1-7-4/ (Mon, 27 Feb 2017)

    A new version of OntoSpy (1.7.4) is available online. OntoSpy is a lightweight Python library and command line tool for inspecting and visualising vocabularies encoded in the RDF family of languages.

    This version includes a hugely improved API for creating nice-looking HTML or Markdown documentation for an ontology, which takes advantage of frameworks like Bootstrap and Bootswatch.

    You can take a look at the examples page to see what I’m talking about.

    [Screenshots: examples of ontology documentation generated with OntoSpy]

     

    To find out more about Ontospy:

  • CheeseShop: https://pypi.python.org/pypi/ontospy
  • Github: https://github.com/lambdamusic/ontospy

  • Here’s a short video showing a typical session with the OntoSpy repl.

    Coming up next

  • More advanced ontology visualisations using d3 or similar javascript libraries;
  • A better separation between the core Python library in OntoSpy and the other components. This is partly to address the fact that the OntoSpy package has grown a bit too much, in particular from the point of view of people who are only interested in using it to create their own applications, as opposed (for example) to reusing the built-in visualisations.
  • Of course, any comments or suggestions are welcome as usual – either using the form below or via GitHub. Cheers!

     

    Leipzig Semantics 2016 conference
    http://www.michelepasin.org/blog/2016/10/25/leipzig-semantics-2016-conference/ (Tue, 25 Oct 2016)

    A few weeks ago I attended the SEMANTiCS conference in Leipzig, so here’s a short report about the event.

    SEMANTiCS 2016 (#semanticsconf) continues a long tradition of bringing together colleagues from around the world to present best practices, panels, papers and posters to discuss semantic systems in birds-of-a-feather sessions and informal settings.

    What I really liked about this event is the fact that it is primarily industry-focused, meaning that most (if not all) of the talks were dealing with pragmatic aspects of real-world applications of semantic technologies. You can take a look at the online proceedings for more details, alternatively there are some nice videos and pictures pages too.

    I meant to share some notes a few weeks ago already but never got round to doing it… so here are a few highlights:

  • Springer Nature’s SciGraph project got quite a bit of publicity, as I was one of the invited keynote speakers. Overall, the feedback was extremely positive and it seems that many people are waiting to see more from us in the coming months. We also chatted to representatives from other publishers (Elsevier, Wolters Kluwer, Oxford University Press) about areas where we could collaborate more, e.g. constructing shared datasets (e.g. conference identifiers, coordinated by CrossRef the same way they do it for funders).

  • Cathy Dolbear from Oxford University Press gave an interesting keynote describing the work they’ve been doing with Linked Data, mostly focusing on the Oxford Global Languages project, which links lexical information from multiple global and digitally under-represented languages in a semantic graph. She also talked about creating rich schema.org snippets to better interface with Google’s knowledge graph and thus increase their ranking in search results. That was really good to hear, as we’re investing in this area too!
    [Slides from Cathy Dolbear’s keynote]

     

  • David Kuilman from Elsevier talked about their approach to content management based on semantic technologies. David’s team has been focusing on tracking document production metadata, mainly before publication (e.g. submission and production workflow metadata), which is quite interesting because it’s the exact opposite of what we’ve been doing at Springer Nature.
    [Slides from David Kuilman’s talk]

     
