We're getting closer to releasing the full set of metadata covering over one million articles published by Nature Publishing Group since 1845. So here's a sneak peek at this dataset, in the form of a simple D3.js visual summary of what will soon be available to download and reuse.
In recent months, I've been working with my colleagues at Macmillan Science and Education on an open data portal that makes available to the public many of the taxonomies and ontologies we use internally for organizing the content we publish.
This is part of our ongoing involvement with linked data and semantic technologies. The aim is twofold: to use these tools to turn the publishing workflow into a more dynamic platform, and to contribute a rich dataset of scientific article metadata to the evolving web of open data.
The articles dataset includes metadata about all articles published in the journal Nature, of course. But it's not limited to that: Scientific American, Nature Medicine, Nature Genetics, and many other titles are also included (note: the full list can be downloaded as raw data here).
The first diagram shows how many articles have been published each year since 1845 (the start year of Scientific American). Nature began a couple of decades later, in 1869. The curve gets steeper in the 1990s, reflecting the exponential increase in publications brought about by the progressive specialization of scientific journals (e.g., all the Nature-branded titles).
The second diagram shows the growth in publication volume on a cumulative scale: each point is the running total of articles published up to that year. We've now reached 1 million articles and counting!
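To make the difference between the two diagrams concrete, here is a minimal sketch in plain JavaScript of how the underlying series could be derived: a per-year count for the first chart and a running total for the second. The sample records, the `year` field, and the helper names `countByYear` and `toCumulative` are assumptions made up for illustration; they are not taken from the actual dataset or from the chart's source code.

```javascript
// Hypothetical sample records; the real dataset's fields may differ.
const articles = [
  { title: "A", year: 1845 },
  { title: "B", year: 1845 },
  { title: "C", year: 1869 },
  { title: "D", year: 1870 },
  { title: "E", year: 1870 },
  { title: "F", year: 1870 },
];

// First chart: number of articles published in each year, sorted by year.
function countByYear(records) {
  const counts = new Map();
  for (const { year } of records) {
    counts.set(year, (counts.get(year) || 0) + 1);
  }
  return Array.from(counts, ([year, count]) => ({ year, count }))
    .sort((a, b) => a.year - b.year);
}

// Second chart: running total of articles published up to each year.
// Expects the input to be sorted by year already.
function toCumulative(yearly) {
  let total = 0;
  return yearly.map(({ year, count }) => {
    total += count;
    return { year, total };
  });
}

const yearly = countByYear(articles);
const cumulative = toCumulative(yearly);

console.log(yearly);
console.log(cumulative);
```

Either array can then be handed to a D3 line or area chart, with `year` on the x axis and `count` or `total` on the y axis.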
To create the charts, I played around with a nifty example from Mike Bostock (http://bl.ocks.org/mbostock/3902569) and added a couple of extra things to it.
The full source code is on GitHub.
Finally, it's worth mentioning that this metadata had already been made available a few years ago under the CC0 license: you can still access it here. This upcoming release, however, makes it available in the context of a much more precise and stable set of ontologies, meaning that the semantics of the dataset are more clearly laid out and consistent.
So stay tuned for more! And if you plan to or would like to reuse these datasets, please do get in touch, either here or by emailing developers@nature.com.