Just Blogging – Parerga und Paralipomena http://www.michelepasin.org/blog Wed, 10 Feb 2021 17:55:36 +0000
At the core of all well-founded belief lies belief that is unfounded - Wittgenstein

‘The Kryos Noise’ is available on Spotify http://www.michelepasin.org/blog/2021/01/22/the-kryos-noise-is-available-on-spotify/ Fri, 22 Jan 2021 17:38:52 +0000

The prog rock album I worked on years ago with the band Kryos Project is now also available on Spotify (and Amazon too).

Why? Well, it just feels good to be able to open up Spotify and listen to your own music. This is stuff we made almost 20 years ago (!) but it still feels kinda relevant. Fresh. Well.. you know what I mean.

I used Distrokid to handle the distribution to all major services. It’s a bit fiddly to work with and the overall UX is rather barebones, but it does what it’s supposed to for a pretty damn honest price!

Zero Hunger Hack Day: surfacing research about the Sustainable Development Goals program http://www.michelepasin.org/blog/2019/02/11/zero-hunger-hack-day/ Mon, 11 Feb 2019 12:09:58 +0000

This post is about a little dashboard idea that aims to help policy makers discover research relevant to the ‘zero hunger’ topic, one of the themes of the Sustainable Development Goals program.

The 2030 Agenda for Sustainable Development, adopted by all United Nations Member States in 2015, provides a shared blueprint for peace and prosperity for people and the planet, now and into the future. At its heart are the 17 Sustainable Development Goals (SDGs), which are an urgent call for action by all countries – developed and developing – in a global partnership. They recognize that ending poverty and other deprivations must go hand-in-hand with strategies that improve health and education, reduce inequality, and spur economic growth – all while tackling climate change and working to preserve our oceans and forests.

For more background about this project, see also its Wikipedia page: https://en.wikipedia.org/wiki/Sustainable_Development_Goals


Springer Nature is among the many organizations who are taking an active role in developing scenarios and solutions to tackle these global challenges. A couple of months ago Springer Nature organized a hack day which brought together people with different backgrounds and expertise in order to come up with ideas and prototypes that could lead to further research. In particular, the focus of the hack day was on the ‘zero hunger’ theme.

The team I was working with developed a concept around the idea of an easy-to-use, dashboard-like tool that busy policy makers could use to quickly gather information about researchers or institutions they’d want to consult with.

In order to make this idea more tangible I ended up building a little prototype, which scans scholarly documents in order to pull out information (potentially) related to the ‘zero hunger’ topic and sub-topics, essentially following the keyword structure specified in the Sustainable Development Goals document.

The prototype is available here: http://hacks2019.michelepasin.org/zerohunger/
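
To give a rough idea of the kind of keyword scanning described above, here is a minimal sketch. It is not the prototype's actual code: the sub-topic names and keywords in `TOPIC_KEYWORDS` are made up for illustration, and the function `scan_document` is a hypothetical helper that simply counts whole-phrase matches per sub-topic.

```python
import re
from collections import Counter

# Hypothetical sub-topics and keywords, for illustration only; the real
# prototype followed the keyword structure of the SDG document.
TOPIC_KEYWORDS = {
    "food security": ["food security", "hunger", "undernourishment"],
    "nutrition": ["malnutrition", "stunting", "wasting"],
    "sustainable agriculture": ["sustainable agriculture", "crop yield", "smallholder"],
}


def scan_document(text, topic_keywords=TOPIC_KEYWORDS):
    """Count case-insensitive whole-phrase keyword hits per sub-topic."""
    hits = Counter()
    for topic, keywords in topic_keywords.items():
        for keyword in keywords:
            # \b keeps "wasting" from matching inside a longer word
            pattern = r"\b" + re.escape(keyword) + r"\b"
            hits[topic] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    # Only report sub-topics that were actually mentioned
    return {topic: count for topic, count in hits.items() if count > 0}
```

Calling `scan_document()` on an abstract returns a small dictionary of sub-topics and hit counts, which is the sort of signal a dashboard could then rank documents by.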

 


 

This experiment also gave me an opportunity to learn about the Dimensions.ai API, which is built around a domain-specific language (DSL) for querying the Dimensions database, a state-of-the-art scholarly platform containing millions of linked metadata records about publications, grants, patents, clinical trials and policy documents (for more background about Dimensions, see this blog post and this white paper).

The API itself is behind a paywall, but if you are curious about it, the documentation is available online.

It’s a fantastic resource, intuitive and easy to use yet powerful and feature-rich, so I am pretty sure I’ll be writing more about it.

Stay tuned for more!

 

 

Interesting read: ‘SciSci’ i.e. the science of science http://www.michelepasin.org/blog/2018/03/22/interesting-read-scisci-i-e-the-science-of-science/ Thu, 22 Mar 2018 11:33:06 +0000

Albert-László Barabási is a Romanian-born Hungarian-American physicist, best known for his research on network theory.

This article discusses the impact and methods of ‘science analytics’, that is, the quantitative analysis of scientific outputs.

Full article available here: http://barabasi.com/f/939.pdf

The science of science (SciSci) offers a quantitative understanding of the interactions among scientific agents across diverse geographic and temporal scales: It provides insights into the conditions underlying creativity and the genesis of scientific discovery, with the ultimate goal of developing tools and policies that have the potential to accelerate science.
[…]
For example, measurements indicate that scholars are risk-averse, preferring to study topics related to their current expertise, which constrains the potential of future discoveries. Those willing to break this pattern engage in riskier careers but become more likely to make major breakthroughs. Overall, the highest-impact science is grounded in conventional combinations of prior work but features unusual combinations.

Related links

  • Papers by Barabási on Nature.com
  • Full text of Little Science Big Science, another seminal work in this field
  • A few Twitter reactions from @michael_nielsen
    Leipzig Semantics 2016 conference http://www.michelepasin.org/blog/2016/10/25/leipzig-semantics-2016-conference/ Tue, 25 Oct 2016 16:01:51 +0000

    A few weeks ago I attended the Semantics conference in Leipzig, so here’s a short report about the event.

    SEMANTiCS 2016 (#semanticsconf) continues a long tradition of bringing together colleagues from around the world to present best practices, panels, papers and posters to discuss semantic systems in birds-of-a-feather sessions and informal settings.

    What I really liked about this event is the fact that it is primarily industry-focused, meaning that most (if not all) of the talks dealt with pragmatic aspects of real-world applications of semantic technologies. You can take a look at the online proceedings for more details; alternatively, there are some nice video and picture pages too.

    I meant to share some notes a few weeks ago already but never got round to doing it… so here are a few highlights:

  • Springer Nature’s Scigraph project got quite a bit of publicity, as I was one of the invited keynote speakers. Overall, the feedback was extremely positive and it seems that many people are waiting to see more from us in the coming months. We also chatted to representatives from other publishers (Elsevier, Wolters Kluwer, Oxford University Press) about areas where we could collaborate more, e.g. constructing shared datasets (such as conference identifiers, coordinated by CrossRef the same way they do it for funders).

  • Cathy Dolbear from Oxford University Press gave an interesting keynote describing the work they’ve been doing with Linked Data, mostly focusing on the Oxford Global Languages project, which links lexical information from multiple global and also digitally under-represented languages in a semantic graph. She also talked about creating rich schema.org snippets so as to better interface with Google’s knowledge graph and thus increase their ranking in search results. That was really good to hear as we’re investing in this area too!

     

  • David Kuilman from Elsevier talked about their approach to content management based on semantic technologies. David’s team has been focusing on tracking document production metadata mainly before publication (e.g. submission and production workflow metadata), which is quite interesting because it’s the exact opposite of what we’ve been doing at Springer Nature.

     

    Open Data Summit 2016 http://www.michelepasin.org/blog/2016/10/21/open-data-summit-2016/ Fri, 21 Oct 2016 16:21:17 +0000

    On November 1st we were invited to present the Scigraph project at the London ODI Summit, the annual event organized by the Open Data Institute to review and discuss the social and economic impact of open data in both the public and commercial sectors.

     

    If data infrastructure is as important to our infrastructure as roads, then the Open Data Institute is helping to lay the concrete. Join us on 1 November to hear inspiring stories from around the world on how people are innovating with the web of data, with presentations from diverse innovators – from startups to high-profile speakers such as Sir Tim Berners-Lee (creator of the World Wide Web), Sir Nigel Shadbolt (AI expert) and Martha Lane Fox (Lastminute.com founder).

    Our presentation was part of a session titled ‘How to design for open government and enterprise’, which included two speakers from industry (me and Tharindi Hapuarachchi from Thomson Reuters Labs) and two from the public sector (Clare Moriarty from the Department for Environment, Food and Rural Affairs and Jamie Whyte from Trafford Council).

     

    Feedback was very positive; in particular, the audience seems to have liked Springer Nature’s long-standing commitment to making science more open.

     

    Other bits and pieces:

  • the open data awards from this year include various interesting projects and are worth taking a look at;
  • Tim Berners-Lee hinted at the potential of recent technical advances like blockchain technology and the Solid project;
  • the ODINE (Open Data Incubator Europe) session was very interesting; in fact, I’ve learnt that there’s a search engine for the internet of things too!

     

    SpotiSci: finding science concepts on Spotify http://www.michelepasin.org/blog/2016/04/29/spotisci-finding-science-concepts-on-spotify/ Fri, 29 Apr 2016 22:43:12 +0000

    Ever wondered how many musical albums focus on topics like the moon landing, artificial intelligence or DNA replication? Probably not for everyone’s taste, but if you give it a shot you’ll be surprised at the results.

    When I ran into the excellent Spotipy library (a small yet nifty Python client for the Spotify Web API) I couldn’t wait to try it out with some fun project.

    So that’s how the SpotiSci experiment came about: essentially a search tool that lets you query Nature.com’s archive of one million articles while at the same time browsing the vast selection of music available on Spotify.
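
For the curious, the Spotify half of such a search could look roughly like the sketch below. This is not the SpotiSci source code: `search_albums` and `album_summaries` are hypothetical helper names, and the sketch assumes Spotipy is installed and that API credentials are available in the `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` environment variables.

```python
def album_summaries(search_response):
    """Flatten a Spotify album-search response into (album, artists, url) tuples."""
    items = search_response.get("albums", {}).get("items", [])
    return [
        (
            album["name"],
            ", ".join(artist["name"] for artist in album.get("artists", [])),
            album.get("external_urls", {}).get("spotify", ""),
        )
        for album in items
    ]


def search_albums(concept, limit=10):
    """Search Spotify for albums whose metadata matches a science concept."""
    # Imported lazily so the parsing helper above works without Spotipy installed.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    # The credentials manager reads the client id/secret from the environment.
    sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())
    return album_summaries(sp.search(q=concept, type="album", limit=limit))


# Example usage (needs network access and valid credentials):
#     for name, artists, url in search_albums("DNA replication"):
#         print(name, "-", artists, url)
```

Keeping the response parsing separate from the API call makes it easy to try the formatting on a saved response without hitting the Spotify servers.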

    Have a good listen. You may find the right soundtrack for your science.


     

    ]]>
    2776
    Accessing OS X dictionary with Python http://www.michelepasin.org/blog/2015/11/28/accessing-os-x-dictionary-with-python/ Sat, 28 Nov 2015 15:57:06 +0000

    A little script for accessing the OS X Dictionary app from Python.

    Tip: make the script executable and add an alias for it in order to be able to call it from the command line easily.
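
The script itself is not reproduced in this text, so what follows is only a sketch of how such a lookup can be done, assuming macOS with the PyObjC bindings installed (which expose `DCSCopyTextDefinition` from the `DictionaryServices` framework). The names `build_query`, `define` and `main`, and the file name `define.py`, are made up for this example.

```python
def build_query(args):
    """Join command-line arguments into a single lookup phrase."""
    return " ".join(a.strip() for a in args if a.strip())


def define(phrase):
    """Return the Dictionary.app definition for phrase, or None if not found."""
    # Imported lazily: this requires macOS and PyObjC (an assumption of this sketch).
    from DictionaryServices import DCSCopyTextDefinition

    # None selects the default dictionary; the tuple is the (start, length) range.
    return DCSCopyTextDefinition(None, phrase, (0, len(phrase)))


def main(argv):
    """Look up the phrase given on the command line and print the result."""
    phrase = build_query(argv[1:])
    if not phrase:
        print("usage: define.py <word or phrase>")
        return 1
    result = define(phrase)
    print(result if result else "No definition found for '%s'" % phrase)
    return 0


# When saved as an executable script, finish with:
#     import sys
#     sys.exit(main(sys.argv))
```

With an alias in place, a call such as `define serendipity` would then print the definition straight in the terminal.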

     

    Recent projects from CrossRef.org http://www.michelepasin.org/blog/2015/06/14/recent-projects-from-crossref-org/ Sun, 14 Jun 2015 22:21:55 +0000

    We spent the day with the CrossRef team in Oxford last week, talking about our recent work in the linked data space (see the nature ontologies portal) and their recent initiatives in the scholarly publishing area.

    So here are a couple of interesting follow-ups from the meeting.
    P.S. If you want to know more about CrossRef, make sure you take a look at their website and in particular the labs section: http://labs.crossref.org/.

    Opening up article level metrics

    http://det.labs.crossref.org/

    CrossRef is using the open source Lagotto application (developed by PLOS, https://github.com/articlemetrics/lagotto) to retrieve article metrics data from a variety of sources (e.g. Wikipedia, Twitter etc.; see the full list here).

    The model used for storing this data follows an agreed ontology containing for example a classification of ‘mentions’ actions (viewed/saved/discussed/recommended/cited – see this paper for more details).

    In a nutshell, CrossRef is planning to collect the raw metrics data for all the DOIs they track and make it available in the form of ‘DOI events’.

    An interesting demo application shows the stream of DOI citations coming from Wikipedia (one of the top referrers of DOIs, unsurprisingly). More discussions on this blog post.

    Linking dataset DOIs and publications DOIs

    http://www.crosscite.org/

    CrossRef has been working with DataCite with the goal of harmonising their databases. DataCite is the second major registry of DOIs (after CrossRef) and it has been focusing on assigning persistent identifiers to datasets.

    This work is now gaining more momentum as DataCite is enlarging its team. So in theory it won’t be long before we see a service that interlinks publications and datasets, which is great news.

    Linking publications and funding sources

    http://www.crossref.org/fundref/

    FundRef provides a standard way to report funding sources for published scholarly research. This is increasingly becoming a fundamental requirement for all publicly funded research, so several publishers have agreed to help extract funding information and send it to CrossRef.

    A recent platform built on top of FundRef is Chorus http://www.chorusaccess.org/, which enables users to discover articles reporting on funded research. It also provides dashboards which can be used by funders, institutions, researchers, publishers, and the public for monitoring and tracking public-access compliance for those articles.

    For example see http://dashboard.chorusaccess.org/ahrq#/breakdown


    Miscellaneous news & links

    – JSON-LD (a JSON-based serialisation of RDF) is being considered as a candidate data format for the next generation of the CrossRef REST API.

    – The prototype http://www.yamz.net/ came up in discussion; a quite interesting stack-overflow meets ontology-engineering kind of tool. Definitely worth a look, I’d say.

    – Wikidata (a queryable structured data version of Wikipedia) seems to be gaining a lot of momentum after it’s taken over Freebase from Google. Will it eventually replace its main rival DBpedia?

     

    A sneak peek at Nature.com articles’ archive http://www.michelepasin.org/blog/2015/06/08/a-sneak-peek-at-nature-com-articles-archive/ Mon, 08 Jun 2015 21:26:58 +0000

    We’re getting closer to releasing the full set of metadata covering over one million articles published by Nature Publishing Group since 1845. So here’s a sneak peek at this dataset, in the form of a simple d3.js visual summary of what will soon be available to download and reuse.

    In the last few months I’ve been working with my colleagues at Macmillan Science and Education on an open data portal that makes available to the public many of the taxonomies and ontologies we use internally for organising the content we publish.

    This is part of our ongoing involvement with linked data and semantic technologies, aimed both at using these tools to transform the publishing workflow into a more dynamic platform, and at contributing to the evolving web of open data with a rich dataset of scientific article metadata.

    The articles dataset includes metadata about all articles published by the Nature journal, of course. But that’s not all: Scientific American, Nature Medicine, Nature Genetics and many other titles are also part of it (note: the full list can be downloaded as raw data here).


    The first diagram shows how many articles have been published each year since 1845 (the start year of Scientific American). Nature began a couple of decades later, in 1869; the curve getting steeper in the 90s corresponds to the rapid increase in publications due to the progressive specialisation of scientific journals (e.g. all the Nature-branded titles).

    The second diagram instead shows the increase in publication volumes on a cumulative scale. We’ve now reached 1M articles, and counting!
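
As a side note, going from the per-year counts behind the first diagram to the running totals behind the second is a simple cumulative sum. The sketch below shows the idea in Python rather than in the d3.js code used for the charts; the numbers in `articles_per_year` are placeholders, not the real figures, and `cumulative_counts` is a name made up for this example.

```python
from itertools import accumulate

# Placeholder per-year counts, NOT the real Nature.com figures.
articles_per_year = {1845: 120, 1846: 135, 1847: 150, 1848: 160}


def cumulative_counts(per_year):
    """Return {year: running total of articles up to and including that year}."""
    years = sorted(per_year)
    totals = accumulate(per_year[year] for year in years)
    return dict(zip(years, totals))


print(cumulative_counts(articles_per_year))
# → {1845: 120, 1846: 255, 1847: 405, 1848: 565}
```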


    In order to create the charts I played around with a nifty example from Mike Bostock (http://bl.ocks.org/mbostock/3902569) and added a couple of extra things to it.

    The full source code is on Github.

    Finally, it’s worth mentioning that this metadata had already been made available a few years ago under the CC0 license: you can still access it here. This upcoming release, though, makes it available in the context of a much more precise and stable set of ontologies, meaning that the semantics of the dataset is more clearly laid out and consistent.

    So stay tuned for more! ..and if you plan/would like to reuse these datasets please do get in touch, either here or by emailing developers@nature.com.

     

    Nature.com ontologies portal available online http://www.michelepasin.org/blog/2015/04/30/nature-com-ontologies-portal-available-online/ Thu, 30 Apr 2015 21:46:42 +0000

    The Nature ontologies portal is a new section of the nature.com site that describes our involvement with semantic technologies and also makes available to the wider public several models and datasets as RDF linked data.

    We launched the portal nearly a month ago, with the aim of sharing our experiences with semantic technologies and more generally of contributing our data models and datasets to the wider linked data community.

    This April 2015 release doubles the number and size of our published data models. These now span more completely the various things that our world contains, from publication things – articles, figures, etc. – to classification things – article-types, subjects, etc. – and additional things used to manage our content publishing operation – assets, events, etc. Also included is a release page for the latest data release and a separate page for archival data releases.

    Background

    Is this the first time you’ve heard about the semantic web and ontologies?

    Then you should know that even though internally at Macmillan Science and Education XML remains the main technology used to represent and store the things we publish, the metadata about these documents (e.g. publication details, subject categories etc.) are normally also encoded using a more abstract, graph-oriented information model.
     
    This is called RDF and has two key characteristics:
    – it encodes all information in the form of triples e.g. <subject><predicate><object>
    – it was built with the web in mind: broadly speaking, each of the items in a triple can be accessed via the internet, i.e. it is a URI (a generalised notion of a URL).
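
To make the two characteristics above a bit more concrete, here is a toy sketch in plain Python (no RDF library involved). The `example.org` URIs and the property names are invented for illustration and are not taken from the Nature ontologies; `to_ntriples` and `match` are likewise hypothetical helpers.

```python
# Hypothetical URIs under example.org, used purely for illustration.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (EX + "article/1", EX + "terms/title", '"A study of something"'),
    (EX + "article/1", EX + "terms/hasSubject", EX + "subject/genetics"),
    (EX + "article/2", EX + "terms/hasSubject", EX + "subject/genetics"),
]


def to_ntriples(triples):
    """Serialise triples in N-Triples style: URIs in <>, literals left quoted."""
    def term(value):
        return value if value.startswith('"') else "<%s>" % value

    return "\n".join("%s %s %s ." % (term(s), term(p), term(o)) for s, p, o in triples)


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]
```

For instance, `match(triples, o=EX + "subject/genetics")` returns both articles tagged with that subject, which is the kind of joining-up a shared graph model makes straightforward.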
     
    So why use RDF?

    The RDF model makes it easier to maintain a shared yet scalable schema (aka an ‘ontology’) of the data types in use within our organization. It works a bit like a common language which is spoken by more and more data stores, and thus makes it easier to join things up whenever needed.

    At the same time – since the RDF model is native to the web – it facilitates the ‘semantic’ integration of our data with the increasing number of other organisations that publish their data using compatible models.

    For example, the BBC, Elsevier and more recently Springer are among the many organisations that contribute to the Linked Data Cloud.

    What’s next

    We’ll continue improving these ontologies and releasing new ones as they are created. But probably most interestingly for many people, we’re working on a new release of the whole NPG articles dataset (~1M articles).

    So stay tuned for more!

     
