Words | Michele Pasin

paper Enhancing the Accessibility of ORCID Public Data, now additionally hosted on Google BigQuery.

Jun 2025 4th International Conference on the Science of Science and Innovation, Copenhagen, Denmark, Jun 2025.

ORCID is committed to openness, exemplified by the annual release of its Public Data File since 2012. This dataset, encompassing all public ORCID records, has been downloaded over 190,000 times and serves as a resource for analyzing research community dynamics, scientific migrations, collaboration networks, and ORCID adoption trends. However, the file’s substantial size poses challenges for users lacking advanced data management skills, hindering exploratory analyses.

paper Alleanze Ingannevoli: Svelare il lato nascosto della ricerca (Deceptive Alliances: Unveiling the Hidden Side of Research).

Jan 2025 1° Congresso Nazionale sull’Integrità nella Ricerca (1st National Congress on Research Integrity), Rome, Italy, Jan 2025.

Introduced to the research world in 2024, forensic scientometrics (Forensic Scientometrics, or FoSci) is a new discipline developed to facilitate the analysis of publication data, co-authorship networks, institutional collaborations, and more. FoSci techniques make it possible to bring to light aspects of scientific research that signal potential risks, such as hidden participation in compromised research networks or ties to individuals or groups known for spreading dubious or fraudulent scientific output.

blog Unpacking OpenAlex topics classification.

Sep 2024

In this post I take a closer look at the classification of scientific disciplines in OpenAlex, a recently developed database of scientific works. The topics classification has been generated entirely computationally, using a mix of citation clustering techniques and LLM-based labeling. The results, although not always precise, are definitely worth exploring further.

paper Dimensions: Calculating Disruption Indices at Scale.

Sep 2024 Quantitative Science Studies, Sep 2024. https://doi.org/10.48550/arXiv.2309.06120

Evaluating the disruptive nature of academic ideas is a new area of research evaluation that moves beyond standard citation-based metrics by taking into account the broader citation context of publications or patents. The "CD index" and a number of related indicators have been proposed in order to mathematically characterise the disruptiveness of scientific publications or patents. This research area has attracted considerable attention in recent years, yet there is no general consensus on the significance and reliability of disruption indices. More experimentation and evaluation would be desirable, but this is hampered by the fact that these indicators are expensive and time-consuming to calculate, especially at scale on large citation networks. We present a novel method to calculate disruption indices that leverages the Dimensions cloud-based research infrastructure, reduces the computational time needed to produce such indices by an order of magnitude, and makes this functionality available within an online environment that requires no set-up effort. We explain the novel algorithm and describe how its results align with preexisting implementations of disruption indicators. This method will enable researchers to develop, validate and improve mathematical disruption models more quickly and with more precision, thus contributing to the development of this new research area.
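The abstract does not spell out how a CD index is computed. As a rough illustration, here is a minimal in-memory Python sketch of the commonly used definition: for each later paper i that cites the focal paper or any of its references, f_i = 1 if i cites the focal paper and b_i = 1 if i cites at least one of the focal paper's references; over those n papers, CD = (1/n) * sum(-2 * f_i * b_i + f_i). This is not the Dimensions-based algorithm described in the paper. The function name `cd_index`, its parameters and the toy citation data are hypothetical, and the time window usually applied to "later" citations is omitted for simplicity.

```python
from typing import Dict, Set


def cd_index(focal: str, focal_refs: Set[str], later_citations: Dict[str, Set[str]]) -> float:
    """Naive CD index for a single focal paper (illustrative sketch only).

    focal_refs: papers cited by the focal paper.
    later_citations: hypothetical mapping from each paper published after
        the focal paper to the set of papers it cites (no time window applied).
    """
    scores = []
    for paper, cited in later_citations.items():
        if paper == focal:
            continue
        f_i = 1 if focal in cited else 0        # cites the focal paper
        b_i = 1 if cited & focal_refs else 0    # cites any of the focal paper's references
        if f_i == 0 and b_i == 0:
            continue  # unrelated paper: not counted in n
        scores.append(-2 * f_i * b_i + f_i)
    if not scores:
        raise ValueError("no later paper cites the focal paper or its references")
    return sum(scores) / len(scores)


# Hypothetical toy citation data for a focal paper "F" citing "R1" and "R2".
later = {
    "A": {"F"},        # cites focal only           -> +1
    "B": {"F", "R1"},  # cites focal and a reference -> -1
    "C": {"R2"},       # cites a reference only      ->  0
    "D": {"X"},        # unrelated, excluded
    "E": {"F"},        # cites focal only           -> +1
}
print(cd_index("F", {"R1", "R2"}, later))  # 0.25
```

Values close to +1 mean later work tends to cite the focal paper without its references (a disruptive pattern), while values close to -1 mean later work cites both together (a consolidating pattern). Computing this naively for every paper requires scanning the forward citations of each focal paper and of all its references, which is what makes the indicator expensive on large citation networks.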

blog Designing great dashboards: a slidedeck.

Jul 2023

What makes a dashboard great? Here is a slide deck (Google Slides) that consolidates several useful ideas I've run into in the past.

blog Notes from the book: Deep Work (2016).

Jul 2023

Finally got around to reading the book Deep Work by Cal Newport (2016).

blog Any sufficiently advanced technology is indistinguishable from magic.

Jun 2023

Arthur C. Clarke once commented that "Any sufficiently advanced technology is indistinguishable from magic."

blog SciGraph 2017-2023.

Feb 2023

Springer Nature retired SciGraph earlier this month. I was first the data architect and then the technical lead for this project, so this post is just a reminder of the great things we did with it. Also, a little rant about the things that weren't that great...

blog Paperpile: a PDF manager with Google Drive backend.

Jan 2023

Paperpile is an online PDF manager that stores your personal data in your Google Drive folder.

blog Ontospy version 2.0 released.

Oct 2022

Version 2 of the library includes SHACL support as well as various internal refactoring. Ontospy is an open source Python library and command line tool for working with vocabularies encoded in the RDF family of languages.