The news feed of www.michelepasin.org
http://www.michelepasin.org/words/
Latest articles, blog posts and news
en-us
Mon, 16 Jun 2025 00:00:00 +0000

Enhancing the Accessibility of ORCID Public Data, now additionally hosted on Google BigQuery
https://www.michelepasin.org/papers/2025/06/16/enhancing-the-accessibility-of-orcid-public-data-now-additionally-hosted-on-google-bigquery/
ORCID is committed to openness, exemplified by the annual release of its Public Data File since 2012. This dataset, encompassing all public ORCID records, has been downloaded over 190,000 times and serves as a resource for analyzing research community dynamics, scientific migrations, collaboration networks, and ORCID adoption trends. However, the file’s substantial size poses challenges for users lacking advanced data management skills, hindering exploratory analyses.
Mon, 16 Jun 2025 00:00:00 +0000

Performing Research Analytics at Scale: the Dimensions Reporting Platform
https://www.michelepasin.org/papers/2025/06/02/performing-research-analytics-at-scale-the-dimensions-reporting-platform/
Mon, 02 Jun 2025 00:00:00 +0000

Performing Research Analytics at Scale: the Dimensions Reporting Platform
https://www.michelepasin.org/papers/2025/05/05/performing-research-analytics-at-scale-the-dimensions-reporting-platform/
Mon, 05 May 2025 00:00:00 +0000

Alleanze Ingannevoli: Svelare il lato nascosto della ricerca (Deceptive Alliances: Uncovering the Hidden Side of Research)
https://www.michelepasin.org/papers/2025/01/22/alleanze-ingannevoli-svelare-il-lato-nascosto-della-ricerca/
Introduced to the research world in 2024, forensic scientometrics (FoSci) is a new discipline developed to facilitate the analysis of publication data, co-authorship networks, institutional collaborations and more. FoSci techniques bring to light aspects of scientific research that indicate potential risks, such as hidden participation in compromised research networks, or relationships with individuals or groups known for disseminating scientific output that is fraudulent or of dubious quality.
Tue, 28 Jan 2025 00:00:00 +0000

Unpacking OpenAlex topics classification
https://www.michelepasin.org/blog/2024/09/27/open-alex-topics/
In this post I have taken a closer look at the classification of scientific disciplines in [OpenAlex](https://openalex.org/), a recently developed database of scientific works. The topics classification has been generated entirely computationally, using a mix of citation clustering techniques and LLM-based labeling. The results, although not always precise, are definitely worth exploring further.

Last week I went to the [STI 2024 conference](https://sti2024.org/sti-conference/) in Berlin, the annual European get-together of experts in the area of research analytics and evaluation. There were lots of interesting talks, but probably the thing that struck me the most was the general excitement and sense of expectation about OpenAlex. If you haven't encountered it yet, [OpenAlex](https://openalex.org/) is an open database, released in 2022, of research publications and related content (e.g. datasets, authors, journals), developed by [OurResearch](https://ourresearch.org/).
Pretty much all o ...
Fri, 27 Sep 2024 00:00:00 +0000

Dimensions: Calculating Disruption Indices at Scale
https://www.michelepasin.org/papers/2023/09/13/dimensions-calculating-disruption-indices-at-scale/
Evaluating the disruptive nature of academic ideas is a new area of research evaluation that moves beyond standard citation-based metrics by taking into account the broader citation context of publications or patents. The "CD index" and a number of related indicators have been proposed in order to characterise mathematically the disruptiveness of scientific publications or patents. This research area has generated a lot of attention in recent years, yet there is no general consensus on the significance and reliability of disruption indices. More experimentation and evaluation would be desirable, but it is hampered by the fact that these indicators are expensive and time-consuming to calculate, especially at scale on large citation networks. We present a novel method to calculate disruption indices that leverages the Dimensions cloud-based research infrastructure and reduces the computational time taken to produce such indices by an order of magnitude. It also makes this functionality available within an online environment that requires no set-up effort. We explain the novel algorithm and describe how its results align with preexisting implementations of disruption indicators.
This method will enable researchers to develop, validate and improve mathematical disruption models more quickly and with more precision, thus contributing to the development of this new research area.
Fri, 06 Sep 2024 00:00:00 +0000

The Dimensions Research Security application
https://www.michelepasin.org/papers/2024/06/05/the-dimensions-research-security-application/
Wed, 05 Jun 2024 00:00:00 +0000

Designing great dashboards: a slidedeck
https://www.michelepasin.org/blog/2023/07/06/designing-great-dashboards/
What makes a dashboard great? Here is a slide deck ([gslides](https://docs.google.com/presentation/d/e/2PACX-1vTQKTlvOtfXOKpnhdYJJEExUKf0sIh9cwiqu8SmUmU2NlhPEVOFxArj6hs77CuB8rKdUXG8om0IxKd-/pub?start=false&loop=false&delayms=3000)) that consolidates several useful ideas I've run into in the past.

After reading many useful papers and online resources on the topic of dashboard design, I realised I didn’t have a single document collecting and organising all of the useful ideas I had encountered. So the purpose of this slide deck ([gslides](https://docs.google.com/presentation/d/e/2PACX-1vTQKTlvOtfXOKpnhdYJJEExUKf0sIh9cwiqu8SmUmU2NlhPEVOFxArj6hs77CuB8rKdUXG8om0IxKd-/pub?start=false&loop=false&delayms=3000)) is to serve as a (work-in-progress) handbook that a dashboard developer can come back to for inspiration, advice and maybe even endorsement.
<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vTQKTlvOtfXOKpnhdYJJEExUKf0sIh9cwiqu8SmUmU2NlhPEVOFxArj6hs77CuB8rKdUX ...
Thu, 06 Jul 2023 00:00:00 +0000

Notes from the book: Deep Work (2016)
https://www.michelepasin.org/blog/2023/07/01/deep-work/
Finally got round to reading the book [Deep Work](https://www.worldcat.org/title/920740925) by Cal Newport (2016). The book's central idea is that 'deep work', i.e. work based on prolonged stretches of focused time without distractions, has become largely underrated in today's always-on internet world. And that is not good.

The book's argument didn't strike me as revolutionary, or particularly new. I think that anyone with some kind of advanced education (academic or not) knows exactly how important focused work is.

### Deep work is underrated

This book is a convincing reminder of the fact that in many jobs *deep work is no longer seen as essential*. So we should make a conscious effort to make room for it in our lives, and to get others to recognise its importance. *Yes - I am talking to you, business managers and time-suckers!*

I saved a few passages from the book that I felt ...
Sat, 01 Jul 2023 00:00:00 +0000

Any sufficiently advanced technology is indistinguishable from magic
https://www.michelepasin.org/blog/2023/06/05/chatgpt-as-music/
[Arthur C Clarke](https://en.wikipedia.org/wiki/Clarke%27s_three_laws) once commented that "Any sufficiently advanced technology is indistinguishable from magic".

Today's [LLMs](https://en.wikipedia.org/wiki/Wikipedia:Large_language_models) get described as a baby that is magically fed the entire web’s worth of documents. The baby learns how words are associated together, can make sense of questions and can say words back to us with enormous dexterity.
But the baby hasn’t gone out in the real world for a single minute.

It simply reproduces language **as if it was music**. Given an input melody, it spits out another melody that matches it, more or less, according to predefined parameters and, of course, the input patterns.

### With LLMs, there is no world, just the music

This is just an imitation game. It is designed to be like that. Musical patterns in, musical patterns out. That’s where it derives its strength from and that’s why it appears so magical. It’s pretty damn good at imi ...
Mon, 05 Jun 2023 00:00:00 +0000

SciGraph 2017-2023
https://www.michelepasin.org/blog/2023/02/03/rip-scigraph/
Springer Nature retired [SciGraph](https://www.springernature.com/gp/researchers/scigraph) earlier this month. I have been the data architect and then technical lead for this project, so this post is just a reminder of the great things we did in it. Also, a little rant about the things that weren't that great...

## Open Linked Data for the Scholarly domain

SciGraph has been running for almost 8 years. I've been involved with the project since its early days in 2016, together with [lots of enthusiastic people at Springer Nature](https://www.youtube.com/watch?v=HzzBuHy51wI). It started out as an attempt to break data silos about scientific publications.
We chose [Linked Data](https://en.wikipedia.org/wiki/Linked_data) as its core technology for multiple reasons: its open standards and vibrant community, the expressive knowledge modeling languages, and last but not least the intent to support an increasing number of researchers/data-scientists who could independently [take advanta ...
Fri, 24 Feb 2023 00:00:00 +0000

Paperpile: a PDF manager with Google Drive backend
https://www.michelepasin.org/blog/2023/01/19/introducing-paperpile/
[Paperpile](https://paperpile.com/) is an online PDF manager that stores your personal data in your Google Drive folder. I recently found out about it and discovered that it addresses the biggest issue I had with most of its competitors: the [vendor lock-in](https://en.wikipedia.org/wiki/Vendor_lock-in) problem.

![2023-01-20-paperpile-1.png](/media/static/blog_img/2023-01-20-paperpile-1.png)

## Organizing papers, hello old friend

I recently started working on a new topic, collecting and organising academic papers to build a conceptual map of the area. So I began looking for a piece of software that could help with that task.

This problem is not new to me. In the past I've used [Mendeley](https://www.michelepasin.org/blog/2012/08/07/using-mendeley-and-dropbox-to-sync-your-pdf-library-across-computers/index.html) a lot for this task, as well as its competitors [Readcube](https://app.readcube.com/) and [Papers](https://www.papersapp.com/). Frustrated by the lack of portability ...
Thu, 19 Jan 2023 00:00:00 +0000

Ontospy version 2.0 released
https://www.michelepasin.org/blog/2022/10/30/Ontospy-v2-released/
Version 2 of the library includes [SHACL](https://www.w3.org/TR/shacl/) support as well as various internal refactoring.
[Ontospy](http://lambdamusic.github.io/Ontospy/) is an open source Python library and command line tool for working with vocabularies encoded in the RDF family of languages. It took months to get through this release, so I'm really glad it has finally happened.

## What's new in 2.0

Main improvements are:

- Remove all Django dependencies, replaced with [Jinja2](https://jinja.palletsprojects.com/en/3.1.x/intro/#installation)
- Drop support for Python 2
- Refactor code / clean up
- Merge additional SHACL support branch [pull-107](https://github.com/lambdamusic/Ontospy/pull/107)
- Fix error loading JSONLD graphs [issue-1416](https://github.com/lambdamusic/Ontospy/issues/102)
- Rename internal `ontodocs` module to `gendocs`

## See also

The [official documentation](http://lambdamusic.github.io/Ontospy/)

![2022-10-30-ontospy-v2.png](/media/static/blog_i ...
Sun, 30 Oct 2022 00:00:00 +0000

Generating large-scale network analyses of scientific landscapes in seconds using Dimensions on Google BigQuery
https://www.michelepasin.org/papers/2022/09/01/generating-largescale-network-analyses-of-scientific-landscapes-in-seconds-using-dimensions-on-google-bigquery/
The growth of large, programmatically accessible bibliometrics databases presents new opportunities for complex analyses of publication metadata. In addition to providing a wealth of information about authors and institutions, databases such as those provided by Dimensions also provide conceptual information and links to entities such as grants, funders and patents. However, data is not the only challenge in evaluating patterns in scholarly work: these large datasets can be challenging to integrate, particularly for those unfamiliar with the complex schemas necessary for accommodating such heterogeneous information, and those most comfortable with data mining may not be as experienced in data visualisation.
Here, we present an open-source Python library that streamlines the process of accessing and diagramming subsets of the Dimensions on Google BigQuery database and demonstrate its use on the freely available Dimensions COVID-19 dataset. We are optimistic that this tool will expand access to this valuable information by streamlining what would otherwise be multiple complex technical tasks, enabling more researchers to examine patterns in research focus and collaboration over time.
Thu, 01 Sep 2022 00:00:00 +0000

Bringing quotations back to life
https://www.michelepasin.org/blog/2022/07/28/introducing-quotes-section/
There's a new section on this site that lets you navigate quotations: [quotes.michelepasin.org](https://quotes.michelepasin.org). It's just a cut-down implementation of an [old idea](https://www.michelepasin.org/blog/2015/01/05/introducing-resquotes-com/index.html) I worked on a while ago, but you know, sometimes it is useful to start from scratch and re-think things from the ground up.

### Why?

These are quotes I've been collecting here and there, over the years, using various apps like [NVALT](https://brettterpstra.com/projects/nvalt/), [Notes](https://support.apple.com/en-gb/guide/notes/welcome/mac) or emails. The quotes have also been categorised a little using tags and titles. Since I hate having stuff lying around on my hard drive and hardly being used, I've made a new [webapp](quotes.michelepasin.org) that lets you browse all of this content. Possibly someone other than me will find it useful or inspiring.
### A bit of history

A while ago, I built a webapp called ...
Fri, 29 Jul 2022 00:00:00 +0000

A semi-automated conference assistant
https://www.michelepasin.org/blog/2022/06/30/a-semi-automated-conference-assistant/
A couple of weeks ago I went to the excellent [Move Or Perish—Scientific Trajectories, Inclusion, And Inequality, And Their Consequences For Transformative Science](https://www.csh.ac.at/event/csh-workshop-move-or-perish-scientific-trajectories-inclusion-and-inequality-and-their-consequences-for-transformative-science/) workshop in Vienna.

While getting ready for it, I found myself asking some familiar questions. Who are the speakers? What is their background? How to best contextualise the topics being discussed? Nowadays scientists tend to specialise in highly niche areas, so it doesn't take much for people to feel they are getting out of their comfort zone when attending a conference. So many times I have wished I had an automated digital *conference assistant*.

### Brainstorming with the Dimensions API

These are big questions, I know, but I wonder if a simple piece of software could help. To put it simply, a piece of software that would sift through the available online information about the spe ...
Thu, 30 Jun 2022 00:00:00 +0000

Exploring Bento noise box
https://www.michelepasin.org/blog/2022/05/29/bento-noise-box/
Improvised acid loops using [Extempore](https://extemporelang.github.io/) + [Bentō](https://www.giorgiosancristoforo.net/).

> Bentō is a standalone noise box with tape recorder, inspired by the japanoise scene. Thanks to its unstable and very unique oscillators, Bentō can create an enormous number of sounds and unpredictable noises that are not possible with traditional subtractive synthesizers.
See the [PDF user manual](https://www.giorgiosancristoforo.net/downloads/Bento_User_Manual.pdf)

![bento-screenshot.jpg](/media/static/blog_img/bento-screenshot.jpg)

## Take 1

Just trying to control it using MIDI-CC from Extempore. Note: I previously created some MIDI mappings and saved them to a [file](https://github.com/lambdamusic/extempore-extensions/blob/main/init/init_bento.xtm) I can reload each time.

<iframe width="560" height="315" src="https://www.youtube.com/embed/P6Av_eLy_xw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encr ...
Sun, 29 May 2022 00:00:00 +0000

Three things I do *not* like about Looker
https://www.michelepasin.org/blog/2022/04/20/Three-things-i-do-not-like-about-looker/
Following up on my previous [3 things I like about Looker](/blog/2022/03/02/Three-things-i-like-about-looker/), here are instead the top three things that I really wish were different about this piece of software.

> [Looker](https://www.looker.com/) is a business intelligence software and big data analytics platform that helps you explore, analyze and share real-time business analytics easily. Looker is part of the Google Cloud platform.

## 1. Can't make public dashboards

I totally wish I was able to create a dashboard and make it available on the web without the need for users to log in. Instead:

> To view the dashboard, anyone with the link must have access to the Looker instance on which the dashboard is saved, as well as access to the [dashboard](https://docs.looker.com/sharing-and-publishing/organizing-spaces#viewing_and_managing_access_for_a_folder) and [models](https://docs.looker.com/admin-options/settings/roles#model_sets) that the tiles are based on.
Dashboard shar ...
Wed, 20 Apr 2022 00:00:00 +0000

Composition: 'Study for Cello and Double-bass'
https://www.michelepasin.org/blog/2022/04/07/cellos-livecoding/
A new livecoding composition using [Extempore](https://extemporelang.github.io/) and Ableton Live: 'Study for Cello and Double-bass'.

<iframe width="560" height="315" src="https://www.youtube.com/embed/VR6lMsECEQc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

## Creating chords using a cosine function

The main technique used in this piece is to generate chord/harmonic variations using cosine functions.

```scheme
;; every 8 beats, rebuild the root chord from two cosine oscillators
(at 8 0
    (set! *melody*
          (:mkchord (:mkint 48 (cosrfloor 7 7 1/30) 'M)
                    'M
                    (cosrfloor 7 3 1/5))))
```

Every 8 beats the root chord (used by all instruments in order to generate musical patterns) gets updated. Two cosine functions are used simultaneously to:

1. Determine the *amplitude* of the interval (major or minor, starting from C3) that generates the root note of the chord.
2. Determine the number of notes in the chord.

The ...
Thu, 07 Apr 2022 00:00:00 +0000

Three things I like about Looker
https://www.michelepasin.org/blog/2022/03/02/Three-things-i-like-about-looker/
Looker is a business intelligence and data visualization tool which was recently acquired by Google. After nearly 6 months of using Looker for building dashboards and visual analytics, here are the top 3 things I like about this platform.

> [Looker](https://www.looker.com/) is a business intelligence software and big data analytics platform that helps you explore, analyze and share real-time business analytics easily. Looker is part of the Google Cloud platform.

## 1. LookML

> [LookML](https://docs.looker.com/data-modeling/learning-lookml) is a language for describing dimensions, aggregates, calculations, and data relationships in a SQL database. Looker uses a model written in LookML to construct SQL queries against a particular database.

LookML provides a dedicated modeling layer for your dashboard applications. Think of LookML objects as building blocks, which can be extended and combined together in different ways without repeating code. Compared to simply writing SQL que ...
Wed, 02 Mar 2022 00:00:00 +0000