paper – Parerga und Paralipomena

PyPapers: a bare-bones, command-line PDF manager

mikele — Sun, 30 Jun 2019 22:48:40 +0000

Ever felt like software such as Mendeley or Papers is great, but somehow slows you down? Ever felt like none of the many reference managers out there will ever cut it for you, cause you need something R E A L L Y SIMPLE? I did. Many times. So I’ve finally crossed the line and tried building a simple command-line PDF manager. It’s called PyPapers.

Yes – that’s right – command line. So not for everyone. Also: this is bare bones and pre-alpha. So don’t expect wonders. It basically provides a simple interface for searching a folder full of PDFs. That’s all for now!

Key features (or lack thereof)

Mac only, I’m afraid. I’m sitting on the shoulders of a giant. That is, mdfind.

No fuss search in file names only or full text

Shows all results and relies on Preview for reading

Highlighting in Preview works pretty damn fine and it’s the ultimate compatibility solution (any other software kind of locks you in eventually, imho)

Open source. If you can code Python you can customise it to your needs. If you can’t, open an issue on GitHub and I may end up doing it.

It recognises sub-folders, so these can be used as a simple, filesystem-level categorization structure for your PDFs (e.g. I have different folders for articles, books, news etc.)

Your PDFs live in the Mac filesystem ultimately. So you can always search them using Finder in case you get bored of the command line.
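To give an idea of what searching with mdfind looks like from Python, here is a minimal sketch. It is not the actual PyPapers code: the function names are made up for illustration, and the only thing borrowed from macOS is the mdfind command with its -onlyin and -name options.

```python
import subprocess
from pathlib import Path


def build_mdfind_command(folder, query, names_only=False):
    """Build the mdfind argument list for a search limited to `folder`.

    Hypothetical helper: with names_only=True the search matches file
    names only (mdfind -name), otherwise Spotlight's full-text index.
    """
    cmd = ["mdfind", "-onlyin", str(folder)]
    if names_only:
        cmd += ["-name", query]
    else:
        cmd.append(query)
    return cmd


def search_pdfs(folder, query, names_only=False):
    """Run mdfind (macOS only) and keep just the PDF paths it prints."""
    result = subprocess.run(
        build_mdfind_command(folder, query, names_only),
        capture_output=True,
        text=True,
        check=True,
    )
    return [
        Path(line)
        for line in result.stdout.splitlines()
        if line.lower().endswith(".pdf")
    ]
```

On a Mac, something like search_pdfs(Path.home() / "Papers", "ontology") should return the matching PDFs, which can then be handed to Preview. The ~/Papers folder is just an assumption here.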

First impressions

Pretty good. Was concerned I was gonna miss things like collections or tags. But I found a workaround: first, identify the papers I am interested in. Then, create a folder in the same directory and symlink them in there (= create an alias).
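The symlink workaround can be scripted too. What follows is only a sketch under my own assumptions (the add_to_collection name and the folder layout are hypothetical, and this is not part of PyPapers): it creates a collection folder inside the library and links the chosen PDFs into it.

```python
from pathlib import Path


def add_to_collection(library, collection, pdf_paths):
    """Symlink each PDF into <library>/<collection> and return the links.

    Hypothetical helper: the original files stay where they are, the
    collection folder only holds symbolic links pointing to them.
    """
    target_dir = Path(library) / collection
    target_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for pdf in map(Path, pdf_paths):
        link = target_dir / pdf.name
        # Skip names already present, including broken symlinks.
        if not (link.exists() or link.is_symlink()):
            link.symlink_to(pdf.resolve())
        links.append(link)
    return links
```

Since the collection is just another sub-folder, it gets searched like any other folder in the library.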

It’s not quite like uncarved wood, but it definitely feels simple enough.

ISWC14 paper: a hybrid semantic publishing architecture combining XML and RDF

mikele — Tue, 25 Nov 2014 08:55:21 +0000

I’m posting here a short summary of the paper I presented at the latest International Semantic Web Conference in Riva del Garda (ISWC14), together with my colleague Tony Hammond.

The presentation focused on a hybrid data architecture (XML for storage and querying, RDF for modeling and integration) which emerged as the most practical solution while re-engineering the publishing platform at our company (Macmillan S&E) over the last few years.

This is the abstract:

This paper presents recent work carried out at Macmillan Science and Education in evolving a traditional XML-based, document-centric enterprise publishing platform into a scalable, thing-centric and RDF-based semantic architecture. Performance and robustness guarantees required by our online products on the one hand, and the need to support legacy architectures on the other, led us to develop a hybrid infrastructure in which the data is modelled throughout in RDF but is replicated and distributed between RDF and XML data stores for efficient retrieval. A recently launched product – dynamic pages for scientific subject terms – is briefly introduced as a result of this semantic publishing architecture.

The paper is available online; slides from the presentation can be found below.

The ISWC industry track was packed with interesting papers so I think it’s worth taking a look at the online proceedings. The uptake of tech outside academia is always revealing of the many real-world difficulties involved in making something fit within pre-existing work practices and legacy technologies. This is especially true of larger companies, where investment in older technologies (and in people who know about them) can be considerable, hence upgrades are costly and need to be evaluated more carefully.

This is the sort of background that led me and my colleagues at Macmillan to opt for a hybrid solution that combines the power of an established enterprise MarkLogic installation with more cutting-edge data integration approaches based on RDF.
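To make the hybrid idea a bit more tangible, here is a toy sketch of data modelled as triples and then replicated as XML records for retrieval. It is purely illustrative: the identifiers, predicates and XML layout are all made up, and it says nothing about how the actual platform or MarkLogic does this.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); all names are invented.
TRIPLES = [
    ("subject:photonics", "label", "Photonics"),
    ("subject:photonics", "broader", "subject:physics"),
    ("subject:physics", "label", "Physics"),
]


def triples_to_xml(triples):
    """Group triples by subject and emit one XML record per subject."""
    by_subject = defaultdict(list)
    for subject, predicate, obj in triples:
        by_subject[subject].append((predicate, obj))

    root = ET.Element("records")
    for subject, properties in by_subject.items():
        record = ET.SubElement(root, "record", id=subject)
        for predicate, obj in properties:
            # Each predicate becomes a child element holding the object.
            ET.SubElement(record, predicate).text = obj
    return root


root = triples_to_xml(TRIPLES)
print(ET.tostring(root, encoding="unicode"))
```

The point of the sketch is only that the model stays in one place (the triples) while the XML copy is a derived, document-shaped view that is cheap to fetch.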

Nature.com subject pages were one of the first products built on top of this architecture, and many more will come. We’re still heavily involved in this work, so stay tuned for more stuff in this space.

Soon, we will also be releasing our public ontologies online and making available a new and improved version of the nature.com datasets.

Event: THATcamp Kansas and Digital Humanities Forum

mikele — Wed, 28 Sep 2011 16:56:55 +0000

The THATcamp Kansas and Digital Humanities Forum took place last week at the Institute for Digital Research in the Humanities, which is part of the University of Kansas in beautiful Lawrence. I had the opportunity to be there and give a talk about some recent work of mine on digital prosopography and computer ontologies, so in this blog post I’m summing up the things that caught my attention at the conference.

The event happened on September 22-24 and consisted of three separate things:

Bootcamp Workshops: a set of in-depth workshops on digital tools and other DH topics http://kansas2011.thatcamp.org/bootcamps/.

THATCamp: an “unconference” for technologists and humanists http://kansas2011.thatcamp.org/.

Representing Knowledge in the DH conference: a one-day program of panels and poster sessions (schedule | abstracts )

The workshops and THATcamp were both packed with interesting stuff, so I strongly suggest you take a look at the online documentation, which is very comprehensive. In what follows I’ll instead highlight some of the contributed papers which a) I liked and b) I was able to attend (needless to say, this list reflects only my own preferences and interests). Hope you’ll find something of interest there too!

A (quite subjective) list of interesting papers

The Graphic Visualization of XML Documents, by David Birnbaum ( abstract ): a quite inspiring example of how to employ visualizations in order to support philological research in the humanities. Mostly focused on Russian texts and XML-oriented technologies, but its principles are easily generalizable to other contexts and technologies.

Exploring Issues at the Intersection of Humanities and Computing with LADL, by Gregory Aist ( abstract ): the talk presented LADL, the Learning Activity Description Language, a fascinating software environment that provides a way to “describe both the information structure and the interaction structure of an interactive experience”, for the purpose of “constructing a single interactive Web page that allows for viewing and comparing of multiple source documents together with online tools”.

Making the most of free, unrestricted texts–a first look at the promise of the Text Creation Partnership, by Rebecca Welzenbach ( abstract ): an interesting report on the pros and cons of making available a large repository of SGML/XML encoded texts from the Eighteenth Century Collections Online (ECCO) corpus.

The hermeneutics of data representation, by Michael Sperberg-McQueen ( abstract ): a speculative and challenging investigation of the assumptions at the root of any machine-readable representation of knowledge – and their cultural implications.

Breaking the Historian’s Code: Finding Patterns of Historical Representation, by Ryan Shaw ( abstract ): an investigation into the use of natural language processing techniques for the purpose of ‘breaking down’ the ‘code’ of historical narrative. In particular, the sets of documents used are related to the civil rights movement, and the specific NLP techniques being employed are named entity recognition, event extraction, and event chain mining.

Employing Geospatial Genealogy to Reveal Residential and Kinship Patterns in a Pre-Holocaust Ukrainian Village, by Stephen Egbert ( abstract ): this paper showed how it is possible to visualize residential and kinship patterns in the mixed-ethnic settlements of pre-Holocaust Eastern Europe by using geographic information systems (GIS), and how these results can provide useful materials for humanists to base their work on.

Prosopography and Computer Ontologies: towards a formal representation of the ‘factoid’ model by means of CIDOC-CRM, by me and John Bradley ( abstract ): this is the paper I presented (shameless self plug, I know). It’s about the evolution of structured prosopography (= the ‘study of people’ in history) from a mostly single-application and database-oriented scenario towards a more interoperable and linked-data one. In particular, I talked about the recent efforts for representing the notion of ‘factoids’ (a conceptual model normally used in our prosopographies) using the ontological language provided by CIDOC-CRM (a computational ontology commonly used in the museum community).

Social Reference Manager: Mendeley

mikele — Fri, 21 Aug 2009 09:58:49 +0000

A colleague mentioned the existence of Mendeley to me – a new and free reference manager. I’ve stuck with Papers for a while and was really really happy with it, but I have to admit that Mendeley seems to have quite a few cool features there.

For example:
1) it’s free (and hopefully it’ll remain like that forever)
2) it provides an online counterpart, so that you can check/manage your reference library online too
3) it’s a social application – it aims at building up a community of researchers/users based on the categorization of one of their primary interests: papers
4) it can be used by researchers as a ‘research homepage’ which features quite a lot about their academic profile.

Conclusion: definitely worth a try!

What else is available on the market?

Not much that handles both the tasks of a document manager and a social application well; however, these other tools/apps are worth checking out:

Zotero: http://www.zotero.org/, “Zotero [zoh-TAIR-oh] is a free, easy-to-use tool to help you collect, organize, cite, and share your research sources. It lives right where you do your work—in the web browser itself.”

Papers: http://www.mekentosj.com/, “Award winning applications for scientific research”

Citeulike: http://www.citeulike.org/, “citeulike is a free service for managing and discovering scholarly references”

Qiqqa: http://www.qiqqa.com/, “The essential software for academic and research work”

Sente: http://www.thirdstreetsoftware.com/site/SenteForMac.html, “Sente 6 for Mac will change the way you think about academic reference management. It will change the way you collect your reference material, the way you organize your library, the way you read papers and take notes, and the way you write up your own research.”

Wizfolio: http://wizfolio.com/, “WizFolio is an online research collaboration tool for knowledge discovery. With WizFolio you can easily manage and share all types of information in a citation ready format including research papers, patents, documents, books, YouTube videos, web snippets and a lot more.”

Refworks: http://www.refworks.com/, “RefWorks — an online research management, writing and collaboration tool — is designed to help researchers easily gather, manage, store and share all types of information, as well as generate citations and bibliographies.”

For a more extensive list and analysis, check out this awesome wikipedia page: Comparison_of_reference_management_software