research – Parerga und Paralipomena http://www.michelepasin.org/blog
At the core of all well-founded belief lies belief that is unfounded – Wittgenstein

Notes from the Force11 annual conference http://www.michelepasin.org/blog/2015/01/17/notes-from-the-force11-annual-conference/ Sat, 17 Jan 2015 18:04:41 +0000

I attended the https://www.force11.org/ conference in Oxford over the last couple of days (the conference was previously called ‘Beyond the PDF’).

Force11 is a community of scholars, librarians, archivists, publishers and research funders that has arisen organically to help facilitate the change toward improved knowledge creation and sharing. Individually and collectively, we aim to bring about a change in modern scholarly communications through the effective use of information technology. [About Force 11]

More than the presentations, I would say that the most valuable aspect of this event is the many conversations you can have with people from different backgrounds: techies, publishers, policy makers, academics, etc.

Nonetheless, here’s a (very short and biased) list of things that seemed to stand out.

  • A talk titled Who’s Sharing with Who? Acknowledgements-driven identification of resources by David Eichmann, University of Iowa. He is working on a (seemingly very effective) method for extracting contributor roles from scientific articles (a toy sketch of the general idea follows this list).
  • This presentation describes my recent work in semantic analysis of the acknowledgement section of biomedical research articles, specifically the sharing of resources (instruments, reagents, model organisms, etc.) between the author articles and other non-author investigators. The resulting semantic graph complements the knowledge currently captured by research profiling systems, which primarily focus on investigators, publications and grants. My approach results in much finer-grained information, at the individual author contribution level, and the specific resources shared by external parties. The long-term goal for this work is unification with the VIVO-ISF-based CTSAsearch federated search engine, which currently contains research profiles from 60 institutions worldwide.

     

  • A talk titled Why are we so attached to attachments? Let’s ditch them and improve publishing by Kaveh Bazargan, head of River Valley Technologies. He demoed a prototype manuscript-tracking system that lets editors, authors and reviewers create new versions of the same document via an online, Google-Docs-like interface with JATS XML in the background (a minimal sketch of the single-source idea follows this list).
  • I argue that it is precisely the ubiquitous use of attachments that has held up progress in publishing. We have the technology right now to allow the author to write online and have the file saved automatically as XML. All subsequent work on the “manuscript” (e.g. copy editing, QC, etc) can also be done online. At the end of the process the XML is automatically “rendered” to PDF, Epub, etc, and delivered to the end user, on demand. This system is quicker as there are no emails or attachments to hold it up, cheaper as there is no admin involved, and more accurate as there is only one definitive file (the XML) which is the “format of record”.

     

  • Rebecca Lawrence from F1000 presented and gave me a walk-through of a new suite of tools they’re working on. Quite impressive, I must say, especially for the variety of features on offer: tools to organize and store references, annotate and discuss articles and web pages, import them into Word documents, etc. All packed into a nice-looking and user-friendly application. This is due to go into public beta some time in March, but you can try to get access sooner by signing up here.

     

  • The best poster award went to 101 Innovations in Scholarly Communication – the Changing Research Workflow. This is a project aiming to chart innovation in scholarly information and communication flows. Very inspiring and definitely worth a look.

  • Finally, I’m proud to say that the best demo award went to my own resquotes.com, an online personal quotation-manager which I launched just a couple of weeks ago. Needless to say, it was great to get a vote of confidence from this community!
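
Back to the first item above: as a toy illustration only (my own, not Eichmann's actual method, which relies on proper semantic analysis; the phrase patterns and the acknowledgement text are invented), here is the kind of resource-sharing extraction the talk is about, reduced to a naive regular expression:

```python
import re

# Toy pattern for spotting resource-sharing statements in an
# acknowledgements section. A made-up illustration of the task only.
SHARING = re.compile(
    r"(?P<resource>[\w\s-]+?)\s+(?:was|were)\s+(?:kindly\s+)?"
    r"(?:provided|supplied|donated)\s+by\s+"
    r"(?P<provider>[A-Z][\w\s-]+?)(?:[,.]|$)"
)

# Invented acknowledgement text.
ack = ("The anti-GFP antibody was kindly provided by Jane Smith. "
       "Mouse strains were supplied by The Jackson Laboratory.")

for m in SHARING.finditer(ack):
    print(f"{m.group('provider').strip()} shared: {m.group('resource').strip()}")
```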
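
And on the attachments talk: here is a minimal sketch of the single-source idea, where one JATS-like XML file is the “format of record” and each delivery format is just a rendering of it. The element names echo JATS, but the fragment and the renderer are simplified inventions of mine, not River Valley’s system:

```python
import xml.etree.ElementTree as ET

# One XML "format of record" (a heavily simplified, JATS-like fragment).
# Every delivery format (HTML, PDF, EPUB...) is just a rendering of it.
RECORD = """\
<article>
  <front><article-title>Why are we so attached to attachments?</article-title></front>
  <body>
    <p>We have the technology right now to write online and save XML directly.</p>
    <p>Copy editing and QC can happen online too.</p>
  </body>
</article>
"""

def render_html(xml_text: str) -> str:
    """Render the record to HTML; a PDF or EPUB renderer would walk the same tree."""
    root = ET.fromstring(xml_text)
    title = root.findtext("front/article-title")
    body = "".join(f"<p>{p.text}</p>" for p in root.iterfind("body/p"))
    return f"<html><body><h1>{title}</h1>{body}</body></html>"

print(render_html(RECORD))
```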

     

    If you want more, it’s worth taking a look directly at the conference agenda and in particular the demo/poster session agenda. And hopefully see you next year in Portland, Oregon :-)

     

Working toward meritocracy in Italy http://www.michelepasin.org/blog/2011/01/07/working-toward-meritocracy-in-italy/ Fri, 07 Jan 2011 10:03:14 +0000

    It’s no news that thousands of Italian researchers have left their home country in order to carry on doing what they like; the news, though, is that we now also have a scientific study shedding some light on the phenomenon. It’s been done by the Via Academy, an association dedicated to producing statistical analyses of the research output of Italians (published on the TIS reports website).

    Clearly, the high impact of scientists who are now abroad bears testament to the quality of the Italian education and attitude when it comes to Computer Science. The inevitable question thus arises: why are Computer Scientists who work in Italy not among the top in either the world ranking or the TIS list? I have been asking this question to various colleagues in the field, both inside and outside the Via-academy network. The consistent answer I have been receiving is that there is a clear disadvantage Computer Scientists must face when working in Italy with respect to their colleagues who work abroad. This disadvantage derives from a mixture of excessive teaching duties, a lack of large-scale funding and a limited recognition of scientific merit

    Recent reports include the “Top 100 researchers – home & abroad” (which includes the quotation above) and the “Top 50 Italian Institutes”.

    The H-method: a first step towards meritocracy

    This type of analysis is mainly based on the h-index (an index that attempts to measure both the productivity and the impact of a scientist’s or scholar’s published work); it therefore suffers from various limitations, deriving from factors such as the difference in the average number of publications across disciplines (especially between the sciences and the humanities), or from the fact that it’s a purely quantitative analysis.
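
For concreteness: a scholar has index h if h of their papers have been cited at least h times each. A minimal sketch of the computation (the citation counts are invented):

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

# Invented citation counts for two scholars; fields with different
# publication habits produce numbers that aren't directly comparable.
print(h_index([45, 20, 8, 6, 3, 1]))  # -> 4
print(h_index([9, 7, 5]))             # -> 3
```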

    However, it seems to me an important step in the right direction: meritocracy.

    p.s.
    I found out there’s an article by Ignazio Marino on the topic [Working Toward Meritocracy in Italy, Ignazio R. Marino, Science, 6 June 2008: 1289]; I couldn’t download it, but a short commentary on it can be found online.


Social Reference Manager: Mendeley http://www.michelepasin.org/blog/2009/08/21/social-reference-manager-mendeley/ Fri, 21 Aug 2009 09:58:49 +0000

    A colleague mentioned Mendeley to me – a new, free reference manager. I’ve stuck with Papers for a while and have been really happy with it, but I have to admit that Mendeley seems to have quite a few cool features.

    For example:
    1) it’s free (and hopefully it’ll stay that way)
    2) it provides an online counterpart, so you can check and manage your reference library online too
    3) it’s a social application – it aims at building up a community of researchers around one of their primary interests: papers
    4) it can be used by researchers as a ‘research homepage’, featuring quite a lot about their academic profile

    Conclusion: definitely worth a try!



     

    What else is available on the market?

    Not much that handles both the document-manager side and the social side well; however, these other tools/apps are worth checking out:

  • Zotero: http://www.zotero.org/, “Zotero [zoh-TAIR-oh] is a free, easy-to-use tool to help you collect, organize, cite, and share your research sources. It lives right where you do your work—in the web browser itself.”
  • Papers: http://www.mekentosj.com/, “Award winning applications for scientific research”
  • Citeulike: http://www.citeulike.org/, “citeulike is a free service for managing and discovering scholarly references”
  • Qiqqa: http://www.qiqqa.com/, “The essential software for academic and research work”
  • Sente: http://www.thirdstreetsoftware.com/site/SenteForMac.html, “Sente 6 for Mac will change the way you think about academic reference management. It will change the way you collect your reference material, the way you organize your library, the way you read papers and take notes, and the way you write up your own research.”
  • Wizfolio: http://wizfolio.com/, “WizFolio is an online research collaboration tool for knowledge discovery. With WizFolio you can easily manage and share all types of information in a citation ready format including research papers, patents, documents, books, YouTube videos, web snippets and a lot more. “
  • Refworks: http://www.refworks.com/, “RefWorks — an online research management, writing and collaboration tool — is designed to help researchers easily gather, manage, store and share all types of information, as well as generate citations and bibliographies.”
  • For a more extensive list and analysis, check out this Wikipedia page: Comparison_of_reference_management_software

     

How semantic is the semantic web? http://www.michelepasin.org/blog/2008/01/13/how-semantic-is-the-semantic-web/ Sun, 13 Jan 2008 17:09:54 +0000

    Just read this article thanks to a colleague: I share pretty much everything it says about the SW, so I thought it wouldn’t be a bad idea to pass it on to the next reader. Basically, it’s about some very fundamental issues: what do we mean by semantics? Does a computer have semantics? If not, what’s the point of the name ‘Semantic Web’? I think it’s quite uncontroversial that the choice of the name ‘semantic’ web is itself controversial.

    I guess that many of the people originally supporting the SW vision didn’t really have time to worry about this sort of question, as they came from different backgrounds, or maybe were just so excited about the grandiose idea of an intelligent world wide web interconnected at the data level. Quite understandable; but as the idea is now reaching the larger public, and maybe connecting with the more bottom-up Web 2.0 movement, I think it’d be great to re-think the foundations of the initial vision, together with some rigorous clarification of the terms we use. Chiara Carlino’s article reaches an interesting conclusion:

    So-called semantic web technologies provide the machine with data, like Chinese symbols, and with a detailed set of instructions for handling them, in the form of ontologies. The computer simply follows the instructions, as the person in the Chinese room does, and returns useful information to us, sparing us the task of processing a big set of data on our own. These technologies have in fact nothing to do with semantics, because they never refer to anything in the real world: they never have any meaning, except in the mind of those expressing their knowledge in a machine-readable language, the mind of those preparing the Chinese symbols for the person in the Chinese room. The person in the room – the machine – never gets this meaning. Such technologies, eventually, deal not so much with semantics as with knowledge, and its automatic processing through informatics. It therefore seems misleading and unfitting to keep using the word semantic for a technology that is not semantic at all. It looks quite necessary to find a new term, capable of capturing the core of this technology without giving rise to misunderstandings.

    The article was also posted on the W3C SW mailing list some time ago, and generated an interesting discussion. But then, if we have to throw away the overrated ‘semantic web’ term, what should we call it instead? Without any doubt, this research strand has generated lots of interesting results, both theoretical and practical. Mmm, maybe mainly practical – see the many prototypes, ontologies and standards for manipulating ‘knowledge’. So, the author continues, what people are doing is not really dealing with ‘semantics’, but building very complex systems and infrastructures for dealing with ‘knowledge structures’:

    There is a word which seems to serve this purpose, and that is epistematics. Its root – epistéme – points out its strict connection with knowledge; nonetheless, it is not a theoretical study, not an epistemology: it is rather the automatic processing of knowledge. The term informatics was created to denote the automatic processing of information; similarly, the term epistematics is fitting for the automatic processing of knowledge that the technologies we are speaking about make possible. The term also recalls informatics, and this is fitting as well, as this processing happens thanks to informatics. Eventually, the current – though not much used – meaning of epistematic is perfectly coherent with the technologies we’d like to denote with it: epistematic, in fact, means deductive, and one of the most advanced features of these technologies is exactly the chance to process knowledge deductively, using automatic reasoners that build the deductive rules of formal logic into software. The formerly so-called semantic web now looks like a new science, no longer bound (and narrowed) to the world of the web, as the semantic web term suggested: epistematics is a real evolution of informatics, evolving from raw information processing to structured knowledge processing. Epistematic technologies are those that allow the automatic processing, performed through informatic instruments, of knowledge expressed in a machine-accessible language, so that the machine can process it, according to a subset of first-order logic rules, and thus extract new knowledge.

    I like the term epistematics – and even more I like the fact that the ‘web’ is just a possible extension to it, not a core part of its meaning. Semantic technologies, based on the various groundbreaking works the AI pioneers produced some twenty or thirty years ago (mainly in knowledge representation), were in use well before the web. Now, does the advent of the web make such a big difference to them? They used to write knowledge-based systems in KIF – now they do them in OWL – we change the language, but aren’t the functionalities we are looking for the same? They used to harvest big companies’ databases and intranets to build up a knowledge base – now we also harvest the web – is that enough to claim the emergence of a new science, with new problems and methods? Or is it maybe just a different application of a well-known technology?
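
As a toy illustration of what the quote’s “processing knowledge deductively” amounts to (my own sketch, not tied to any particular reasoner or standard), here is a naive forward-chaining loop over subclass facts:

```python
# A tiny knowledge base of subclass facts, as subject-predicate-object triples.
facts = {
    ("dog", "subClassOf", "mammal"),
    ("mammal", "subClassOf", "animal"),
    ("animal", "subClassOf", "livingThing"),
}

# Naive forward chaining: keep applying the transitivity rule
# (X subClassOf Y and Y subClassOf Z => X subClassOf Z)
# until no new fact can be derived. Real reasoners cover richer
# rule sets, far more efficiently.
changed = True
while changed:
    changed = False
    for x, _, y in list(facts):
        for y2, _, z in list(facts):
            if y == y2 and (x, "subClassOf", z) not in facts:
                facts.add((x, "subClassOf", z))
                changed = True

for triple in sorted(facts):
    print(triple)  # derived facts include ('dog', 'subClassOf', 'livingThing')
```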

    I must confess, the more I think about such issues, the more I feel they’re difficult and intricate. For sure the web is evolving fast – and the amount of available structured information is evolving fast too. Making sense of all this requires a huge amount of clarity of thought. And presumably, this clarity of thought will eventually lead to some clarity of expression. Wittgenstein wasn’t the first to claim it, but he certainly put it well:

    Philosophy is a battle against the bewitchment of our intelligence by means of language.

     

     

Article: “The Semantic Web: The Origins of Artificial Intelligence Redux” http://www.michelepasin.org/blog/2007/06/07/sw-vs-ai-new-and-old-stuff/ Thu, 07 Jun 2007 18:07:58 +0000

    Just read a very interesting article by Harry Halpin, whose work sits at the borderline between the history of science (especially of computer science, I gather) and (S)Web development. I think it should be a must-read for all SW practitioners, so as to understand where we (yes – I’m one of them..) stand in relation to the past…

    The article dates back to 2004, but the insights you’ll find there are (unfortunately) still valid today. For example, the problems the SW inherits from AI but hardly recognizes as such in this newer community (here I just outline them – have a look at the paper for the details):

    • knowledge representation problem
    • higher order problem
    • abstraction problem
    • frame problem
    • symbol-grounding problem
    • problem of trust

    None of these has been solved yet – although apparently the ontologies on the SW keep increasing in both number and size… how come? Of course, that’s what research is about, trying to solve unsolved problems, but what the heck, shouldn’t we at least be aware of their status as “VERY OLD PROBLEMS”?
    Think of an ideal world where, as you polish up your SW paper, next to the ACM category descriptor you also had to state explicitly which of these problems you are tackling. Mmmmm, too dangerous.. who knows how many papers would still be classified as “novel” or “interesting” then…
    As Halpin says (quoting Santayana) “those who do not remember the past are condemned to repeat it”.
    I agree. And I also agree with this conclusion, which I report in full:

    Engineering or Epistemology?

    The Semantic Web may not be able to solve many of these problems. Many Semantic Web researchers pride themselves on being engineers as opposed to artificial intelligence researchers, logicians, or philosophers, and have been known to believe that many of these problems are engineering problems. While there may be suitable formalizations for time and ways of dealing with higher-order logic, the problems of knowledge representation and abstraction appear to be epistemological characteristics of the world that are ultimately resistant to any solution. It may be impossible to solve some of these problems satisfactorily, yet having awareness of these problems can only help the development of the Web.

    In a related and *extremely* funny rant, Artificial intelligence meets natural stupidity (a 1981 paper), Drew McDermott is obviously unaware of the forthcoming semantic-web wave of illusions, but he points out a few common mistakes we can still recognize today… it’s relaxing reading, but very competent.. I just report the final “benediction”, in which he describes the major “methodological and substantive issues over which we have stumbled”:

    1. The insistence of AI people that an action is a change of state of the world or a world model, and that thinking about actions amounts to stringing state changes together to accomplish a big state change. This seems to me not an oversimplification, but a false start. How many of your actions can be characterized as state changes, or are even performed to effect state changes? How many of a program’s actions in problem solving? (Not the actions it strings together, but the actions it takes, like “trying short strings first”, or “assuming the block is where it’s supposed to be”.)
    2. The notion that a semantic network is a network. In lucid moments, network hackers realize that lines drawn between nodes stand for pointers, that almost everything in an AI program is a pointer, and that any list structure could be drawn as a network, the choice of what to call node and what to call link being arbitrary. Their lucid moments are few.
    3. The notion that a semantic network is semantic.
    4. Any indulgence in the “procedural-declarative” controversy. Anyone who hasn’t figured this “controversy” out yet should be considered to have missed his chance, and be banned from talking about it. Notice that at Carnegie-Mellon they haven’t worried too much about this dispute, and haven’t suffered at all.
    5. The idea that because you can see your way through a problem space, your program can: the “wishful control structure” problem.

    I couldn’t resist adding a reference (suggested by KMi’s mate Laurian) to a paper by Peter Gardenfors written for FOIS 2004, titled “How to make the Semantic Web more semantic”. He proposes a novel, less symbolic approach to knowledge representation, and the overall spirit of the paper matches the quote from Santayana mentioned above. The conclusion reads as follows:

    It is slightly discomforting to read that the philosopher John Locke already in 1690 formulated the problem of describing the structure of our semantic knowledge in his Essay Concerning Human Understanding: “[M]en are far enough from having agreed on the precise number of simple ideas or qualities belonging to any sort of things, signified by its name. Nor is it a wonder; since it requires much time, pains, and skill, strict inquiry, and long examination to find out what, and how many, those simple ideas are, which are constantly and inseparably united in nature, and are always to be found together in the same subject.” ([25], book III, chapter VI, 30) Even though our knowledge has advanced a bit since then, we still face the same problems in the construction of the Semantic Web.

     
