Seminar: Tagore digital editions and Bengali textual computing
Professor Sukanta Chaudhuri yesterday gave a very interesting talk on the scope, methods and aims of ‘Bichitra’ (literally, ‘the various’), the ongoing project for an online variorum edition of the complete works of Rabindranath Tagore in English and Bengali. The talk (part of this year’s DDH research seminar) highlighted a number of issues I personally wasn’t much familiar with, so in this post I’m summarising them a bit and then highlighting a couple of possible suggestions.
Sukanta Chaudhuri is Professor Emeritus at Jadavpur University, Kolkata (Calcutta), where he was formerly Professor of English and Director of the School of Cultural Texts and Records. His core specializations are in Renaissance literature and in textual studies: he published The Metaphysics of Text from Cambridge University Press in 2010. He has also translated widely from Bengali into English, and is General Editor of the Oxford Tagore Translations.
Rabindranath Tagore (1861 – 1941), the first nobel laureate of Asia, was arguably the most important icon of modern Indian Renaissance. This recent project on the electronic collation of Tagore texts, called ‘the Bichitra project’, is being developed as part of the national commemoration of the 150th birth anniversary of the poet (here’s the official page). This is how the School of Cultural Texts and Records summarizes the project’s scope:
The School is carrying out pioneer work in computer collation of Tagore texts and creation of electronic hypertexts incorporating all variant readings. The first software for this purpose in any Indian language, named “Pathantar” (based on the earlier version “Tafat”), has been developed by the School. Two pilot projects have been carried out using this software, for the play Bisarjan (Sacrifice) and the poetical collection Sonar Tari (The Golden Boat). The CD/DVDs contain all text files of all significant variant versions in manuscript and print, and their collation using the ”Pathantar” software. The DVD of Sonar Tari also contains image files of all the variant versions. These productions are the first output of the series “Jadavpur Electronic Tagore”.
Progressing from these early endeavours, we have now undertaken a two-year project entitled “Bichitra” for a complete electronic variorum edition of all Tagores works in English and Bengali. The project is funded by the Ministry of Culture, Government of India, and is being conducted in collaboration with Rabindra-Bhavana, Santiniketan. The target is to create a website which will contain (a) images of all significant variant versions, in manuscript and print, of all Tagores works; (b) text files of the same; and (c) collation of all versions applying the “Pathantar” software. To this end, the software itself is being radically redesigned. Simultaneously, manuscript and print material is being obtained and processed from Rabindra-Bhavana, downloaded from various online databases, and acquired from other sources. Work on the project commenced in March 2011 and is expected to end in March 2013, by which time the entire output will be uploaded onto a freely accessible website.
A few interesting points

Food for thought
The situation is striking similar – or not? Would it be feasible to reuse one of these approaches with textual sources?
Another relevant visualization is the one used by popular file-comparison softwares (eg File Merge on a Mac) for showing differences between two files:
– Asian language processing: current state-of-the-art [text]
– Research report on Bengali NLP engine for TTS [text]
– The Emile corpus, containing fourteen monolingual corpora, including both written and (for some languages) spoken data for fourteen South Asian languages [homepage]
– A complete OCR system for continuous Bengali characters [text]
– Parsing Bengali for Database Interface [text]
– Unsupervised Morphological Parsing of Bengali [text]
– Open Bangla OCR – A BDOSDN (Bangladesh Open Source Development Network) project to develop a Bangla OCR
– Bangla OCR project, mainly focused on the research and development of an Optical Character Recognizer for Bangla / Bengali script
Any comments and/or ideas?
