PanchBhoota: Hierarchical phrase based machine translation systems for five Indian languages


We present our work on developing fifteen Hierarchical Phrase Based Statistical Machine Translation (HPBSMT) systems for five Indian language pairs namely Bengali-Hindi, English-Hindi, Marathi-Hindi, Tamil-Hindi, and Telugu-Hindi, in three domains each, HEALTH, TOURISM and GENERAL. We named them PanchBhoota, as these systems are elemental in nature. We used a very simple approach to train, tune, and test them using ‘cdec’ toolkit. We hope that this work will motivate Indian Language Machine Translation researchers to look deeper into the field of HPBSMT which is known to perform better than Phrase Based Statistical Machine Translation

Winning System of SMT Competition at ICON 2014