cross-lingual word embeddings

"A Passage to India": Pre-trained Word Embeddings for Indian Languages

Dense word vectors, or 'word embeddings', which encode semantic properties of words, have become integral to NLP tasks such as Machine Translation (MT), Question Answering (QA), Word Sense Disambiguation (WSD), and Information Retrieval (IR). In this …

Challenge Datasets of Cognate and False Friend Pairs for Indian Languages

Cognates occur in multiple variants of the same text across different languages (e.g., hund in German and hound in English both mean 'dog'). They pose a challenge to various Natural Language Processing (NLP) applications such as Machine …

Harnessing Deep Cross-lingual Word Embeddings to Infer Accurate Phylogenetic Trees

Establishing language relatedness by inferring phylogenetic trees has long been a topic of interest in diachronic linguistics. However, existing methods suffer from the meaning conflation deficiency because they rely on lexical similarity-based measures. …