resource

"A Passage to India": Pre-trained Word Embeddings for Indian Languages

Dense word vectors or 'word embeddings' which encode semantic properties of words, have now become integral to NLP tasks like Machine Translation (MT), Question Answering (QA), Word Sense Disambiguation (WSD), and Information Retrieval (IR). In this …

Challenge Datasets of Cognate and False Friend Pairs for Indian Languages

Cognates are present in multiple variants of the same text across different languages (e.g., hund in German and hound in English language mean dog). They pose a challenge to various Natural Language Processing (NLP) applications such as Machine …