embeddings

Challenge Datasets of Cognate and False Friend Pairs for Indian Languages

Cognates are present in multiple variants of the same text across different languages (e.g., hund in German and hound in English both mean "dog"). They pose a challenge to various Natural Language Processing (NLP) applications such as Machine …

Recommendation Chart of Domains for Cross-Domain Sentiment Analysis: Findings of A 20 Domain Study

Cross-domain sentiment analysis (CDSA) helps to address the problem of data scarcity in scenarios where labelled data for a domain (known as the target domain) is unavailable or insufficient. However, the decision to choose a domain (known as the …

Strategies of Effective Digitization of Commentaries and Sub-commentaries: Towards the Construction of Textual History

This paper describes additional aspects of a digital tool called the ‘Textual History Tool’. We describe its salient features, with special reference to those that may help the philologist digitize commentaries and …

Utilizing Word Embeddings based Features for Phylogenetic Tree Generation of Sanskrit Texts

Tracing the root of a text, i.e., its original version, by inferring phylogenetic trees has been a topic of interest in philological studies. However, existing methods face meaning conflation deficiency due to the usage of lexical …

An Introduction to the Textual History Tool

This paper describes a digital tool called the Textual History Tool in detail. This tool captures the historical evolution of a text through various temporal stages, and inter-related data culled from various types of related texts. This tool also …