Publications

Sarcasm is prevalent in all corners of social media, posing many challenges within Natural Language Processing (NLP), particularly for …

We present the results from the 8th round of the WMT shared task on MT Automatic Post-Editing, which consists in automatically …

Social media platforms have become new battlegrounds for anti-social elements, with misinformation being the weapon of choice. …

The detection and extraction of abbreviations from unstructured texts can help to improve the performance of Natural Language …

Named Entity Recognition (NER) is a foundational NLP task that aims to provide class labels like Person, Location, Organisation, Time, …

This paper summarises the submissions our team, SURREY-CTS-NLP, has made for the WASSA 2022 Shared Task for the prediction of empathy, …

Acronyms are abbreviated units of a phrase constructed by using initial components of the phrase in a text. Automatic extraction of …

Fake news, misinformation, and unverifiable facts on social media platforms propagate disharmony and affect society, especially when …

Current Machine Translation (MT) systems achieve very good results on a growing variety of language pairs and datasets. However, they …

Computational Humour (CH) has attracted the interest of Natural Language Processing and Computational Linguistics communities. Creating …

Given a noun compound (NC), we address the problem of predicting the appropriate semantic label linking the constituents of the NC. …

Automatic detection of cognates helps downstream NLP tasks of Machine Translation, Cross-lingual Information Retrieval, Computational …

Automatic essay grading (AEG) is a process in which machines assign a grade to an essay written in response to a topic, called the …

Gaze behaviour has been used as a way to gather cognitive information for a number of years. In this paper, we discuss the use of gaze …

Cross-domain sentiment analysis (CDSA) helps to address the problem of data scarcity in scenarios where labelled data for a domain …

Cognates are present in multiple variants of the same text across different languages (e.g., hund in German and hound in English …

Dense word vectors, or ‘word embeddings’, which encode semantic properties of words, have now become integral to NLP tasks …

This paper describes additional aspects of a digital tool called the ‘Textual History Tool’. We describe its various salient features …

Establishing language relatedness by inferring phylogenetic trees has been a topic of interest in the area of diachronic linguistics. …

Automatic Cognate Detection helps NLP tasks of Machine Translation, Information Retrieval, and Phylogenetics. Cognate words are defined …

Tracing the root of a text, i.e., the original version of the text, by inferring phylogenetic trees has been a topic of interest in …

Automatic Cognate Detection (ACD) is a challenging task which has been utilized to help NLP applications like Machine Translation, …

This paper describes a digital tool called the Textual History Tool in detail. This tool captures the historical evolution of a text …

In today’s digital world, language technology has gained importance. Several software tools have been developed and are available in the …

Cognates are present in multiple variants of the same text across different languages. Computational Phylogenetics uses algorithms and …

In this paper, we describe our work on the creation of a voice model using a speech synthesis system for the Hindi language. We use …

Wordnets are rich lexico-semantic resources. Linked wordnets are extensions of wordnets, which link similar concepts in wordnets of …

Indian language WordNets have their individual web-based browsing interfaces along with a common interface for IndoWordNet. These …

A sentence is an important notion in the Indian grammatical tradition. The collection of the definitions of a sentence can be found in …

Wordnets are rich lexico-semantic resources. Linked wordnets are extensions of wordnets, which link similar concepts in wordnets of …

This paper reports the work related to making Hindi Wordnet available as a digital resource for language learning and teaching, and …

Predicting a reader’s rating of text quality is a challenging task that involves estimating different subjective aspects of the …

Measuring reading effort is useful for practical purposes such as designing learning material and personalizing text comprehension …

Sarcasm Suite is a browser-based engine that deploys five of our past papers in sarcasm detection and generation. The sarcasm detection …

We present a quantitative, data-driven machine learning approach to mitigate the problem of unpredictability of Computer Science …

Parallel corpora are often injected with bilingual lexical resources for improved Indian language machine translation (MT). In the absence …

India is a country with 22 officially recognized languages, and 17 of these have WordNets, a crucial resource. Web browser based …

We present a WordNet-like structured resource for slang words and neologisms on the internet. The dynamism of language is often an …

Sarcasm understandability or the ability to understand textual sarcasm depends upon readers’ language proficiency, social knowledge, …

This paper reports the work of creating bilingual mappings in English for certain synsets of Hindi wordnet, the need for doing this, …

Sentiments expressed in user-generated short text and sentences are nuanced by subtleties at lexical, syntactic, semantic and pragmatic …

In this paper, we propose a novel mechanism for enriching the feature vector, for the task of sarcasm detection, with cognitive …

We present the Civique system for emergency detection in urban areas by monitoring microblogs like tweets. The system detects …

WordNet has proved to be immensely useful for Word Sense Disambiguation, and thence Machine Translation, Information Retrieval and …

WordNet is an online lexical resource which expresses unique concepts in a language. English WordNet is the first WordNet which was …

Parallel corpora are often injected with bilingual dictionaries for improved Indian language machine translation (MT). In the absence of …

We present TransChat, an open-source, cross-platform, Indian language Instant Messaging (IM) application that facilitates cross-lingual …

We present our work on developing fifteen Hierarchical Phrase Based Statistical Machine Translation (HPBSMT) systems for five Indian …

We present a Parallel Corpora Management tool that aids parallel corpora generation for the task of Machine Translation (MT). It takes …

The task of Word Sense Disambiguation (WSD) incorporates in its definition the role of ‘context’. We present our work on the …

Word Sense Disambiguation (WSD) approaches have reported good accuracies in recent years. However, these approaches can be classified …

Current state-of-the-art Word Sense Disambiguation (WSD) algorithms are mostly supervised and use the P(Sense|Word) statistic for …

Does context help determine sense? This question might seem frivolous, even preposterous to anybody sensible. However, our long-time …