Person:
Thomas, Mandl

Name: Mandl Thomas

Job Title: International Adjunct Faculty

Specialization: Information Science, Cognitive Similarity Learning in Information Retrieval and Information Management

Search Results

Now showing 1 - 1 of 1
  • Publication (metadata only)
    An empirical evaluation of text representation schemes to filter the social media stream
    (Taylor & Francis, 01-05-2022) Modha, Sandip; Majumder, Prasenjit; Thomas, Mandl; DA-IICT, Gandhinagar; Modha, Sandip (201221001)
    Modeling text in a numerical representation is a prime task for any Natural Language Processing downstream task such as text classification. This paper studies the effectiveness of text representation schemes on text classification tasks such as aggressive text detection, a special case of hate speech detection on social media. Aggression levels are categorized into three predefined classes: 'Non-aggressive' (NAG), 'Overtly Aggressive' (OAG), and 'Covertly Aggressive' (CAG). Text representation schemes based on BoW techniques, word embeddings, contextual word embeddings, and sentence embeddings are compared across traditional classifiers and deep neural models on a text classification problem. The weighted F1 score is used as the primary evaluation metric. The results show that text representation using Google's universal sentence encoder (USE) performs better than word embedding and BoW techniques on traditional classifiers such as SVM, while pre-trained word embedding models perform better with classifiers based on deep neural models on the English dataset. Recent pre-trained transfer learning models such as ELMo, ULMFiT, and BERT are fine-tuned for the aggression classification task; however, their results are not on par with the pre-trained word embedding model. Overall, word embedding using pre-trained fastText vectors produces a better weighted F1 score than Word2Vec and GloVe. On the Hindi dataset, BoW techniques perform better than word embeddings on traditional classifiers such as SVM, whereas pre-trained word embedding models perform better with classifiers based on deep neural nets. Statistical significance tests are employed to confirm the significance of the classification results. Deep neural models are more robust against bias induced by the training dataset and perform substantially better than traditional classifiers such as SVM, logistic regression, and Naive Bayes on the Twitter test dataset.
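
The setup described in the abstract above can be illustrated with a minimal sketch: a bag-of-words (TF-IDF) representation feeding a linear SVM on the three-class aggression task, evaluated with weighted F1. This is not the authors' code; the toy corpus, labels, and the specific scikit-learn components are assumptions used only to show the shape of such a comparison.

```python
# Minimal sketch (not the authors' implementation): BoW/TF-IDF + linear SVM on a
# three-class aggression task (NAG / OAG / CAG), scored with weighted F1.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical examples; the paper uses English and Hindi social-media datasets.
texts = [
    "have a great day everyone",                        # NAG
    "you are an absolute idiot",                        # OAG
    "funny how you always disappear when work starts",  # CAG
    "congratulations on the new job",                   # NAG
    "get lost, nobody wants you here",                  # OAG
    "some people just never learn, do they",            # CAG
] * 10
labels = ["NAG", "OAG", "CAG", "NAG", "OAG", "CAG"] * 10

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# BoW-style representation (word unigrams/bigrams, TF-IDF weighted) + linear SVM.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LinearSVC(),
)
model.fit(X_train, y_train)

# Weighted F1 averages per-class F1 weighted by class support, a common choice
# for imbalanced aggression datasets.
print("weighted F1:", f1_score(y_test, model.predict(X_test), average="weighted"))
```

Swapping the TfidfVectorizer stage for word, sentence, or contextual embeddings (and the SVM for a neural classifier) gives the kind of side-by-side comparison the paper reports, with weighted F1 as the common yardstick.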
 