
Publication:
An empirical evaluation of text representation schemes to filter the social media stream

Date

01-05-2022

Authors

Modha, Sandip
Majumder, Prasenjit
Mandl, Thomas

Publisher

Taylor & Francis

Abstract

Representing text numerically is a primary step in any downstream Natural Language Processing task, such as text classification. This paper studies the effectiveness of text representation schemes on one such task: aggressive text detection in social media, a special case of hate speech detection. Aggression levels are categorized into three predefined classes: “Non-aggressive” (NAG), “Overtly Aggressive” (OAG), and “Covertly Aggressive” (CAG). Text representation schemes based on bag-of-words (BoW) techniques, word embeddings, contextual word embeddings, and sentence embeddings are compared using both traditional classifiers and deep neural models. The weighted F1-score is used as the primary evaluation metric. On the English dataset, text representation using Google's Universal Sentence Encoder (USE) performs better than word embeddings and BoW techniques on traditional classifiers such as SVM, while pre-trained word embedding models perform better with classifiers based on deep neural models. Recent pre-trained transfer learning models such as ELMo, ULMFiT, and BERT are fine-tuned for the aggression classification task; however, their results are not on par with those of the pre-trained word embedding models. Overall, word embeddings using pre-trained fastText vectors produce a better weighted F1-score than Word2Vec and GloVe. On the Hindi dataset, BoW techniques perform better than word embeddings on traditional classifiers such as SVM, whereas pre-trained word embedding models perform better on classifiers based on deep neural networks. Statistical significance tests are employed to verify the significance of the classification results. Deep neural models are more robust against the bias induced by the training dataset and perform substantially better than traditional classifiers, such as SVM, logistic regression, and Naive Bayes, on the Twitter test dataset.
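
The weighted F1-score used as the primary metric averages the per-class F1 scores for NAG, OAG, and CAG, weighting each class c by its support n_c (the number of true instances of that class), so that the score equals the sum over classes of (n_c / N) * F1_c. The sketch below illustrates that standard definition (the same one behind scikit-learn's `f1_score(..., average="weighted")`). It assumes plain Python lists of string labels; the function name `weighted_f1` is hypothetical, and this is not the authors' evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 (hypothetical helper, illustration only)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # true instances per class, e.g. NAG/OAG/CAG
    total = len(y_true)
    score = 0.0
    # Classes that appear only in y_pred have zero support, so they add nothing.
    for label in sorted(support):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * support[label] / total  # weight F1_c by n_c / N
    return score


# Toy labels (made up for illustration, not from the paper's datasets).
print(weighted_f1(["NAG", "NAG", "OAG", "CAG"], ["NAG", "OAG", "OAG", "CAG"]))
```

In the toy example, NAG and OAG each get F1 = 2/3 and CAG gets F1 = 1, so the support-weighted average is (2 * 2/3 + 1 * 2/3 + 1 * 1) / 4 = 0.75. Weighting by support keeps a frequent class such as NAG from being under-represented in the overall score, unlike a macro average.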

Citation

Modha, Sandip, Majumder, Prasenjit, and Mandl, Thomas, "An empirical evaluation of text representation schemes to filter the social media stream," Journal of Experimental & Theoretical Artificial Intelligence, Taylor & Francis, ISSN: 1362-3079, vol. 34, no. 3, May-Jun. 2022, pp. 499-525, doi: 10.1080/0952813X.2021.1907792. [Published Date: 24 Apr 2021]

URI

https://ir.daiict.ac.in/handle/dau.ir/1779

Collections

Journal Article
