Publication:
Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments

Date

01-04-2023

Authors

Madhu, Hiren
Satapara, Shrey
Modha, Sandip
Mandl, Thomas
Majumder, Prasenjit

Journal Title

Expert Systems with Applications

Journal ISSN

0957-4174
Publisher

Elsevier


Abstract

The spread of Hate Speech on online platforms is a severe issue for societies and requires platforms to identify offensive content. Research has modeled Hate Speech recognition as a text classification problem that predicts the class of a message based only on the text of that message. However, context plays a major role in communication. In particular, for short messages, the text of the preceding tweets can completely change the interpretation of a message within a discourse. This work extends previous efforts to classify Hate Speech by considering the current and previous tweets jointly. In particular, we introduce a clearly defined way of extracting context. We present the development of the first dataset for conversation-based Hate Speech classification, the ICHCL dataset for code-mixed Hindi, together with an approach for collecting context from long conversations. Overall, our benchmark experiments show that including context can improve classification performance over a baseline. Furthermore, we develop a novel pipeline for processing the context. The best-performing pipeline uses a fine-tuned SentBERT paired with an LSTM as a classifier and achieves a macro F1 score of 0.892 on the ICHCL test dataset. Another pipeline, based on KNN, SentBERT, and ABC weighting, yields a macro F1 score of 0.807, the best result among traditional classifiers. Thus, even a KNN model paired with an optimized BERT gives better results than a vanilla BERT model.
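The central idea of the abstract, classifying a message together with the tweets that precede it, can be illustrated with a minimal Python sketch. It is not the authors' pipeline and makes several assumptions: `with_context` is a hypothetical context builder that simply joins the preceding tweets and the message with a `[SEP]` marker; `embed` is a toy bag-of-words vector standing in for a fine-tuned SentBERT encoder; ABC weighting is omitted; and all function names and labels are illustrative. The `macro_f1` helper computes the metric reported above, namely the unweighted mean of the per-class F1 scores.

```python
import math
from collections import Counter


def with_context(message: str, previous: list[str], sep: str = " [SEP] ") -> str:
    """Hypothetical context builder: preceding tweets (oldest first), then the message."""
    return sep.join(previous + [message])


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a sentence encoder such as SentBERT."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query: Counter, train: list[tuple[Counter, str]], k: int = 3) -> str:
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1 scores over all classes seen in either list."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Usage on toy, made-up data (not from the ICHCL dataset).
train = [
    (embed("you idiot"), "offensive"),
    (embed("idiot fool"), "offensive"),
    (embed("nice day"), "not_offensive"),
    (embed("lovely day"), "not_offensive"),
]
reply = with_context("so true", ["what an idiot"])
print(reply)                                   # what an idiot [SEP] so true
print(knn_predict(embed(reply), train, k=1))   # offensive
print(round(macro_f1([1, 0, 1, 0], [1, 0, 0, 0]), 4))  # 0.7333
```

The reply "so true" carries no offensive vocabulary on its own; once the parent tweet is prepended, its vector shares terms with the offensive training examples. Macro averaging gives each class equal weight regardless of its frequency, which matters when offensive messages are the minority class.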

Citation

Hiren Madhu, Shrey Satapara, Sandip Modha, Thomas Mandl, Prasenjit Majumder, "Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments," Expert Systems with Applications, Elsevier, ISSN: 0957-4174, vol. 215, Article no. 119342, pp. 1-16, 1 Apr. 2023, doi: 10.1016/j.eswa.2022.119342. [Published date: 25 Nov. 2022]

URI

https://ir.daiict.ac.in/handle/dau.ir/1777

Collections

Journal Article
