Hate Speech Detection for the Arabic Language

Thumbnail Image

Date

2023-11-03

Journal Title

Journal ISSN

Volume Title

Publisher

Saudi Digital Library

Abstract

As online social networks grow and communication technologies become more available, people can exercise their freedom of expression more than ever before. Even though the interaction between users on these platforms can be constructive, they are increasingly used for spreading hateful content, mainly due to the anonymity feature of these online platforms. Hate speech can induce cyber conflict, negatively impacting social life at both the individual and national levels. In spite of this, social network providers are unable to monitor all the content posted by their users. As a result, there is a need to detect hate speech automatically. This need increases when the text is written in a language like Arabic. Arabic is known for its challenges, complexities, and resource scarcity. This project uses transfer learning methods to adapt, and evaluate some pretrained models to detect hate speech in Arabic. Many experiments were conducted in this project to assess the transferring of some options from BERT and Sequence-to-Sequence families (e.g., DehateBERT, MARBERT, T5, and Flan-T5), and the transferring of preprocessing functions from a pretrained model (AraBERT). Experiments show that transfer learning by finetuning monolingual models has promising results to a different extent. In addition, the additional preprocessing can affect the performance in a good way. Nevertheless, dealing with low-frequency labels independently, such as our dataset’s hate class, is still challenging. Warning: This paper may include instances of offensive language.

Description

Keywords

Hate Speech Detection, Arabic, BERT, T5

Citation

Harvared

Endorsement

Review

Supplemented By

Referenced By

Copyright owned by the Saudi Digital Library (SDL) © 2024