Hate Speech Detection for the Arabic Language
Date
2023-11-03
Authors
Publisher
Saudi Digital Library
Abstract
As online social networks grow and communication technologies become more available, people can
exercise their freedom of expression more than ever before. Even though the interaction between
users on these platforms can be constructive, the platforms are increasingly used to spread hateful content,
mainly because of the anonymity they offer. Hate speech can induce cyber conflict,
negatively impacting social life at both the individual and national levels. In spite of this, social
network providers are unable to monitor all the content posted by their users. As a result, there is a
need to detect hate speech automatically. This need is greater when the text is written in a language
like Arabic, which is known for its linguistic complexity and scarcity of resources.
This project uses transfer learning to adapt and evaluate several pretrained models for detecting
hate speech in Arabic. Many experiments were conducted to assess the transfer of models from the
BERT and sequence-to-sequence families (e.g., DehateBERT, MARBERT, T5, and Flan-T5), and the
transfer of preprocessing functions from a pretrained model (AraBERT). The experiments show that
transfer learning by fine-tuning monolingual models yields promising results, to varying degrees.
In addition, the additional preprocessing can improve performance. Nevertheless, handling
low-frequency labels independently, such as our dataset’s hate class, remains challenging.
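To make the idea of text preprocessing for Arabic social media posts more concrete, the following is a minimal sketch of the kind of normalization such a step might perform. It is not the AraBERT preprocessing code and not the pipeline used in this project; the function name normalize_arabic, the [URL] and [USER] placeholders, and the choice of rules are assumptions made purely for illustration.

```python
import re

# Illustrative sketch only: NOT the AraBERT preprocessor and not the
# pipeline used in this project. All names and rules are assumptions.

# Arabic diacritics (tashkeel): U+064B..U+0652, plus superscript alef U+0670.
_TASHKEEL = re.compile(r"[\u064B-\u0652\u0670]")
# Tatweel (kashida), the elongation character used for visual stretching.
_TATWEEL = "\u0640"
_URL = re.compile(r"https?://\S+")
_MENTION = re.compile(r"@\w+")
# Any character repeated three or more times in a row.
_REPEAT = re.compile(r"(.)\1{2,}")


def normalize_arabic(text: str) -> str:
    """Apply a few common normalization rules to a social media post."""
    text = _URL.sub(" [URL] ", text)          # hypothetical placeholder token
    text = _MENTION.sub(" [USER] ", text)     # hypothetical placeholder token
    text = _TASHKEEL.sub("", text)            # strip diacritics
    text = text.replace(_TATWEEL, "")         # strip elongation marks
    text = _REPEAT.sub(r"\1\1", text)         # cap character runs at two
    return " ".join(text.split())             # collapse whitespace


if __name__ == "__main__":
    print(normalize_arabic("@someone \u062C\u0640\u0640\u0645\u064A\u0644 https://example.com"))
```

Rules like these reduce surface variation (diacritics, elongation, repeated letters) so that different spellings of the same word map to the same tokens before a pretrained model sees them. Which rules actually help is an empirical question that depends on the model and dataset.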
Warning: This paper may include instances of offensive language.
Keywords
Hate Speech Detection, Arabic, BERT, T5