A CODE-MIXING TRANSLITERATION MODEL TO IMPROVE HATE SPEECH DETECTION IN THE SAUDI DIALECT TWEETS

Alhazmi, Ali Hamoud H

dc.contributor.advisor	Associate Norisma Binti Idris, Associate Rohana Binti Mahmud, Nurul Binti Japar, Mohamed Elhag Mohamed Abo
dc.contributor.author	Alhazmi, Ali Hamoud H
dc.date.accessioned	2025-02-12T05:51:49Z
dc.date.issued	2024
dc.description.abstract	Technological developments over the past few decades have changed the way people communicate, with social media and blogs becoming vital channels for international conversation. Although hate speech is actively suppressed on social media, it remains a concern that must be continuously detected and monitored. Considerable effort has gone into this problem for English-language social media content, but hate speech detection in Arabic still faces many specific difficulties. Arabic calls for particular consideration because of its many dialects and linguistic nuances. The widespread practice of "code-mixing," in which users blend several languages within the same text, adds a further layer of complexity. To address this research gap, the study examines how well machine learning models with various features can detect hate speech, especially in Arabic tweets that contain code-mixing. The objective of the study is therefore to assess and compare the effectiveness of different features and machine learning models for hate speech detection on Arabic emoji and code-mixing hate speech datasets. The methodology comprises data collection, data pre-processing, feature extraction, construction of classification models, and evaluation of those models. The analysis showed that the Term Frequency-Inverse Document Frequency (TF-IDF) feature, combined with the Stochastic Gradient Descent (SGD) model, attained the highest accuracy, 98.21%, on the code-mixing transliteration dataset. On the emoji transliteration dataset, the highest accuracy attained was 99%.
These results were then compared with outcomes from three baseline studies, and the proposed transliteration models outperformed them on both the code-mixing and emoji datasets, underscoring the significance of the proposed models. Consequently, this study carries practical implications and serves as a foundational exploration of automated hate speech detection in text.
dc.format.extent	193
dc.identifier.uri	https://hdl.handle.net/20.500.14154/74853
dc.language.iso	en
dc.publisher	Universiti Malaya
dc.subject	Hate speech
dc.subject	Natural language processing
dc.subject	Arabic language
dc.subject	Code-mixing
dc.subject	Machine learning models
dc.title	A CODE-MIXING TRANSLITERATION MODEL TO IMPROVE HATE SPEECH DETECTION IN THE SAUDI DIALECT TWEETS
dc.type	Thesis
sdl.degree.department	ARTIFICIAL INTELLIGENCE
sdl.degree.discipline	Computer science
sdl.degree.grantor	Universiti Malaya
sdl.degree.name	Doctor of Philosophy
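
The abstract names TF-IDF features combined with an SGD-trained classifier as the best-performing configuration. The following is a minimal pure-Python sketch of that general technique, not the thesis's implementation: the function names, the logistic loss, the hyperparameters, and the toy English tokens are all assumptions made for illustration, and the record does not describe the actual pre-processing or transliteration steps.

```python
import math
from collections import Counter


def tfidf_fit(docs):
    """Learn smoothed IDF weights from a list of tokenized documents."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf_transform(doc, idf):
    """Map one tokenized document to an L2-normalized sparse TF-IDF vector."""
    tf = Counter(tok for tok in doc if tok in idf)  # unseen tokens are dropped
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def score(vec, weights, bias):
    """Linear decision score for a sparse feature vector."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())


def sgd_train(vectors, labels, epochs=50, lr=0.5):
    """Fit a logistic-loss linear classifier with per-example SGD updates."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            p = 1.0 / (1.0 + math.exp(-score(vec, weights, bias)))
            grad = p - y  # gradient of the logistic loss w.r.t. the score
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * grad * v
            bias -= lr * grad
    return weights, bias


def predict(vec, weights, bias):
    """Return 1 (offensive) or 0 (neutral) for one TF-IDF vector."""
    return 1 if score(vec, weights, bias) > 0 else 0


# Toy placeholder data (hypothetical, not from the thesis datasets).
train_docs = [
    ["you", "are", "awful", "trash"],
    ["have", "a", "nice", "day"],
    ["awful", "people", "trash"],
    ["nice", "weather", "today"],
]
train_labels = [1, 0, 1, 0]

idf = tfidf_fit(train_docs)
train_vectors = [tfidf_transform(doc, idf) for doc in train_docs]
weights, bias = sgd_train(train_vectors, train_labels)

for doc in (["awful", "trash"], ["nice", "day"]):
    print(doc, predict(tfidf_transform(doc, idf), weights, bias))
```

In practice a library implementation (for example a TF-IDF vectorizer paired with an SGD-based linear classifier) would replace these hand-written functions; the sketch only shows how the two stages fit together.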

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 2.27 MB
Format: Adobe Portable Document Format

Collections

SACM - Malaysia