Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10

Search Results

Now showing 1 - 3 of 3
  • Item (Restricted)
    A CODE-MIXING TRANSLITERATION MODEL TO IMPROVE HATE SPEECH DETECTION IN THE SAUDI DIALECT TWEETS
    (Universiti Malaya, 2024) Alhazmi, Ali Hamoud H; Associate Norisma Binti Idris, Associate Rohana Binti Mahmud, Nurul Binti Japar Mohamed Elhag Mohamed Abo
    Technological developments over the past few decades have changed the way people communicate, with platforms such as social media and blogs becoming vital channels for international conversation. Although social media platforms actively suppress hate speech, it remains a concern that requires constant monitoring. Considerable effort has gone into detecting hate speech in English-language social media content, but detection in Arabic still faces many specific difficulties. Arabic calls for particular consideration because of its many dialects and linguistic nuances. The widespread practice of "code-mixing," in which users blend several languages within a message, adds a further layer of complexity. To address this research gap, the study examines how well machine learning models with different features can detect hate speech, especially in Arabic tweets that feature code-mixing. The objective is to assess and compare the effectiveness of different features and machine learning models for hate speech detection on Arabic emoji hate speech and code-mixing hate speech datasets. The methodology comprises data collection, data pre-processing, feature extraction, construction of classification models, and evaluation of those models. The analysis showed that the Term Frequency-Inverse Document Frequency (TF-IDF) feature, used with the Stochastic Gradient Descent (SGD) model, attained the highest accuracy on the code-mixing transliteration dataset, 98.21%. The highest accuracy on the emoji transliteration dataset was 99%.
    These results were then compared with the outcomes of three baseline studies. The proposed transliteration learning model outperformed them on both the code-mixing and emoji datasets, underscoring the significance of the proposed models. This study therefore carries practical implications and serves as a foundational exploration of automated hate speech detection in text.
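    As a rough illustration of the feature and model pairing named in this abstract, the sketch below computes TF-IDF weights and trains a linear classifier with stochastic gradient descent, using only the Python standard library. It is not the thesis's implementation: the smoothed IDF variant, the logistic loss, the hyperparameters, the function names (`fit_idf`, `vectorise`, `train_sgd`, `predict`), and the toy English sentences are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def fit_idf(docs):
    """Learn smoothed IDF weights from whitespace-tokenised documents."""
    tokenised = [d.lower().split() for d in docs]
    n_docs = len(tokenised)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenised for term in set(toks))
    # One common smoothed variant (an assumption): log((1 + N) / (1 + df)) + 1.
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorise(doc, idf):
    """Map a document to a sparse {term: tf * idf} dict; unknown terms are dropped."""
    toks = doc.lower().split()
    tf = Counter(toks)
    return {t: (cnt / len(toks)) * idf[t] for t, cnt in tf.items() if t in idf}


def score(weights, bias, vec):
    """Linear decision score for a sparse vector."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())


def train_sgd(vectors, labels, epochs=200, lr=0.5, seed=0):
    """Fit a linear model with per-example SGD updates on the logistic loss."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a fresh random order each epoch
        for i in order:
            x, y = vectors[i], labels[i]
            p = 1.0 / (1.0 + math.exp(-score(weights, bias, x)))
            err = p - y  # gradient of the logistic loss w.r.t. the score
            for t, v in x.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
            bias -= lr * err
    return weights, bias


def predict(weights, bias, vec):
    """Return 1 (flagged) when the score is positive, else 0."""
    return 1 if score(weights, bias, vec) > 0 else 0


# Toy placeholder data (hypothetical): 1 = hateful, 0 = not hateful.
docs = [
    "you are awful hateful",
    "awful hateful people",
    "have a lovely day",
    "lovely weather today",
]
labels = [1, 1, 0, 0]

idf = fit_idf(docs)
weights, bias = train_sgd([vectorise(d, idf) for d in docs], labels)
train_predictions = [predict(weights, bias, vectorise(d, idf)) for d in docs]
```

    On real data, a held-out test split would be needed before quoting an accuracy figure; the toy set here only shows the mechanics.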
  • Item (Restricted)
    TWITTER HATE SPEECH DETECTION BASED ON DEEP LEARNING METHODS
    (University of Idaho, 2023) Alkomah, Fatimah; Marshall Ma, Xiaogang
    Hate speech is a toxic form of discourse that arises from prejudices or conflicts between groups within and across societies, and it can trigger episodes that proliferate quickly on social media. Because it spreads so rapidly on these platforms, hate speech affects both people and culture. As the number of social media users (on Twitter, for example) increases, its effect may become significant, in part because users can easily remain anonymous. Several machine learning models have been proposed to identify hate speech on social media; nevertheless, many difficulties limit existing techniques. One difficulty is that hate speech is understood in multiple ways, which results in many categories and interpretations. In addition, existing machine learning algorithms lack generality because they rely on small datasets and incorporate only a few characteristics of hate speech. Most hate speech systems focus on n-grams, part-of-speech tags, and sentiment, while some use lexicons as additional criteria. This research is motivated primarily by a desire to safeguard members of diverse groups, faiths, and identities against harassment, sarcasm, and harm. The work should also be useful on social media platforms, where offensive content could be blocked immediately. The purpose of this research is to (1) identify and extract textual features of hate speech from the literature, (2) study and analyze current benchmark datasets for hate speech detection, and (3) develop a machine learning model for textual hate speech detection based on newly proposed feature sets. The generic approach proposed here is a multi-label classification model based on an existing Twitter dataset of 150k tweets, the multimodal hate speech dataset (MMHS150K). The tweet text was preprocessed with steps such as stop word removal, lowercasing, and emoji preprocessing.
    The literature describes many features, so selecting a subset of them is crucial to developing a successful hate speech detection model. Three groups of features were therefore considered: (1) Feature set 1: counts of hashtags, usernames, emojis, and URLs; (2) Feature set 2: inverse document-term frequency and word embedding features; and (3) Feature set 3: a set of psychological trait features based on the Linguistic Inquiry and Word Count (LIWC). This research assesses several machine learning techniques on the dataset using F1-measure and accuracy, and compares the results with previous work. The methods include applying the best-performing machine learning model to an unseen set of tweets (as a case study) to validate it. The proposed approach is carried out with these machine learning models: Naïve Bayes (NB), Gradient Boosting (GB), Extreme Gradient Boosting (XGBoost), Random Forest (RF), K-Nearest Neighbors (KNN), Decision Trees (DT), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Bidirectional Encoder Representations from Transformers (BERT). Results indicate that RF and BERT are the most effective approaches for identifying hate speech content, and that the most useful features for hate speech detection algorithms are psychological characteristics (i.e., LIWC) and word embeddings. For most trained models, the F1-measure for binary classification of hate speech was over 95%. On natural, unseen examples, the best proposed model (BERT) classified 70% of the examples correctly using the LIWC and word embedding feature sets. The proposed BERT model can therefore detect hate speech instantly on social networks such as Twitter. As anticipated, the models built here show that binary classification yields satisfactory results, whereas multi-label classification needs further improvement.
    Implications: The findings confirm the complexity of hate speech detection, which stems from the broad range of definitions of hate speech. The work has implications for theory, offering newly adapted machine learning models that could be applied to unseen data on Twitter or similar social media platforms. The newly trained model might be helpful for Twitter algorithms, while the new feature combinations could also be useful for other research in natural language processing. It is concluded that multi-label classification remains difficult owing to a paucity of datasets and the differing definitions of hate speech. Substantial further study is therefore required to develop features that perform well across varied datasets and multifaceted conceptions of hate speech. Furthermore, the literature offers no guidelines that ensure hate speech detection algorithms are compared effectively across datasets. It may therefore be worthwhile to supplement the current dataset with additional hate speech keywords. The proposed model would then need retraining as users adopt new phrases and abandon obsolete terms over time.
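    The first feature group in this abstract (counts of hashtags, usernames, emojis, and URLs) can be illustrated with a few regular expressions. This is a minimal sketch under stated assumptions, not the dissertation's code: the patterns, the function name `count_features`, and the emoji ranges (a rough subset of Unicode blocks, not the full emoji set) are hypothetical choices for the example.

```python
import re

# Illustrative patterns (assumptions, not taken from the dissertation).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")
# Rough emoji coverage: a few common Unicode blocks only.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def count_features(tweet):
    """Return surface counts for one tweet as a dict of integers."""
    urls = URL_RE.findall(tweet)
    # Remove URLs first so that '#fragment' or '@' inside a link
    # is not miscounted as a hashtag or a username.
    text = URL_RE.sub(" ", tweet)
    return {
        "hashtags": len(HASHTAG_RE.findall(text)),
        "usernames": len(MENTION_RE.findall(text)),
        "emojis": len(EMOJI_RE.findall(text)),
        "urls": len(urls),
    }


# Hypothetical example tweet.
example = "@alice check #news #today https://example.com/a#b \U0001F600\U0001F600"
features = count_features(example)
```

    In a fuller pipeline these counts would be concatenated with the text-based and LIWC-based features before classification; how the dissertation combines them is not stated in the abstract.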
  • Item (Restricted)
    Evaluation of the Effectiveness of Media Policy in Protecting Social Media Users in Saudi Arabia from Hate Speech and Discriminatory Content: Transparency, Awareness, Trust, and Future Vision
    (Saudi Digital Library, 2023-08) Alghannam, Hussain Ali; Boyle, Raymond
    This groundbreaking study delves into the effectiveness of media policies in Saudi Arabia in combating hate speech and discriminatory content on social media platforms. Through a comprehensive exploration of Saudi users' perspectives, this research measures exposure levels, evaluates awareness and trust in state-enacted policies, and gauges users' optimism for future developments. The integration of quantitative and qualitative methods provides a nuanced understanding of the data. Results indicate relatively low exposure to harmful content, with 66.48% reporting no encounter, yet 33.52% experienced such content. Qualitative insights reveal a consensus on defining hate speech, aligning with global perspectives. Awareness of policies is high but calls for intensified education emerged, emphasising the correlation between awareness and trust. Remarkably, 88.26% express confidence in Saudi media policies. Optimism about future policy development is widespread, with over 90% expressing positivity. Recommendations for future research include broader inclusion of stakeholders, comparative studies with other cultures, and exploring the dynamics of trust and awareness. This study contributes to media management research, emphasising the importance of effective policies in creating respectful and protective digital spaces. Despite its Saudi focus, the study's implications transcend borders, advocating for global collaboration in mitigating harmful online content.

Copyright owned by the Saudi Digital Library (SDL) © 2025