Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10

  • A CODE-MIXING TRANSLITERATION MODEL TO IMPROVE HATE SPEECH DETECTION IN THE SAUDI DIALECT TWEETS (restricted access)
    (Universiti Malaya, 2024) Alhazmi, Ali Hamoud H; Associate Norisma Binti Idris, Associate Rohana Binti Mahmud, Nurul Binti Japar Mohamed Elhag Mohamed Abo
    Technological developments over the past few decades have changed the way people communicate, with platforms such as social media and blogs becoming vital channels for international conversation. Although hate speech is actively suppressed on social media, it remains a concern that must be continually identified and monitored. Considerable effort has gone into hate speech detection for English-language social media content, but detection in Arabic still faces many specific difficulties. Arabic requires particular consideration because of its many dialects and linguistic nuances, and the widespread practice of "code-mixing", in which users move seamlessly between languages, adds a further layer of complexity. To address this research gap, the study examines how well machine learning models using different features can detect hate speech, particularly in Arabic tweets that feature code-mixing. Its objective is therefore to assess and compare the effectiveness of different features and machine learning models for hate speech detection on Arabic emoji and code-mixing hate speech datasets. The methodology comprises data collection, data pre-processing, feature extraction, construction of classification models, and evaluation of those models. The analysis showed that Term Frequency-Inverse Document Frequency (TF-IDF) features combined with a Stochastic Gradient Descent (SGD) model achieved the highest accuracy, 98.21%, on the code-mixing transliteration dataset. The analysis also showed that the highest accuracy on the emoji transliteration dataset was 99%.
    These results were then compared with those of three baseline studies, and the proposed transliteration learning model outperformed them on both the code-mixing and emoji datasets, underscoring the significance of the proposed models. This study therefore has practical implications and serves as a foundational exploration of automated hate speech detection in text.
  • A Sociolinguistic Analysis of Arabic-English Code-Mixing in Podcasts: Exploring Saudis Listeners’ Attitudes and Perspectives (restricted access)
    (Saudi Digital Library, 2023-11-13) Alotaibi, Norah; Benwell, Bethan; Mulvey, Nahoko
    This study investigates Arabic-English code-mixing in Saudi social interactions, as observed in podcasts. The dissertation has two main objectives. First, it seeks to specify the types and frequency of code-mixing (CM) in such social settings in order to understand how English is used within interactions dominated by Arabic. Second, it examines listeners’ perspectives on and attitudes towards each type of CM and towards CM in general, including possible factors that may influence listeners’ opinions. The thesis adopted a qualitative approach, using pre-interview questionnaires as a supplementary qualitative instrument to gain nuanced and comprehensive insights into participants’ social backgrounds (Braun et al., 2021). Data were collected through triangulation of three sources: podcast programs, pre-interview questionnaires, and semi-structured interviews with 16 Saudi podcast listeners. The study combined content analysis (Vaismoradi et al., 2013), used to identify examples of CM in the podcast programs based on Muysken’s (2000) theory, with the thematic analysis framework of Braun and Clarke (2021), used to analyze the questionnaire and interview data. The researcher manually analyzed 10 podcast episodes and identified 525 instances of CM. Because manually analyzing a large volume of audio is challenging, and to strengthen the credibility and inclusiveness of the sample as well as the findings, the Whisper AI model (Radford et al., 2023) was also employed. It was used to find CM instances in 215 additional episodes, yielding 652 cases of CM. The analysis of the podcast data revealed three distinct categories: insertion, alternation, and congruent lexicalisation.
    Insertional CM with Arabic as the matrix language was the most prevalent, followed by alternation, which involves switching between the linguistic structures of the two languages. Congruent lexicalisation, constrained by syntactic congruence between Arabic and English, was the least prevalent and often combined insertions and alternations. The pre-interview questionnaires and interviews yielded themes of attitudes, perceptions, and evaluations. Participants’ initial opinions ranged across positive, neutral, and negative stances, with a gradual shift towards greater approval of CM under specific conditions. Participants classified CM use into necessity and prestige categories. Given the sample size and the nature of the study, the impact of demographic factors cannot be determined with certainty, but there is some indication that language proficiency played a notable role. Participants with higher English proficiency were less inclined towards CM, although agreement was found among participants with varying proficiency. Those with higher proficiency favored alternation, while those with lower proficiency leaned towards insertion. Views on congruent lexicalisation differed, with acceptance of specific lexical items such as Arabic and rejection based on concerns about language purity and solidarity. The evaluation section explored the merits and drawbacks of CM from the participants' perspectives. These findings show how individuals in a changing country such as Saudi Arabia balance preserving traditions with embracing new opportunities through language and cultural mixing, and they offer insight into participants’ perceptions of societal change. The study's theoretical and methodological implications are discussed.
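
The first thesis listed above (Alhazmi, 2024) reports its best results from TF-IDF features combined with a Stochastic Gradient Descent (SGD) classifier on transliterated emoji and code-mixing data. The abstract does not describe the pre-processing or model settings, so what follows is only a minimal, self-contained Python sketch of that kind of pipeline, under explicit assumptions: the emoji-to-Arabic table (`EMOJI_MAP`), the toy tweets, and every function name are hypothetical; TF-IDF uses a smoothed IDF with L2-normalised rows; and the classifier is a linear SVM (hinge loss with an L2 penalty) trained by per-example SGD, with the intercept omitted for brevity. It does not reproduce the thesis's models, data, or results.

```python
import math
import random
import re
from collections import Counter

# Hypothetical emoji-to-word table (an assumption; the thesis does not publish its mapping).
EMOJI_MAP = {"😡": "غاضب", "🤮": "مقرف", "😍": "حب"}


def transliterate(text: str) -> str:
    """Replace each known emoji with a space-padded Arabic word."""
    for emoji, word in EMOJI_MAP.items():
        text = text.replace(emoji, f" {word} ")
    return text


def tokenize(text: str) -> list[str]:
    """Transliterate emoji, then split on Unicode word characters."""
    return re.findall(r"\w+", transliterate(text).lower())


class Tfidf:
    """Smoothed TF-IDF: idf = ln((1 + n) / (1 + df)) + 1, each row L2-normalised."""

    def fit(self, docs: list[str]) -> "Tfidf":
        df = Counter()
        for doc in docs:
            df.update(set(tokenize(doc)))  # document frequency per term
        terms = sorted(df)
        self.index = {t: i for i, t in enumerate(terms)}
        self.idf = [math.log((1 + len(docs)) / (1 + df[t])) + 1 for t in terms]
        return self

    def transform(self, doc: str) -> dict[int, float]:
        counts = Counter(t for t in tokenize(doc) if t in self.index)  # drop unseen terms
        vec = {self.index[t]: c * self.idf[self.index[t]] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {i: v / norm for i, v in vec.items()}


def dot(w: dict[int, float], x: dict[int, float]) -> float:
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(i, 0.0) * v for i, v in x.items())


def train_sgd(xs, ys, epochs=20, lr=0.1, alpha=1e-4, seed=0):
    """Linear SVM (hinge loss + L2 penalty) fitted by per-example SGD.

    ys must be +1 (hate) or -1 (not hate); no intercept term, for brevity.
    """
    rng = random.Random(seed)
    w: dict[int, float] = {}
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a new random order each epoch
        for k in order:
            x, y = xs[k], ys[k]
            violated = y * dot(w, x) < 1  # hinge loss is active at the current w
            for i in w:  # gradient step for (alpha / 2) * ||w||^2
                w[i] *= 1 - lr * alpha
            if violated:  # subgradient step for max(0, 1 - y * w.x)
                for i, v in x.items():
                    w[i] = w.get(i, 0.0) + lr * y * v
    return w


# Hypothetical toy tweets: 1 = hate speech, 0 = not hate speech.
train = [
    ("انت غبي 😡", 1),
    ("كلام مقرف 🤮", 1),
    ("غبي ومقرف", 1),
    ("صباح الخير 😍", 0),
    ("شكرا لك يا صديقي", 0),
    ("يوم جميل 😍", 0),
]
vectorizer = Tfidf().fit([text for text, _ in train])
weights = train_sgd(
    [vectorizer.transform(text) for text, _ in train],
    [1 if label else -1 for _, label in train],
)


def predict(text: str) -> int:
    """Return 1 (hate) if the linear score is positive, else 0."""
    return 1 if dot(weights, vectorizer.transform(text)) > 0 else 0
```

In practice a library implementation such as scikit-learn's `TfidfVectorizer` and `SGDClassifier` would replace the hand-written parts; they are spelled out here only to show what each step computes, and a real evaluation would use held-out data rather than a handful of toy tweets.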

Copyright owned by the Saudi Digital Library (SDL) © 2025