SACM - United Kingdom

Permanent URI for this collectionhttps://drepo.sdl.edu.sa/handle/20.500.14154/9667

Browse

Search Results

Now showing 1 - 7 of 7

Restricted
An NLP-Driven Framework for Business Email Compromise Detection and Authorship Verifcation
(Saudi Digital Library, 2025) Almutairi, Amirah; AlHashimy, Nawfal; Kang, BooJoong
Business Email Compromise (BEC) presents a critical cybersecurity threat, leveraging linguistic impersonation and social engineering rather than traditional malicious payloads. These attacks routinely evade conventional flters by mimicking legitimate communication styles and exploiting trusted identities. This thesis explores content-based detection strategies for BEC using a sequence of natural language processing (NLP) models. First, it proposes a transformer-based classifer to detect semantic indicators of deception in email body text. Second, it develops a Siamese authorship verifcation (AV) model that captures stylistic consistency, even under adversarial mimicry. These components are unifed within a multi-task learning (MTL) framework that simultaneously optimizes for BEC detection and AV by sharing underlying representations while preserving task-specifc objectives. To support empirical evaluation, a structured taxonomy of BEC fraud is introduced, and a synthetic email dataset is generated through prompt-guided language model fne-tuning and human validation. Experiments on combined real and synthetic corpora demonstrate that the MTL model achieves up to 97% F1-score in BEC detection and 93% in AV, outperforming transfer learning baseline while reducing false positives and computational overhead. This work contributes a principled, modular, and extensible framework for enhancing email security through joint semantic and stylistic analysis, addressing gaps in current defenses against sophisticated impersonation attacks.
6 0
Restricted
Embracing Emojis in Sarcasm Detection to Enhance Sentiment Analysis
(University of Southampton, 2025) Alsabban, Malak Abdullah; Hall, Wendy; Weal, Mark
People frequently share their ideas, concerns, and emotions on social networks, making sentiment analysis on social media increasingly important for understanding public opinion and user sentiment. Sentiment analysis provides an effective means of interpreting people's attitudes towards various topics, individuals, or ideas. This thesis introduces the creation of an Emoji Dictionary (ED) to harness the rich contextual information conveyed by emojis. It acts as a valuable resource for deciphering the emotional nuances embedded in textual content, contributing to a deeper understanding of sentiment. In addition, the research explores the complex domain of sarcasm detection by proposing a novel Sarcasm Detection Approach (SDA). This approach identifies sarcasm by analysing conflicts between textual content and the accompanying emojis. The thesis addresses key challenges in sentiment analysis by evaluating and comparing emoji dictionaries and sarcasm detection approaches to enhance sentiment classification. Extensive experimentation on diverse datasets rigorously assesses the effectiveness of these methods in improving sentiment analysis accuracy and sarcasm detection performance, particularly in emoji-rich datasets. The findings highlight the crucial role of emojis as contextual cues, underscoring their value in sentiment analysis and sarcasm detection tasks. The outcomes of this thesis aim to advance sentiment analysis methodologies by offering insights into preprocessing strategies, leveraging the expressive potential of emojis through the Emoji Dictionary (ED), and introducing the Sarcasm Detection Approach (SDA). The research demonstrates that integrating emojis through these tools substantially enhances both sentiment analysis and sarcasm detection. By utilizing these tools, the study not only improves model performance but also opens avenues for further exploration into the nuanced complexities of digital communication.
19 0
Restricted
Evaluating Chess Moves by Analysing Sentiments in Teaching Textbooks
(the University of Manchester, 2025) Alrdahi, Haifa Saleh T; Batista-navarro, Riza
The rules of playing chess are simple to comprehend, and yet it is challenging to make accurate decisions in the game. Hence, chess lends itself well to the development of an artificial intelligence (AI) system that simulates real-life problems, such as in decision-making processes. Learning chess strategies has been widely investigated, with most studies focused on learning from previous games using search algorithms. Chess textbooks encapsulate grandmaster knowledge, which explains playing strategies. This thesis investigates three research questions on the possibility of unlocking hidden knowledge in chess teaching textbooks. Firstly, we contribute to the chess domain with a new heterogeneous chess dataset “LEAP”, consists of structured data that represents the environment “board state”, and unstructured data that represent explanation of strategic moves. Additionally, we build a larger unstructured synthetic chess dataset to improve large language models familiarity with the chess teaching context. With the LEAP dataset, we examined the characteristics of chess teaching textbooks and the challenges of using such a data source for training Natural Language (NL)-based chess agent. We show by empirical experiments that following the common approach of sentence-level evaluation of moves are not insightful. Secondly, we observed that chess teaching textbooks are focused on explanation of the move’s outcome for both players alongside discussing multiple moves in one sentence, which confused the models in move evaluation. To address this, we introduce an auxiliary task by using verb phrase-level to evaluate the individual moves. Furthermore, we show by empirical experiments the usefulness of adopting the Aspect-based Sentiment Analysis (ABSA) approach as an evaluation method of chess moves expressed in free-text. With this, we have developed a fine-grained annotation and a small-scale dataset for the chess-ABSA domain “ASSESS”. Finally we examined the performance of a fine-tuned LLM encoder model for chess-ABSA and showed that the performance of the model for evaluating chess moves is comparable to scores obtained from a chess engine, Stockfish. Thirdly, we developed an instruction-based explanation framework, using prompt engineering with zero-shot learning to generate an explanation text of the move outcome. The framework also used a chess ABSA decoder model that uses an instructions format and evaluated its performance on the ASSESS dataset, which shows an overall improvement performance. Finally, we evaluate the performance of the framework and discuss the possibilities and current challenges of generating large-scale unstructured data for the chess, and the effect on the chess-ABSA decoder model.
12 0
Restricted
AI-Driven Approaches for Privacy Compliance: Enhancing Adherence to Privacy Regulations
(Univeristy of Warwick, 2024-02) Alamri, Hamad; Maple, Carsten
This thesis investigates and explores some inherent limitations within the current privacy policy landscape, provides recommendations, and proposes potential solutions to address these issues. The first contribution of this thesis is a comprehensive study that addresses a significant gap in the literature. This study provides a detailed overview of the current landscape of privacy policies, covering both their limitations and proposed solutions, with the aim of identifying the most practical and applicable approaches for researchers in the field. Second, the thesis tackles the challenge of privacy policy accessibility in app stores by introducing the App Privacy Policy Extractor (APPE) system. The APPE pipeline consists of various components, each developed to perform a specific task and provide insightful information about the apps' privacy policies. By analysing over two million apps in the iOS App Store, APPE offers unprecedented and comprehensive store-wide insights into policy distribution and can act as a mechanism for enforcing privacy policy requirements in app stores automatically. Third, the thesis investigates the issue of privacy policy complexity. By establishing generalisability across app categories and drawing attention to associated matters of time and cost, the study demonstrates that the current situation requires immediate and effective solutions. It suggests several recommendations and potential solutions. Finally, to enhance user engagement with privacy policies, a novel framework utilising a cost-effective unsupervised approach, based on the latest AI innovations, has been developed. The comparison of the findings of this study with state-of-the-art methods suggests that this approach can produce outcomes that are on par with those of human experts, or even surpass them, yet in a more efficient and automated manner.
25 0
Restricted
Evaluating CAMeL-BERT for Sentiment Analysis of Customer Satisfaction with STC (Saudi Telecom Company) Services
(The University of Sussex, 2024-08-15) Alotaibi, Fahad; Pay, Jack
In the age of informatics platforms such as Twitter (X) plays a crucial role for measuring public sentiment, especially in both private and public sectors. This study explores the application of machine learning, particularly deep learning, to perform sentiment analysis on tweets about Saudi Telecom Company (STC) services in Saudi Arabia. A comparative analysis was conducted between pre-trained sentiment analysis models in English and in Arabic to assess their effectiveness in classifying sentiments. In addition, the study highlights a challenge in existing Arabic models, which are based on English model architectures but trained on varied datasets, such as Modern Standard Arabic and Classical Arabic (Al-Fus’ha). These models often lack the capability to handle the diverse Arabic dialects commonly used on social media. To overcome this issue, the study involved fine-tuning a pre-trained Arabic model using a dataset of tweets related to STC services, specifically focusing on the Saudi dialect. Data was collected from Twitter (X), focusing on mentions of the Saudi Telecom Company (STC). Both English and Arabic models were applied to this data, and their performance in sentiment analysis was evaluated. The fine-tuned Arabic model (CAMeL-BERT) demonstrated improved accuracy and a better understanding of local dialects compared to its initial version. The results highlight the importance of model adaptation for specific languages and contexts and underline the potential of CAMeL-BERT in sentiment analysis for Arabic-language content. The findings offer practical implications for enhancing customer service and engagement through more accurate sentiment analysis of social media content in the service providers sector.
22 0
Restricted
Unsupervised Semantic Change Detection in Arabic
(Queen Mary University of London, 2023-10-23) Sindi, Kenan; Dubossarsky, Haim
This study employs pretrained BERT models— AraBERT, CAMeLBERT (CA), and CAMeLBERT (MSA)—to investigate semantic change in Arabic across distinct time periods. Analyzing word embeddings and cosine distance scores reveals variations in capturing semantic shifts. The research highlights the significance of training data quality and diversity, while acknowledging limitations in data scope. The project's outcome—a list of most stable and changed words—contributes to Arabic NLP by shedding light on semantic change detection, suggesting potential model selection strategies and areas for future exploration.
114 0
Restricted
Hate Speech Detection for the Arabic Language
(Saudi Digital Library, 2023-11-03) Alhejaili, Abrar; Moosavi, Nafise
As online social networks grow and communication technologies become more available, people can exercise their freedom of expression more than ever before. Even though the interaction between users on these platforms can be constructive, they are increasingly used for spreading hateful content, mainly due to the anonymity feature of these online platforms. Hate speech can induce cyber conflict, negatively impacting social life at both the individual and national levels. In spite of this, social network providers are unable to monitor all the content posted by their users. As a result, there is a need to detect hate speech automatically. This need increases when the text is written in a language like Arabic. Arabic is known for its challenges, complexities, and resource scarcity. This project uses transfer learning methods to adapt, and evaluate some pretrained models to detect hate speech in Arabic. Many experiments were conducted in this project to assess the transferring of some options from BERT and Sequence-to-Sequence families (e.g., DehateBERT, MARBERT, T5, and Flan-T5), and the transferring of preprocessing functions from a pretrained model (AraBERT). Experiments show that transfer learning by finetuning monolingual models has promising results to a different extent. In addition, the additional preprocessing can affect the performance in a good way. Nevertheless, dealing with low-frequency labels independently, such as our dataset’s hate class, is still challenging. Warning: This paper may include instances of offensive language.
34 0

SACM - United Kingdom

Browse

Filters

Settings

Sort By

Results per page

Search Results