SACM - United Kingdom
Permanent URI for this collection: https://drepo.sdl.edu.sa/handle/20.500.14154/9667
Search Results (37 results)
Item Restricted
CADM: Creative Accounting Detection Model in Saudi-Listed Companies (Saudi Digital Library, 2025)
Bineid, Maysoon Mohamed; Beloff, Natalia
In business, financial statements are the primary source of information for investors and other stakeholders. Despite extensive regulatory efforts, the quality of financial reporting in Saudi Arabia still requires improvement, as prior studies have documented evidence of creative accounting. This practice occurs when managers manipulate accounting figures within the boundaries of the International Financial Reporting Standards to present a more favourable image of the company. Although various fraud detection methods exist, identifying manipulations that are legal yet misleading remains a significant challenge. This research introduces the Creative Accounting Detection Model (CADM), a deep learning (DL)-based approach that employs Long Short-Term Memory (LSTM) networks to identify Saudi-listed companies engaging in creative accounting. Two versions of the model were developed: CADM1, trained on a simulated dataset based on established accounting measures from the literature, and CADM2, trained on a dataset tailored to reflect financial patterns observed in the Saudi market. Both datasets incorporated financial and non-financial features derived from a preliminary survey of Saudi business experts. The models achieved training accuracies of 100% (CADM1) and 95% (CADM2). Both models were then tested on real-world data from the Saudi energy sector (2019–2023). CADM1 classified one company as engaging in creative accounting, whereas CADM2 classified all companies as non-creative but demonstrated greater stability in prediction confidence. To interpret these results, a follow-up qualitative study involving expert interviews confirmed CADM as a promising supplementary tool for auditors, enhancing analytical and oversight capabilities. These findings highlight CADM’s potential to support regulatory oversight, strengthen auditing procedures, and improve investor trust in the transparency of financial statements.
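The abstract above describes an LSTM network trained on per-company sequences of financial and non-financial features. As the CADM code itself is not part of this listing, the following is a minimal, hypothetical PyTorch sketch of that style of binary sequence classifier; the feature count, sequence length, and hidden size are illustrative assumptions, not values from the thesis.

```python
# Hypothetical sketch of an LSTM-based binary classifier in the spirit of CADM.
# Feature dimensionality, sequence length, and hyperparameters are assumptions.
import torch
import torch.nn as nn

class CreativeAccountingLSTM(nn.Module):
    def __init__(self, n_features: int = 12, hidden: int = 64):
        super().__init__()
        # One LSTM layer reads a company's yearly feature vectors in order.
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # one logit: creative vs. non-creative

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, years, n_features)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1]).squeeze(-1)  # (batch,) logits

model = CreativeAccountingLSTM()
dummy = torch.randn(8, 5, 12)        # 8 companies, 5 years, 12 features each
probs = torch.sigmoid(model(dummy))  # probability of creative accounting
print(probs.shape)                   # torch.Size([8])
```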
Item Restricted
The Additional Regulatory Challenges Posed by AI In Financial Trading (Saudi Digital Library, 2025)
Almutairi, Nasser; Alessio, Azzutti
Algorithmic trading has shifted from rule-based speed to adaptive autonomy, with deep learning and reinforcement learning agents that learn, re-parameterize, and redeploy in near real time, amplifying opacity, correlated behaviours, and flash-crash dynamics. Against this backdrop, the dissertation asks whether existing EU and US legal frameworks can keep pace with new generations of AI trading systems. It adopts a doctrinal and comparative method, reading MiFID II and MAR, the EU AI Act, SEC and CFTC regimes, and global soft law (IOSCO, NIST) through an engineering lens of AI lifecycles and value chains to test functional adequacy. Chapter 1 maps the evolution from deterministic code to self-optimizing agents and locates the shrinking space for real-time human oversight. Chapter 2 reframes technical attributes as risk vectors, such as herding, feedback loops, and brittle liquidity, and illustrates enforcement and stability implications. Chapter 3 exposes human-centric assumptions (intent, explainability, "kill switches") embedded in current rules and the gaps they create for attribution, auditing, and cross-border supervision. Chapter 4 proposes a hybrid, lifecycle-based model of oversight that combines value-chain accountability, tiered AI-agent licensing, mandatory pre-deployment verification, explainability (XAI) requirements, cryptographically sealed audit trails, human-in-the-loop controls, continuous monitoring, and sandboxed co-regulation. The contribution is threefold: (1) a technology-aware risk typology linking engineering realities to market integrity outcomes; (2) a comparative map of EU and US regimes that surfaces avenues for regulatory arbitrage; and (3) a practicable governance toolkit that restores traceable accountability without stifling beneficial innovation. Overall, the thesis argues for moving from incremental, disclosure-centric tweaks to proactive, lifecycle governance that embeds accountability at design, deployment, and post-trade, aligning next-generation trading technology with the enduring goals of fair, orderly, and resilient markets.

Item Restricted
Semi-Supervised Approach For Automatic Head Gesture Classification (Saudi Digital Library, 2025)
Alsharif, Wejdan; Hiroshi, Shimodaira
This study applies a semi-supervised method, specifically self-training, to automatic head gesture recognition using motion capture data. It explores and compares fully supervised deep learning models and self-training pipelines in terms of their performance and training approaches. The proposed approach achieved an accuracy of 52% and a macro F1 score of 44% under cross-validation. Results show that incorporating self-training into the learning process improves model performance: the generated pseudo-labeled data effectively supplement the original labeled dataset, enabling the model to learn from a larger and more diverse set of training examples.
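Self-training of the kind described in the head-gesture study grows the labelled set with the model's own confident predictions on unlabelled data. Below is a minimal, generic one-round sketch using scikit-learn; the classifier choice, synthetic data, and 0.9 confidence threshold are illustrative assumptions, not details from the dissertation.

```python
# Generic one-round self-training sketch; model, data, and threshold are
# assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 16)) * 3.0   # four synthetic gesture classes
y_lab = rng.integers(0, 4, 100)
X_lab = centers[y_lab] + rng.normal(size=(100, 16))       # small labelled pool
y_hidden = rng.integers(0, 4, 400)
X_unlab = centers[y_hidden] + rng.normal(size=(400, 16))  # unlabelled pool

clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

# Pseudo-label only the unlabelled samples the model is confident about.
proba = clf.predict_proba(X_unlab)
pseudo = clf.classes_[proba.argmax(axis=1)]
keep = proba.max(axis=1) >= 0.9   # confidence threshold (assumed)

# Retrain on the enlarged set: original labels plus confident pseudo-labels.
X_aug = np.vstack([X_lab, X_unlab[keep]])
y_aug = np.concatenate([y_lab, pseudo[keep]])
clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
print(f"added {keep.sum()} pseudo-labelled samples")
```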
Item Restricted
Enhancing Data Representation in Distributed Machine Learning (Saudi Digital Library, 2025)
Aladwani, Tahani Abed; Anagnostopoulos, Christos
Distributed computing devices, ranging from smartphones to edge micro-servers (collectively referred to as clients), are capable of gathering and storing diverse types of data, such as images and voice recordings. This wide array of data sources has the potential to significantly enhance the accuracy and robustness of Deep Learning (DL) models across a variety of tasks. However, this data is intrinsically heterogeneous, owing to differences in users’ preferences, lifestyles, locations, and other factors. Consequently, it requires comprehensive preprocessing (e.g., labeling, filtering, relevance assessment, and balancing) to ensure its suitability for the development of effective and reliable models. This thesis therefore explores the feasibility of conducting predictive analytics and model inference on edge computing (EC) systems when access to data is limited, and on clients’ devices through federated learning (FL) when direct access to data is entirely restricted. The first part of this thesis focuses on reducing the data transmission rate between clients and EC servers by employing techniques such as data and task caching, identifying data overlaps, and evaluating task popularity. While this strategy can reduce data offloading to the lowest possible level, it does not entirely eliminate dependence on third-party entities. The second part of this thesis eliminates that dependency by implementing FL, where direct access to raw data is not possible. In this context, node and data selection are guided by predictions and model performance: the most suitable nodes and most relevant data for training are identified by clustering nodes based on data characteristics and analyzing the overlap between query boundaries and cluster boundaries. The third part of this thesis introduces a mechanism to support classification tasks, such as image classification, which are particularly challenging over distributed data owing to issues like label shifting or missing labels across clients. To address these challenges, the proposed method mitigates the impact of imbalances across clients by employing multiple cluster-based meta-models, each tailored to specific label distributions. The fourth part of this thesis introduces a two-phase federated self-learning framework, termed 2PFL, which addresses extreme data scarcity and skewness when training classifiers over distributed labeled and unlabeled data. 2PFL achieves high-performance models even when trained with only 10% to 20% labeled data relative to the available unlabeled data. The concluding chapter underscores the importance of adaptable learning mechanisms that can respond to continuous changes in clients’ data volume, requirements, formats, and protection regulations. Incorporating the EC layer alleviates concerns related to data privacy, reduces the volume of data needing offloading, expedites task execution, and facilitates the training of complex models. For scenarios demanding stricter privacy preservation, FL offers a viable solution, enabling multiple clients to collaboratively train models while adhering to user privacy protection, data security, and government regulations. However, because FL accesses data only indirectly, several challenges must be addressed to ensure high-performance models: imbalanced data distribution across clients, partially labeled data, and fully unlabeled data, all of which are explored and demonstrated through experimental evaluations.
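Federated learning as used above trains a shared model without moving raw client data; only model parameters travel. The sketch below shows a single FedAvg-style round in PyTorch as a common baseline for this setting; it is a generic illustration, not the thesis's 2PFL framework, whose node selection and clustering logic are not reproduced here.

```python
# Minimal FedAvg-style round (generic baseline, not 2PFL itself).
import copy
import torch
import torch.nn as nn

def local_update(global_model, data, targets, epochs=1, lr=0.05):
    """Train a copy of the global model on one client's private data."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(data), targets).backward()
        opt.step()
    state = {k: v.detach().clone() for k, v in model.state_dict().items()}
    return state, len(data)

def fed_avg(updates):
    """Average client weights, weighted by local dataset size."""
    total = sum(n for _, n in updates)
    avg = {k: torch.zeros_like(v) for k, v in updates[0][0].items()}
    for state, n in updates:
        for k, v in state.items():
            avg[k] += v * (n / total)
    return avg

global_model = nn.Linear(10, 3)  # toy classifier standing in for a real model
clients = [(torch.randn(32, 10), torch.randint(0, 3, (32,))) for _ in range(4)]
updates = [local_update(global_model, x, y) for x, y in clients]
global_model.load_state_dict(fed_avg(updates))  # one aggregation round
```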
Item Restricted
Deep Learning-Based White Blood Cell Classification Through a Free and Accessible Application (Saudi Digital Library, 2025)
Alluwaim, Yaseer; Campbell, Neill
Background: Microscopy of peripheral blood smears (PBS) continues to play a fundamental role in hematology diagnostics, offering detailed morphological insights that complement automated blood counts. Examination of a stained blood film by a trained technician is among the most frequently performed tests in clinical hematology laboratories. Nevertheless, manual smear analysis is labor-intensive, time-consuming, and prone to considerable variability between observers. These challenges have spurred interest in automated, deep learning-based approaches to enhance efficiency and consistency in blood cell assessment.
Methods: We designed a convolutional neural network (CNN) with a ResNet-50 backbone, applying standard transfer-learning techniques for white blood cell (WBC) classification. The model was trained on a publicly available dataset of approximately 4,000 annotated peripheral smear images representing eight WBC types. The image processing workflow included automated nucleus detection, normalization, and extensive augmentation (rotation, scaling, etc.) to improve model generalization. Training was performed with the PyTorch Lightning framework for efficient development.
Application: The final model was integrated into a lightweight web application and deployed on Hugging Face Spaces, allowing accessible browser-based inference. The application provides an easy-to-use interface for uploading images, which are automatically cropped and analyzed in real time. This free, open tool delivers immediate classification results and is usable by laboratory technologists without specialized hardware or software.
Results: Testing on an independent set showed that the ResNet-50 network reached 98.67% overall accuracy, with consistently high performance across all eight WBC categories. Precision, recall, and specificity closely matched the overall accuracy, indicating well-balanced classification. To assess real-world generalization, the model was also tested on an external, heterogeneous dataset drawn from different sources, achieving 86.33% accuracy and reflecting strong performance beyond its main training data. The confusion matrix showed negligible misclassifications, suggesting consistent discrimination between leukocyte types.
Conclusion: This study indicates that a lightweight AI tool can support peripheral smear analysis by offering rapid and consistent WBC identification via a web interface. Such a system may reduce laboratory workload and observer variability, particularly in resource-limited or remote settings where expert microscopists are scarce, and serve as a practical training aid for personnel learning cell morphology. Limitations include reliance on a single training dataset, which may not encompass all staining or imaging variations, and evaluation performed offline. Future work will aim to expand dataset diversity, enable real-time integration with digital microscopes, and conduct clinical validation to broaden applicability and adoption.
Application link: https://huggingface.co/spaces/xDyas/wbc-classifier
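The Methods section above describes standard transfer learning with a ResNet-50 backbone for eight WBC classes. A minimal torchvision sketch of that setup follows; the frozen-backbone policy and the particular augmentations are common defaults assumed for illustration, not the study's exact training recipe.

```python
# Transfer-learning sketch: ResNet-50 re-headed for 8 WBC classes.
# Freezing policy and augmentations are assumed defaults, not the study's own.
import torch.nn as nn
from torchvision import models, transforms

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in model.parameters():   # freeze the pretrained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 8)  # new trainable 8-class head

# Augmentations of the kind the abstract mentions (rotation, scaling).
train_tf = transforms.Compose([
    transforms.RandomRotation(20),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```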
Item Restricted
Deep Multi-Modality Fusion for Integrative Healthcare (Queen Mary University of London, 2025)
Alwazzan, Omnia; Slabaugh, Gregory
The healthcare industry generates vast amounts of data, driving advancements in patient diagnosis, treatment, and therapeutic discovery. A single patient’s electronic healthcare record often includes multiple modalities, each providing unique insights into their condition. Yet, integrating these diverse, complementary sources to gain deeper insights remains a challenge. While deep learning has transformed single-modality analysis, many clinical scenarios, particularly in cancer care, require integrating complementary data sources for a holistic understanding. In cancer care, two key modalities provide complementary perspectives: histopathology whole-slide images (WSIs) and omics data (genomic, transcriptomic, epigenomic). WSIs deliver high-resolution views of tissue morphology and cellular structures, while omics data reveal molecular-level details of disease mechanisms. In this domain, single-modality approaches fall short: histopathology misses molecular heterogeneity, and traditional bulk or non-spatial omics data lack spatial context. Although recent advances in spatial omics technologies aim to bridge this gap by capturing molecular data within spatially resolved tissue architecture, such approaches are still emerging and are not explored in this thesis. Consequently, integrating conventional WSIs and non-spatial omics data through effective fusion strategies becomes essential for uncovering their joint potential. Effective fusion of these modalities holds the potential to reveal rich, cross-modal patterns that help identify signals associated with tumor behavior. But key questions arise: How can we effectively align these heterogeneous modalities (high-resolution images and diverse molecular data) into a unified framework? How can we leverage their interactions to maximize complementary insights? How can we tailor fusion strategies to maximize the strengths of dominant modalities across diverse clinical tasks? This thesis tackles these questions head-on, advancing integrative healthcare by developing novel deep multi-modal fusion methods. Our primary focus is on integrating the aforementioned key modalities, proposing innovative approaches to enhance omics–WSI fusion in cancer research. While the downstream applications of these methods span diagnosis, prognosis, and treatment stratification, the core contribution lies in the design and evaluation of fusion strategies that effectively harness the complementary strengths of each modality. Our research develops a multi-modal fusion method to enhance cross-modality interactions between WSIs and omics data, using advanced architectures to integrate their heterogeneous feature spaces and produce discriminative representations that improve cancer grading accuracy. These methods are flexibly designed and can be applied to fuse data from diverse sources across various application domains; however, this thesis focuses primarily on cancer-related tasks. We also introduce cross-modal attention mechanisms to refine feature representation and interpretability, functioning effectively in both single-modality and bimodal settings, with applications in breast cancer classification (using mammography, MRI, and clinical metadata) and brain tumor grading (using WSIs and gene expression data). Additionally, we propose dual fusion strategies combining early and late fusion to address challenges in omics-WSI integration, such as explainability and high-dimensional omics data, aligning omics with localized WSI regions to identify tumor subtypes without patch-level labels, and capturing global interactions for a holistic perspective. We deliver three key contributions: the Multi-modal Outer Arithmetic Block (MOAB), a novel fusion method integrating latent representations from WSIs and omics data using arithmetic operations and a channel fusion technique, achieving state-of-the-art brain cancer grading performance with publicly available code; the Flattened Outer Arithmetic Attention (FOAA), an attention-based framework extending MOAB for single- and bimodal tasks, surpassing existing methods in breast and brain tumor classification; and the Multi-modal Outer Arithmetic Block Dual Fusion Network (MOAD-FNet), combining early and late fusion for explainable omics-WSI integration, outperforming benchmarks on The Cancer Genome Atlas (TCGA) and NHNN BRAIN UK datasets with interpretable WSI heatmaps aligned with expert diagnoses. Together, these contributions provide reliable, interpretable, and adaptable solutions for the multi-modal fusion domain, with a specific focus on advancing diagnostics, prognosis, and personalized healthcare strategies while addressing the critical questions driving this field forward.
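MOAB, as summarized above, fuses latent vectors from two modalities with outer arithmetic operations followed by channel fusion. The sketch below illustrates only the general idea (stacking outer product, sum, and difference maps as channels for a small convolutional head); it is a simplification under assumed dimensions, not the published MOAB block.

```python
# Simplified outer-arithmetic fusion of two modality embeddings.
# Illustrates the general idea only; not the exact MOAB block.
import torch
import torch.nn as nn

class OuterArithmeticFusion(nn.Module):
    def __init__(self, dim: int = 64, n_classes: int = 4):
        super().__init__()
        # Outer maps become channels of a 2D "image"; a conv head then
        # learns cross-modal interaction patterns from them.
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.head = nn.Linear(8 * dim * dim, n_classes)

    def forward(self, wsi_emb, omics_emb):
        # wsi_emb, omics_emb: (batch, dim) latent vectors, one per modality
        a = wsi_emb.unsqueeze(2)    # (batch, dim, 1)
        b = omics_emb.unsqueeze(1)  # (batch, 1, dim)
        fused = torch.stack([a * b, a + b, a - b], dim=1)  # (batch, 3, dim, dim)
        z = torch.relu(self.conv(fused)).flatten(1)
        return self.head(z)

fusion = OuterArithmeticFusion()
logits = fusion(torch.randn(2, 64), torch.randn(2, 64))
print(logits.shape)  # torch.Size([2, 4])
```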
Item Restricted
Enhancing Gravitational-Wave Detection from Cosmic String Cusps in Real Noise Using Deep Learning (Saudi Digital Library, 2025)
Taghreed, Bahlool; Patrick, Sutton
Cosmic strings are topological defects that may have formed in the early universe and could produce bursts of gravitational waves through cusp events. Detecting such signals is particularly challenging due to transient non-astrophysical artifacts, known as glitches, in gravitational-wave detector data. In this work, we develop a deep learning-based classifier designed to distinguish cosmic string cusp signals from common transient noise types, such as blips, using raw, whitened 1D time-series data extracted from real detector noise. Unlike previous approaches that rely on simulated or idealized noise environments, our method is trained and tested entirely on real noise, making it more applicable to real-world search pipelines. Using a dataset of 50,000 labeled 2-second samples, our model achieves a classification accuracy of 84.8%, a recall of 78.7%, and a false-positive rate of 9.1% on unseen data. This demonstrates the feasibility of cusp-glitch discrimination directly in the time domain, without requiring time-frequency representations or synthetic data, and contributes toward robust detection of exotic astrophysical signals in realistic gravitational-wave conditions.
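The classifier above operates directly on whitened 1D strain time series rather than on spectrograms. A minimal 1D-CNN binary classifier of that flavour is sketched below; the layer sizes and the assumed 4096 Hz sample rate (8192 samples per 2-second segment) are illustrative, not the thesis's architecture.

```python
# Minimal 1D-CNN sketch for cusp-vs-glitch classification on raw whitened
# strain. Layer sizes and the 4096 Hz sample rate are assumptions.
import torch
import torch.nn as nn

class CuspVsGlitchCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=16, stride=4), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=8, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(32, 1)  # one logit: cusp (1) vs. glitch (0)

    def forward(self, x):
        # x: (batch, 1, samples), e.g. 2 s at 4096 Hz -> 8192 samples
        return self.head(self.features(x).squeeze(-1)).squeeze(-1)

model = CuspVsGlitchCNN()
batch = torch.randn(4, 1, 8192)     # four 2-second whitened segments
print(torch.sigmoid(model(batch)))  # cusp probabilities
```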
Item Restricted
Predicting Delayed Flights for International Airports Using Artificial Intelligence Models & Techniques (Saudi Digital Library, 2025)
Alsharif, Waleed; MHallah, Rym
Delayed flights are a pervasive challenge in the aviation industry, significantly impacting operational efficiency, passenger satisfaction, and economic costs. This thesis aims to develop predictive models that demonstrate strong performance and reliability, maintaining high accuracy on the tested datasets and showing potential for application in real-world aviation scenarios. These models leverage advanced artificial intelligence and deep learning techniques to address the complexity of predicting delayed flights. The study evaluates the performance of Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNNs), and their hybrid model (LSTM-CNN), which combines temporal and spatial pattern analysis, alongside a Large Language Model (LLM; specifically, OpenAI's Babbage model), which excels at processing structured and unstructured text data. Additionally, the research introduces a unified machine learning framework utilizing a Gradient Boosting Machine (GBM) for regression and a Light Gradient Boosting Machine (LGBM) for classification, aimed at estimating both flight delay durations and their underlying causes. The models were tested on high-dimensional datasets from John F. Kennedy International Airport (JFK) and a synthetic dataset from King Abdulaziz International Airport (KAIA). Among the evaluated models, the hybrid LSTM-CNN model demonstrated the best performance, achieving 99.91% prediction accuracy with a prediction time of 2.18 seconds, outperforming the GBM model (98.5% accuracy, 6.75 seconds) and the LGBM (99.99% precision, 4.88 seconds). Additionally, GBM achieved a strong correlation score (R² = 0.9086) in predicting delay durations, while LGBM exhibited exceptionally high precision (99.99%) in identifying delay causes. Results indicated that National Aviation System delays (correlation: 0.600), carrier-related delays (0.561), and late aircraft arrivals (0.519) were the most significant contributors, while weather factors played a moderate role. These findings underscore the exceptional accuracy and efficiency of LSTM-CNN, establishing it as the optimal model for predicting delayed flights owing to its superior performance and speed. The study highlights the potential for integrating LSTM-CNN into real-time airport management systems, enhancing operational efficiency and decision-making while paving the way for smarter, AI-driven air traffic systems.
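The best performer above is a hybrid pairing convolutional feature extraction with LSTM sequence modelling. One common way to arrange such a hybrid is sketched below (a Conv1d front end feeding an LSTM); this ordering and all dimensions are assumptions for illustration, since the abstract does not reproduce the thesis architecture.

```python
# Illustrative LSTM-CNN hybrid for sequences of flight records; the exact
# arrangement and sizes used in the thesis may differ.
import torch
import torch.nn as nn

class LSTMCNNHybrid(nn.Module):
    def __init__(self, n_features=20, conv_ch=32, hidden=64):
        super().__init__()
        # Conv1d scans along the time axis for local patterns...
        self.conv = nn.Conv1d(n_features, conv_ch, kernel_size=3, padding=1)
        # ...and the LSTM models longer-range temporal dependencies.
        self.lstm = nn.LSTM(conv_ch, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # one logit: delayed vs. on time

    def forward(self, x):
        # x: (batch, time_steps, n_features)
        z = torch.relu(self.conv(x.transpose(1, 2)))  # (batch, conv_ch, time)
        _, (h_n, _) = self.lstm(z.transpose(1, 2))    # back to batch-first
        return self.head(h_n[-1]).squeeze(-1)

model = LSTMCNNHybrid()
print(torch.sigmoid(model(torch.randn(8, 24, 20))).shape)  # torch.Size([8])
```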
Item Restricted
Paraphrase Generation and Identification at Paragraph-Level (Saudi Digital Library, 2025)
Alsaqaabi, Arwa; Stewart, Craig; Akrida, Eleni; Cristea, Alexandra
The widespread availability of the Internet and the ease of accessing written content have significantly contributed to the rising incidence of plagiarism across various domains, including education. This behaviour directly undermines academic integrity, as evidenced by reports highlighting increased plagiarism in student work. Notably, students tend to plagiarize entire paragraphs more often than individual sentences, further complicating efforts to detect and prevent academic dishonesty. Additionally, advancements in natural language processing (NLP) have further facilitated plagiarism, particularly through online paraphrasing tools and deep-learning language models designed to generate paraphrased text. These developments underscore the critical need to develop and refine effective paraphrase identification (PI) methodologies. This thesis addresses one of the most challenging aspects of plagiarism detection (PD): identifying instances of plagiarism at the paragraph level, with particular emphasis on paraphrased paragraphs rather than individual sentences. By focusing on this level of granularity, the approach considers both intra-sentence and inter-sentence relationships, offering a more comprehensive solution for detecting sophisticated forms of plagiarism. To achieve this aim, the research examines the influence of text length on the performance of NLP machine learning (ML) and deep learning (DL) models. Furthermore, it introduces ALECS-SS (ALECS – Social Sciences), a large-scale dataset of paragraph-length paraphrases, and develops three novel SALAC algorithms designed to preserve semantic integrity while restructuring paragraph content. These algorithms take a novel approach, modifying the structure of paragraphs while maintaining their semantics. The methodology involves converting text into a graph in which each node corresponds to a sentence's semantic vector and each edge is weighted by a numerical value representing the sentence-order probability. Subsequently, a masking approach is applied to the reconstructed paragraphs, modifying lexical elements while preserving the original semantic content. This step introduces variability into the dataset while maintaining its core meaning, effectively simulating paraphrased text. Human and automatic evaluations assess the reliability and quality of the paraphrases, and additional studies examine the adaptability of SALAC across multiple academic domains. Moreover, state-of-the-art large language models (LLMs) are analysed for their ability to differentiate between human-written and machine-paraphrased text. This investigation uses multiple PI datasets in addition to the newly established paragraph-level paraphrase dataset (ALECS-SS). The findings demonstrate that text length significantly affects model performance, with limitations arising from dataset segmentation. The results also show that the SALAC algorithms effectively maintain semantic integrity and coherence across different domains, highlighting their potential for domain-independent paraphrasing. The thesis further analyses the performance of state-of-the-art LLMs in detecting auto-paraphrased content and distinguishing it from human-written content at both the sentence and paragraph levels, showing that the models can reliably identify reworded content from individual sentences up to entire paragraphs. Collectively, these findings contribute to educational applications and plagiarism detection by improving how paraphrased content is generated and recognized, and they advance NLP-driven paraphrasing techniques by providing strategies that ensure meaning and coherence are preserved in reworded material.
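SALAC, as described above, represents a paragraph as a graph whose nodes are sentence semantic vectors and whose edge weights stand for sentence-order probabilities. The sketch below builds only that data structure, with bag-of-words vectors and cosine similarities as stand-ins for the thesis's semantic embeddings and learned order probabilities; the reordering and masking algorithms themselves are not reproduced.

```python
# Paragraph-as-graph sketch: nodes hold sentence vectors; edge weights are a
# stand-in for sentence-order probabilities (here, plain cosine similarity).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

paragraph = [
    "Plagiarism undermines academic integrity.",
    "Students often copy whole paragraphs rather than single sentences.",
    "Paraphrase identification must therefore work at paragraph level.",
]

vectors = CountVectorizer().fit_transform(paragraph).toarray()  # node features
sim = cosine_similarity(vectors)  # proxy weights between sentence pairs

# Adjacency dict: graph[i][j] ~ plausibility that sentence j follows i.
graph = {i: {j: float(sim[i, j]) for j in range(len(paragraph)) if j != i}
         for i in range(len(paragraph))}
for i, edges in graph.items():
    print(i, {j: round(w, 2) for j, w in edges.items()})
```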
Item Restricted
Deep Learning based Cancer Classification and Segmentation in Medical Images (Saudi Digital Library, 2025)
Alharbi, Afaf; Zhang, Qianni
Cancer has significantly threatened human life and health for many years. In the clinic, medical image analysis is the gold standard for evaluating patient prognosis and treatment outcome. Manually labelling tumour regions in hundreds of medical images is time-consuming and expensive for pathologists, radiologists, and CT experts. Recently, advancements in hardware and computer vision have allowed deep-learning-based methods to become mainstream for segmenting tumours automatically, significantly reducing the workload of healthcare professionals. However, many challenging tasks remain in medical imaging, such as automated cancer categorisation, tumour area segmentation, and reliance on large-scale labelled images. This research therefore studies these challenging tasks and proposes novel deep-learning paradigms that can support healthcare professionals in cancer diagnosis and treatment planning. Chapter 3 proposes an automated tissue classification framework based on Multiple Instance Learning (MIL) for whole-slide histology images. To overcome the limitations of weak supervision in tissue classification, we incorporate an attention mechanism into the MIL framework. This integration addresses the challenges associated with inadequately labelled training data and improves the accuracy and reliability of tissue classification. Chapter 4 proposes a novel approach for histopathology image classification with a MIL model that combines an adaptive attention mechanism with an end-to-end deep CNN and transfer learning from pre-trained models (Trans-AMIL). Well-known transfer learning architectures, VGGNet [14], DenseNet [15], and ResNet [16], are leveraged in our framework implementation. Experiments and in-depth analysis were conducted on a public histopathology breast cancer dataset, and the results show that our proposed Trans-AMIL approach with a VGG pre-trained model demonstrates excellent improvement over the state of the art. Chapter 5 proposes self-supervised learning for magnetic resonance imaging (MRI) tumour segmentation. A self-supervised cancer segmentation framework is proposed to reduce label dependency: an innovative Barlow Twins scheme combined with a Swin transformer performs the self-supervised method on MRI brain images, and data augmentation is applied to improve the discriminability of tumour features. Experimental results show that the proposed method achieves better tumour segmentation performance than other popular self-supervised methods. Chapter 6 proposes an innovative Barlow Twins self-supervised technique combined with a regularised variational auto-encoder for tumour segmentation in both MRI and CT images. Here, too, the framework reduces label dependency: the Barlow Twins scheme represents tumour features learned from unlabelled images, and data augmentation again improves the discriminability of tumour features. Experimental results show that the proposed method achieves better tumour segmentation performance than existing state-of-the-art methods. In sum, the thesis presents four approaches, weakly supervised and self-supervised, for classifying and segmenting cancer in histology, MRI, and CT images. This research effectively classifies tumour regions in histopathology images based on histopathological annotations and well-designed modules, and comprehensively segments MRI and CT images. Our studies demonstrate label-efficient automation across various types of medical image classification and segmentation, and experimental results show that our work achieves state-of-the-art performance on both classification and segmentation tasks on real-world datasets.
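Chapters 5 and 6 above both build on the Barlow Twins objective, which drives the cross-correlation matrix of two augmented views' embeddings toward the identity (invariant, decorrelated features). A compact sketch of that loss in generic PyTorch follows; it reflects the published Barlow Twins formulation, not the thesis's full Swin-transformer or variational auto-encoder pipelines.

```python
# Barlow Twins loss sketch: make the cross-correlation of two views'
# embeddings close to the identity matrix. Generic formulation only.
import torch

def barlow_twins_loss(z1, z2, lamb=5e-3):
    # z1, z2: (batch, dim) embeddings of two augmentations of the same images
    n = z1.shape[0]
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)  # standardise each dimension
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / n                          # (dim, dim) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()               # invariance
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # redundancy
    return on_diag + lamb * off_diag

loss = barlow_twins_loss(torch.randn(64, 128), torch.randn(64, 128))
print(loss.item())
```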
