Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10

Search Results

Now showing 1 - 10 of 75
  • Item (Restricted)
    GAN-Enhanced Super-Resolution Pipeline for Robust Word Recognition in Low-Quality Non-English Handwriting.
    (Saudi Digital Library, 2025) Shahbi, Bilqis; Xia, Panqiu
    Executive summary: The dissertation tackles a critical gap in current optical character recognition (OCR) technology: correctly identifying handwritten non-English scripts in poor-quality, degraded settings. While OCR has matured for printed English and other Latin-based languages, scripts such as Arabic, Devanagari, and Telugu remain underrepresented due to structural complexity, cursive connections, diacritics, ligatures, and the limited availability of annotated datasets. These challenges are amplified by real-world factors such as low-resolution scans, noisy archival documents, and mobile-phone captures, where the fine details necessary for recognition are lost. The study proposes a two-stage deep learning pipeline that integrates super-resolution with recognition, explicitly designed to address these shortcomings.
    The first stage utilises Real-ESRGAN, a generative adversarial network optimised for real-world image degradation. Unlike earlier models such as SRCNN, VDSR, and ESRGAN, which often prioritise aesthetics or hallucinate features, Real-ESRGAN reconstructs high-resolution images with sharper strokes, preserved ligatures, and clear diacritics. Its Residual-in-Residual Dense Block (RRDB) architecture combines residual learning and dense connections, enabling robust recovery of fine-grained textual detail. By preserving structural fidelity rather than merely visual appeal, Real-ESRGAN ensures that enhanced images retain the features critical for recognition.
    The second stage utilises a Convolutional Recurrent Neural Network (CRNN) trained with the Connectionist Temporal Classification (CTC) loss. The CRNN combines convolutional layers for feature extraction, bidirectional LSTM layers for capturing sequential dependencies, and CTC decoding for alignment-free sequence prediction. This integration eliminates the need for explicit segmentation, a complicated task in cursive or densely written scripts. Together, the two stages form a cohesive system in which image enhancement directly supports recognition accuracy.
    To ensure robustness, the research incorporated extensive dataset preparation and preprocessing. Handwritten datasets for Arabic, Devanagari, and Telugu were selected to reflect structural diversity. Preprocessing included resizing, artificial noise simulation (Gaussian noise, blur, and compression artefacts), and augmentation (rotations, elastic distortions, and brightness adjustments). These techniques increased dataset variability and improved the model's ability to generalise to real-world handwriting.
    Evaluation was conducted at both the image and recognition levels. Image quality was assessed using the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM), while recognition performance was measured using the Character Error Rate (CER) and the Word Error Rate (WER). This dual evaluation ensured that improvements in image clarity translated into tangible recognition gains.
    The results confirm the effectiveness of the proposed pipeline. Real-ESRGAN outperformed SRCNN, VDSR, and ESRGAN, with higher PSNR and SSIM values across scripts. These gains reflect superior structural fidelity, particularly in preserving Arabic cursive flow, Devanagari's horizontal headstrokes, and Telugu's stacked ligatures. Recognition accuracy also improved relative to baseline low-resolution inputs. Script-specific analysis showed more precise word boundaries in Arabic, sharper conjuncts and diacritics in Devanagari, and more distinct glyph separation in Telugu. When benchmarked against traditional OCR systems such as Tesseract, the pipeline produced clearer recognition outcomes, demonstrating that task-specific super-resolution, by enhancing input quality, directly strengthens recognition performance.
    The dissertation makes methodological, empirical, theoretical, and practical contributions. Methodologically, it demonstrates the value of integrating enhancement and recognition in a fine-tuned pipeline, ensuring that improvements in image clarity yield measurable recognition gains. Empirically, it validates the effectiveness of Real-ESRGAN for handwritten text, showing consistent improvements across structurally diverse scripts. Theoretically, it advances the understanding of script-sensitive OCR, emphasising the preservation of structural features such as diacritics and ligatures. Practically, the work highlights applications in archival preservation, e-governance, and education: by enabling more accurate digitisation of handwritten records, the system supports inclusive access to information and the preservation of linguistic heritage.
    The study acknowledges several limitations. The scarcity of diverse annotated datasets constrains the model's generalisability to other scripts such as Amharic or Khmer. The computational expense of training Real-ESRGAN limits its feasibility in low-resource settings. Occasional GAN artefacts, where spurious strokes or distortions appear, pose risks in sensitive applications such as legal documents. Moreover, the pipeline has not been extensively tested on mixed-script texts, which are common in multilingual societies. These limitations suggest avenues for future work, including larger and more diverse datasets, lightweight models for real-time and mobile deployment, script identification for mixed-language documents, and explainable AI for greater transparency in recognition decisions.
    In conclusion, the dissertation demonstrates that GAN-enhanced super-resolution is not merely a cosmetic tool but an essential step toward robust OCR for non-English handwritten text. By aligning image enhancement with recognition objectives, the proposed pipeline reduces error rates and generalises across diverse scripts. Its implications extend beyond technical achievement to cultural preservation, digital inclusion, and the democratisation of access to information, while the identified limitations provide a roadmap for future research toward a truly global and inclusive multilingual OCR.
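The pipeline above is scored with the Character Error Rate (CER) and Word Error Rate (WER). As a minimal, self-contained illustration of how both metrics reduce to Levenshtein edit distance, consider the sketch below; the example strings are hypothetical and not drawn from the dissertation's data.

```python
# Illustrative CER/WER computation via Levenshtein edit distance.

def edit_distance(ref, hyp):
    """Single-row dynamic-programming Levenshtein distance between two sequences."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (r != h))  # substitution (free if symbols match)
            prev = cur
    return dp[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits / reference word count."""
    ref_w, hyp_w = reference.split(), hypothesis.split()
    return edit_distance(ref_w, hyp_w) / max(len(ref_w), 1)

print(cer("handwritten text", "handwriten test"))  # 0.125 (2 edits / 16 chars)
print(wer("handwritten text", "handwriten test"))  # 1.0   (both words wrong)
```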
  • Item (Restricted)
    CADM: Creative Accounting Detection Model in Saudi-Listed Companies
    (Saudi Digital Library, 2025) Bineid, Maysoon Mohamed; Beloff, Natalia
    In business, financial statements are the primary source of information for investors and other stakeholders. Despite extensive regulatory efforts, the quality of financial reporting in Saudi Arabia still requires improvement, as prior studies have documented evidence of creative accounting. This practice occurs when managers manipulate accounting figures within the boundaries of the International Financial Reporting Standards to present a more favourable image of the company. Although various fraud detection methods exist, identifying manipulations that are legal yet misleading remains a significant challenge. This research introduces the Creative Accounting Detection Model (CADM), a deep learning (DL)-based approach that employs Long Short-Term Memory (LSTM) networks to identify Saudi-listed companies engaging in creative accounting. Two versions of the model were developed: CADM1, trained on a simulated dataset based on established accounting measures from the literature, and CADM2, trained on a dataset tailored to reflect financial patterns observed in the Saudi market. Both datasets incorporated financial and non-financial features derived from a preliminary survey of Saudi business experts. The models achieved training accuracies of 100% (CADM1) and 95% (CADM2). Both models were then tested on real-world data from the Saudi energy sector (2019–2023). CADM1 classified one company as engaging in creative accounting, whereas CADM2 classified all companies as non-creative but demonstrated greater stability in prediction confidence. To interpret these results, a follow-up qualitative study involving expert interviews confirmed CADM as a promising supplementary tool for auditors, enhancing analytical and oversight capabilities. These findings highlight CADM’s potential to support regulatory oversight, strengthen auditing procedures, and improve investor trust in the transparency of financial statements.
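As a rough sketch of the kind of LSTM classifier CADM describes, the snippet below scores per-company sequences of yearly financial features; the feature count, sequence length, and single-layer architecture are illustrative assumptions, not the dissertation's actual configuration.

```python
# Hedged sketch: LSTM binary classifier over yearly financial-feature sequences.
import torch
import torch.nn as nn

class CreativeAccountingLSTM(nn.Module):
    def __init__(self, n_features: int = 12, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # one logit: creative vs. non-creative

    def forward(self, x):                  # x: (batch, years, n_features)
        _, (h_n, _) = self.lstm(x)         # final hidden state summarises the sequence
        return self.head(h_n[-1]).squeeze(-1)

model = CreativeAccountingLSTM()
x = torch.randn(8, 5, 12)                  # 8 companies, 5 fiscal years, 12 made-up ratios
loss = nn.BCEWithLogitsLoss()(model(x), torch.zeros(8))   # dummy labels
prob = torch.sigmoid(model(x))             # per-company probability of manipulation
```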
  • Item (Restricted)
    Artificial Intelligence, Deep Learning, and the Black Box Opacity: International Law and Modern Governance Framework for Legal Compliance and Individual Responsibility
    (Saudi Digital Library, 2025) Aloqayli, Muhannad Khalid; Linarelli, John
    This dissertation examines the unprecedented challenges that deep learning models in artificial intelligence pose to international humanitarian law frameworks governing armed conflict, addressing critical questions about international humanitarian law compliance capabilities, legal personality under international law and international humanitarian law, and international individual criminal responsibility when autonomous weapons systems employ deep learning models in decision-making.
    Chapter Two provides a comprehensive technical analysis of deep learning architectures, including convolutional neural networks, recurrent neural networks, generative adversarial networks, and transformer networks, and their military applications in target recognition, threat assessment, and autonomous operations. The analysis demonstrates that properly trained deep learning systems can achieve exceptional accuracy in tasks relevant to the principles of distinction and proportionality. However, this technical capability exists alongside a fundamental limitation: the “black box challenge,” whereby decision-making processes emerge from statistical pattern recognition across billions of parameters in ways that remain incomprehensible to human operators, creating unprecedented challenges for legal compliance and individual responsibility.
    Chapter Three evaluates whether granting legal personality to advanced artificial intelligence could address emerging responsibility gaps. Applying the analytical pragmatic approach through the dual criteria of “value context” and “legitimacy context,” the analysis reaches definitive negative conclusions: granting artificial intelligence legal personality would contradict international humanitarian law’s human-centered foundations, fail to fill responsibility gaps, and potentially shield humans from liability while introducing conceptual incoherence into established normative structures.
    Chapter Four demonstrates that deep learning, as a black box model in statistical learning, fundamentally challenges traditional international frameworks for individual criminal responsibility. The analysis reveals structural incompatibilities between algorithmic opacity and the Rome Statute’s requirements for mens rea and actus reus. Command responsibility doctrines face parallel challenges when commanders possess formal control over systems whose decision-making processes transcend human comprehension.
    The dissertation proposes a modified command responsibility framework recognizing commanders as “AI enablers” rather than traditional superiors, establishing reasonable governance standards for controlled environments while imposing strict liability for high-risk deployments. This framework preserves meaningful accountability while acknowledging technological constraints, shifting the focus from comprehending opaque statistical processes to governing deployment decisions and operational contexts within commanders’ control.
  • Item (Restricted)
    Data-Efficient Deep Learning for Predictive Modelling of Conventional Single Slope Solar Stills: Leveraging Transfer Learning and Tailored Data Augmentation Strategies
    (Saudi Digital Library, 2025) Migaybil, Hashim; Gopaluni, Bhushan
    Conventional single-slope solar stills are essential for decentralized freshwater production, yet their performance optimization is limited by small datasets and the nonlinear dynamics of desalination. This doctoral thesis addresses these constraints by developing and evaluating data-efficient supervised machine learning frameworks to predict freshwater productivity (Pstd, L/m²·day). The study integrates a novel high-performance solar still design with two complementary learning paradigms: Transfer Learning (TL) and tailored Data Augmentation (DA).
    The research begins with the design and MATLAB/SIMULINK simulation of a photovoltaic-assisted single-slope solar still engineered for improved thermal performance. The hybrid system achieved a peak efficiency of 45%, and its 730-sample dataset served as the “source” domain for TL.
    The first paradigm introduces a cross-design TL framework. A source Artificial Neural Network (ANN) was pre-trained on the hybrid system's simulation data, and its learned weights were transferred and fine-tuned to model a conventional solar still using only 365 experimental samples. The optimized TL-based ANN (5-64-64-1) outperformed both randomly initialized ANNs and Multiple Linear Regression (MLR), achieving an Overall Index of Model Performance (OIMP) of 0.872 and demonstrating superior predictive accuracy and generalization.
    The second paradigm develops a tailored DA strategy to directly expand the conventional still's limited dataset. Gaussian noise-based jittering was applied to sequential inputs within a 7-day look-back window to generate synthetic training data for a one-dimensional Convolutional Neural Network (CNN-1D). The optimized CNN-1D model, comprising three 128-filter convolutional layers, substantially outperformed baseline CNN and Support Vector Regression (SVR) models, achieving an RMSE of 0.045 and an OIMP of approximately 0.97. A threshold-based classification method was also introduced to translate raw predictions into interpretable productivity categories.
    Overall, this thesis provides a comparative evaluation of TL and DA approaches, validating their effectiveness in addressing data scarcity in solar still modeling. Key contributions include a novel cross-design TL framework, a specialized DA technique for time-series solar still data, and highly accurate predictive models. The findings provide practical, cost-effective tools for optimizing conventional solar stills and underscore the broader potential of advanced machine learning in renewable energy-driven desalination.
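The second paradigm above hinges on Gaussian-noise jittering within a 7-day look-back window. A minimal sketch of that augmentation step, assuming a daily productivity series, follows; the noise scale and number of synthetic copies are illustrative choices, not the thesis's tuned values.

```python
# Hedged sketch: sliding-window construction plus Gaussian-noise jittering.
import numpy as np

def make_windows(series: np.ndarray, lookback: int = 7):
    """Slice a 1-D daily series into (samples, lookback) inputs and next-day targets."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    return X, series[lookback:]

def jitter(X: np.ndarray, copies: int = 5, sigma: float = 0.03, seed: int = 0):
    """Append noisy copies of each window, with noise scaled to the data's std."""
    rng = np.random.default_rng(seed)
    noisy = [X + rng.normal(0.0, sigma * X.std(), X.shape) for _ in range(copies)]
    return np.concatenate([X, *noisy], axis=0)

daily_productivity = np.random.rand(365)   # stand-in for measured Pstd values
X, y = make_windows(daily_productivity)
X_aug = jitter(X)                          # (358, 7) -> (2148, 7)
y_aug = np.tile(y, 6)                      # targets repeat alongside their windows
```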
  • Item (Restricted)
    The Additional Regulatory Challenges Posed by AI in Financial Trading
    (Saudi Digital Library, 2025) Almutairi, Nasser; Alessio, Azzutti
    Algorithmic trading has shifted from rule-based speed to adaptive autonomy, with deep learning and reinforcement learning agents that learn, re-parameterize, and redeploy in near real time, amplifying opacity, correlated behaviours, and flash-crash dynamics. Against this backdrop, the dissertation asks whether existing EU and US legal frameworks can keep pace with new generations of AI trading systems. It adopts a doctrinal and comparative method, reading MiFID II and MAR, the EU AI Act, and the SEC and CFTC regimes, alongside global soft law (IOSCO, NIST), through an engineering lens of AI lifecycles and value chains to test functional adequacy.
    Chapter 1 maps the evolution from deterministic code to self-optimizing agents and locates the shrinking space for real-time human oversight. Chapter 2 reframes technical attributes as risk vectors, such as herding, feedback loops, and brittle liquidity, and illustrates their enforcement and stability implications. Chapter 3 exposes the human-centric assumptions (intent, explainability, “kill switches”) embedded in current rules and the gaps they create for attribution, auditing, and cross-border supervision. Chapter 4 proposes a hybrid, lifecycle-based model of oversight that combines value-chain accountability, tiered AI-agent licensing, mandatory pre-deployment verification, explainable-AI (XAI) requirements, cryptographically sealed audit trails, human-in-the-loop controls, continuous monitoring, and sandboxed co-regulation.
    The contribution is threefold: (1) a technology-aware risk typology linking engineering realities to market-integrity outcomes; (2) a comparative map of EU and US regimes that surfaces avenues for regulatory arbitrage; and (3) a practicable governance toolkit that restores traceable accountability without stifling beneficial innovation. Overall, the thesis argues for moving from incremental, disclosure-centric tweaks to proactive, lifecycle governance that embeds accountability at design, deployment, and post-trade stages, aligning next-generation trading technology with the enduring goals of fair, orderly, and resilient markets.
  • Item (Restricted)
    Semi-Supervised Approach For Automatic Head Gesture Classification
    (Saudi Digital Library, 2025) Alsharif, Wejdan; Shimodaira, Hiroshi
    This study applies a semi-supervised method, specifically self-training, to automatic head gesture recognition from motion capture data. It explores and compares fully supervised deep learning models and self-training pipelines in terms of their performance and training approaches. The proposed approach achieved an accuracy of 52% and a macro F1 score of 44% under cross-validation. The results show that incorporating self-training into the learning process improves model performance, because the generated pseudo-labeled data effectively supplements the original labeled dataset, enabling the model to learn from a larger and more diverse set of training examples.
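A generic self-training loop of the kind the study builds on is sketched below; the stand-in classifier (scikit-learn logistic regression), the confidence threshold, and the random data are placeholders, not the thesis's gesture model or motion-capture features.

```python
# Hedged sketch: self-training with confidence-thresholded pseudo-labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, rounds=5):
    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        clf.fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        conf, idx = proba.max(axis=1), proba.argmax(axis=1)
        keep = conf >= threshold                    # keep only confident pseudo-labels
        if not keep.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[keep]])   # grow the labelled pool
        y_lab = np.concatenate([y_lab, clf.classes_[idx[keep]]])
        X_unlab = X_unlab[~keep]
    return clf

rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(40, 6)), rng.integers(0, 3, 40)   # toy labelled data
clf = self_train(X_lab, y_lab, rng.normal(size=(400, 6)))         # toy unlabelled pool
```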
  • Item (Restricted)
    ENHANCING DATA REPRESENTATION IN DISTRIBUTED MACHINE LEARNING
    (Saudi Digital Library, 2025) Aladwani, Tahani Abed; Anagnostopoulos, Christos
    Distributed computing devices ranging from smartphones to edge micro-servers, collectively referred to as clients, are capable of gathering and storing diverse types of data, such as images and voice recordings. This wide array of data sources has the potential to significantly enhance the accuracy and robustness of Deep Learning (DL) models across a variety of tasks. However, such data is intrinsically heterogeneous owing to differences in users’ preferences, lifestyles, locations, and other factors, and consequently requires comprehensive preprocessing (e.g., labeling, filtering, relevance assessment, and balancing) to be suitable for developing effective and reliable models. This thesis therefore explores the feasibility of conducting predictive analytics and model inference on edge computing (EC) systems when access to data is limited, and on clients’ devices through federated learning (FL) when direct access to data is entirely restricted.
    The first part of the thesis focuses on reducing the data transmission rate between clients and EC servers through techniques such as data and task caching, identifying data overlaps, and evaluating task popularity. While this strategy can reduce data offloading to a minimum, it does not entirely eliminate dependence on third-party entities.
    The second part eliminates that dependency by adopting FL, where direct access to raw data is not possible. In this context, node and data selection are guided by predictions and model performance: the most suitable nodes and relevant data for training are identified by clustering nodes based on data characteristics and analyzing the overlap between query boundaries and cluster boundaries.
    The third part introduces a mechanism to support classification tasks, such as image classification, which are particularly challenging over distributed data due to label shifting or missing labels across clients. The proposed method mitigates the impact of these imbalances by employing multiple cluster-based meta-models, each tailored to specific label distributions.
    The fourth part introduces a two-phase federated self-learning framework, termed 2PFL, which addresses extreme data scarcity and skewness when training classifiers over distributed labeled and unlabeled data. 2PFL achieves high-performance models even when trained with only 10% to 20% labeled data relative to the available unlabeled data.
    The conclusion underscores the importance of adaptable learning mechanisms that can respond to continuous changes in clients’ data volume, requirements, formats, and protection regulations. Incorporating the EC layer alleviates data-privacy concerns, reduces the volume of data needing offloading, expedites task execution, and facilitates the training of complex models. For scenarios demanding stricter privacy preservation, FL offers a viable solution, enabling multiple clients to collaboratively train models while adhering to user privacy protection, data security, and government regulations. However, because FL accesses data only indirectly, several challenges must be addressed to ensure high-performance models, including imbalanced data distribution across clients, partially labeled data, and fully unlabeled data, all of which are explored and demonstrated through experimental evaluations.
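For context, the basic federated averaging (FedAvg) round that underlies FL settings like those above fits in a few lines; the tiny linear model and random client data are stand-ins, and nothing here reproduces the thesis's node-selection or 2PFL mechanisms.

```python
# Hedged sketch: one size-weighted FedAvg round over simulated clients.
import torch
import torch.nn as nn

def local_update(global_state, X, y, epochs=1, lr=0.1):
    model = nn.Linear(4, 2)                       # toy model shared by all clients
    model.load_state_dict(global_state)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.cross_entropy(model(X), y).backward()
        opt.step()
    return model.state_dict(), len(X)

def fedavg_round(global_state, clients):
    updates = [local_update(global_state, X, y) for X, y in clients]
    total = sum(n for _, n in updates)
    return {k: sum(sd[k] * (n / total) for sd, n in updates)   # size-weighted mean
            for k in global_state}

global_model = nn.Linear(4, 2)
clients = [(torch.randn(20, 4), torch.randint(0, 2, (20,))) for _ in range(5)]
global_model.load_state_dict(fedavg_round(global_model.state_dict(), clients))
```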
  • Item (Restricted)
    Deep Learning-Based White Blood Cell Classification Through a Free and Accessible Application
    (Saudi Digital Library, 2025) Alluwaim, Yaseer; Campbell, Neill
    Background: Microscopy of peripheral blood smears (PBS) continues to play a fundamental role in hematology diagnostics, offering detailed morphological insights that complement automated blood counts. Examination of a stained blood film by a trained technician is among the most frequently performed tests in clinical hematology laboratories. Nevertheless, manual smear analysis is labor-intensive, time-consuming, and prone to considerable inter-observer variability. These challenges have spurred interest in automated, deep learning-based approaches to improve the efficiency and consistency of blood cell assessment.
    Methods: We designed a convolutional neural network (CNN) with a ResNet-50 backbone, applying standard transfer-learning techniques for white blood cell (WBC) classification. The model was trained on a publicly available dataset of approximately 4,000 annotated peripheral smear images representing eight WBC types. The image-processing workflow included automated nucleus detection, normalization, and extensive augmentation (rotation, scaling, etc.) to improve generalization. Training used the PyTorch Lightning framework for efficient development.
    Application: The final model was integrated into a lightweight web application and deployed on Hugging Face Spaces, enabling accessible browser-based inference. The application provides an easy-to-use interface for uploading images, which are automatically cropped and analyzed in real time. This free, open tool delivers immediate classification results and is usable by laboratory technologists without specialized hardware or software.
    Results: On an independent test set, the ResNet-50 network reached 98.67% overall accuracy, with consistently high performance across all eight WBC categories; precision, recall, and specificity closely matched the overall accuracy, indicating well-balanced classification. To assess real-world generalization, the model was also evaluated on an external, heterogeneous dataset drawn from different sources, achieving 86.33% accuracy and reflecting strong performance beyond its primary training data. The confusion matrix showed negligible misclassification, suggesting consistent discrimination between leukocyte types.
    Conclusion: This study indicates that a lightweight AI tool can support peripheral smear analysis by offering rapid, consistent WBC identification through a web interface. Such a system may reduce laboratory workload and observer variability, particularly in resource-limited or remote settings where expert microscopists are scarce, and may serve as a practical training aid for personnel learning cell morphology. Limitations include reliance on a single training dataset, which may not encompass all staining or imaging variations, and evaluation performed offline. Future work will expand dataset diversity, enable real-time integration with digital microscopes, and pursue clinical validation to broaden applicability and adoption. Application link: https://huggingface.co/spaces/xDyas/wbc-classifier
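The Methods paragraph above describes a ResNet-50 backbone adapted to eight WBC classes via transfer learning. A hedged sketch of that setup in plain PyTorch/torchvision follows (the study itself used PyTorch Lightning); the frozen-backbone recipe and ImageNet preprocessing values are common defaults, not the study's exact training configuration.

```python
# Hedged sketch: ResNet-50 transfer learning for 8-class WBC classification.
import torch
import torch.nn as nn
from torchvision import models, transforms

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 8)    # replace head: 8 WBC classes

# Freeze the backbone and train only the new head (one common TL recipe).
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("fc")

# Standard ImageNet preprocessing, applied to PIL images before inference.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

x = torch.randn(1, 3, 224, 224)                  # stand-in for a preprocessed cell image
pred = model(x).argmax(dim=1)                    # predicted WBC class index
```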
  • Item (Restricted)
    Deep Multi-Modality Fusion for Integrative Healthcare
    (Queen Mary University of London, 2025) Alwazzan, Omnia; Slabaugh, Gregory
    The healthcare industry generates vast amounts of data, driving advancements in patient diagnosis, treatment, and therapeutic discovery. A single patient’s electronic healthcare record often includes multiple modalities, each providing unique insights into their condition, yet integrating these diverse, complementary sources to gain deeper insights remains a challenge. While deep learning has transformed single-modality analysis, many clinical scenarios, particularly in cancer care, require integrating complementary data sources for a holistic understanding.
    In cancer care, two key modalities provide complementary perspectives: histopathology whole-slide images (WSIs) and omics data (genomic, transcriptomic, epigenomic). WSIs deliver high-resolution views of tissue morphology and cellular structures, while omics data reveal molecular-level details of disease mechanisms. In this domain, single-modality approaches fall short: histopathology misses molecular heterogeneity, and traditional bulk or non-spatial omics data lack spatial context. Although recent advances in spatial omics technologies aim to bridge this gap by capturing molecular data within spatially resolved tissue architecture, such approaches are still emerging and are not explored in this thesis. Consequently, integrating conventional WSIs and non-spatial omics data through effective fusion strategies becomes essential for uncovering their joint potential.
    Effective fusion of these modalities holds the potential to reveal rich, cross-modal patterns that help identify signals associated with tumor behavior. But key questions arise: How can we effectively align these heterogeneous modalities (high-resolution images and diverse molecular data) into a unified framework? How can we leverage their interactions to maximize complementary insights? How can we tailor fusion strategies to exploit the strengths of dominant modalities across diverse clinical tasks?
    This thesis tackles these questions head-on, advancing integrative healthcare by developing novel deep multi-modal fusion methods. Our primary focus is on integrating the two key modalities above, proposing innovative approaches to enhance omics-WSI fusion in cancer research. While the downstream applications of these methods span diagnosis, prognosis, and treatment stratification, the core contribution lies in the design and evaluation of fusion strategies that effectively harness the complementary strengths of each modality. Our research develops a multi-modal fusion method to enhance cross-modality interactions between WSIs and omics data, using advanced architectures to integrate their heterogeneous feature spaces and produce discriminative representations that improve cancer grading accuracy. These methods are flexibly designed and can be applied to fuse data from diverse sources across various application domains; however, this thesis focuses primarily on cancer-related tasks. We also introduce cross-modal attention mechanisms to refine feature representation and interpretability, functioning effectively in both single-modality and bimodal settings, with applications in breast cancer classification (using mammography, MRI, and clinical metadata) and brain tumor grading (using WSIs and gene expression data). Additionally, we propose dual fusion strategies combining early and late fusion to address challenges in omics-WSI integration, such as explainability and high-dimensional omics data, aligning omics with localized WSI regions to identify tumor subtypes without patch-level labels, and capturing global interactions for a holistic perspective.
    We deliver three key contributions: the Multi-modal Outer Arithmetic Block (MOAB), a novel fusion method integrating latent representations from WSIs and omics data using arithmetic operations and a channel-fusion technique, achieving state-of-the-art brain cancer grading performance with publicly available code; the Flattened Outer Arithmetic Attention (FOAA), an attention-based framework extending MOAB to single- and bimodal tasks, surpassing existing methods in breast and brain tumor classification; and the Multi-modal Outer Arithmetic Block Dual Fusion Network (MOAD-FNet), combining early and late fusion for explainable omics-WSI integration, outperforming benchmarks on The Cancer Genome Atlas (TCGA) and NHNN BRAIN UK datasets with interpretable WSI heatmaps aligned with expert diagnoses. Together, these contributions provide reliable, interpretable, and adaptable solutions for multi-modal fusion, with a specific focus on advancing diagnostics, prognosis, and personalized healthcare strategies while addressing the critical questions driving this field forward.
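To give a flavour of what "outer arithmetic" fusion means in MOAB, the sketch below combines two modality embeddings via broadcast outer product, sum, difference, and stabilised division, stacked as channels; the embedding sizes, the epsilon, and the downstream head are assumptions for illustration, not the published architecture.

```python
# Hedged sketch: outer-arithmetic fusion of WSI and omics embeddings.
import torch
import torch.nn as nn

def outer_arithmetic_fusion(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """a: (batch, d1), b: (batch, d2) -> fused tensor of shape (batch, 4, d1, d2)."""
    a_, b_ = a.unsqueeze(2), b.unsqueeze(1)       # broadcast to (batch, d1, d2)
    channels = [a_ * b_,                          # outer product
                a_ + b_,                          # outer sum
                a_ - b_,                          # outer difference
                a_ / (b_ + 1e-6)]                 # outer division (eps-stabilised)
    return torch.stack(channels, dim=1)

wsi_feat = torch.randn(4, 32)                     # WSI embedding (e.g., from a CNN)
omic_feat = torch.randn(4, 16)                    # omics embedding (e.g., from an MLP)
fused = outer_arithmetic_fusion(wsi_feat, omic_feat)           # (4, 4, 32, 16)
head = nn.Sequential(nn.Flatten(), nn.Linear(4 * 32 * 16, 3))  # e.g., 3 tumour grades
logits = head(fused)
```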
  • Item (Restricted)
    Enhancing Gravitational-Wave Detection from Cosmic String Cusps in Real Noise Using Deep Learning
    (Saudi Digital Library, 2025) Bahlool, Taghreed; Sutton, Patrick
    Cosmic strings are topological defects that may have formed in the early universe and could produce bursts of gravitational waves through cusp events. Detecting such signals is particularly challenging due to transient non-astrophysical artifacts, known as glitches, in gravitational-wave detector data. In this work, we develop a deep learning-based classifier that distinguishes cosmic string cusp signals from common transient noise types, such as blips, using raw, whitened 1D time-series data extracted from real detector noise. Unlike previous approaches that rely on simulated or idealized noise environments, our method is trained and tested entirely on real noise, making it more applicable to real-world search pipelines. Using a dataset of 50,000 labeled 2-second samples, our model achieves a classification accuracy of 84.8%, a recall of 78.7%, and a false-positive rate of 9.1% on unseen data. This demonstrates the feasibility of cusp-glitch discrimination directly in the time domain, without requiring time-frequency representations or synthetic data, and contributes toward robust detection of exotic astrophysical signals under realistic gravitational-wave conditions.
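A minimal 1-D CNN over raw whitened time series, matching the task described above, might look like the following; the layer sizes, the 4096 Hz sample rate, and the two-class (cusp vs. glitch) setup are assumptions for illustration, not the trained network from this work.

```python
# Hedged sketch: 1-D CNN classifying 2-second whitened detector segments.
import torch
import torch.nn as nn

class CuspGlitchCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=16, stride=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8),               # fixed-length summary
        )
        self.classifier = nn.Linear(32 * 8, n_classes)

    def forward(self, x):                          # x: (batch, 1, samples)
        return self.classifier(self.features(x).flatten(1))

segment = torch.randn(1, 1, 2 * 4096)              # 2 s at an assumed 4096 Hz
logits = CuspGlitchCNN()(segment)                  # scores for [cusp, glitch]
```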

Copyright owned by the Saudi Digital Library (SDL) © 2026