Saudi Cultural Missions Theses & Dissertations
Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10
Search Results: 3 items
Item Restricted
GAN-Enhanced Super-Resolution Pipeline for Robust Word Recognition in Low-Quality Non-English Handwriting (Saudi Digital Library, 2025) Shahbi, Bilqis; Xia, Panqiu

Executive summary

The dissertation tackles a critical issue where current optical character recognition (OCR) technologies fall short: correctly identifying handwritten non-English scripts in poor-quality, degraded conditions. While OCR technologies have matured for printed English and other Latin-based languages, scripts such as Arabic, Devanagari, and Telugu remain underrepresented owing to structural complexities, cursive connections, diacritics, ligatures, and the limited availability of annotated datasets. These challenges are amplified by real-world factors such as low-resolution scans, noisy archival documents, and mobile-phone captures, where the fine details necessary for recognition are lost. The study proposes a two-stage deep learning pipeline that integrates super-resolution with recognition, explicitly designed to address these shortcomings.

The first stage of the pipeline utilises Real-ESRGAN, a generative adversarial network specifically optimised for real-world image degradation. Unlike earlier models such as SRCNN, VDSR, and ESRGAN, which often prioritise aesthetics or hallucinate features, Real-ESRGAN reconstructs high-resolution images with sharper strokes, preserved ligatures, and clear diacritics. Its Residual-in-Residual Dense Block (RRDB) architecture combines residual learning with dense connections, enabling robust recovery of fine-grained textual detail. By preserving structural fidelity rather than merely visual appeal, Real-ESRGAN ensures that enhanced images retain the features critical for recognition. The second stage utilises a Convolutional Recurrent Neural Network (CRNN) trained with the Connectionist Temporal Classification (CTC) loss function.
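The RRDB unit at the core of the first stage can be sketched as follows. This is a minimal illustrative reduction in PyTorch, not Real-ESRGAN's full generator; the layer widths and the 0.2 residual scaling follow the published ESRGAN design, and the module names here are placeholders.

```python
# Minimal sketch of a Residual-in-Residual Dense Block (RRDB):
# dense connections inside each block, residual connections around them.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Five convolutions with dense (concatenated) connections, as in ESRGAN."""
    def __init__(self, channels=64, growth=32):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(channels + i * growth,
                      growth if i < 4 else channels,
                      kernel_size=3, padding=1)
            for i in range(5)
        ])
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        features = [x]
        for i, conv in enumerate(self.convs):
            out = conv(torch.cat(features, dim=1))
            if i < 4:
                out = self.act(out)
                features.append(out)
        # Residual scaling stabilises training of very deep generators.
        return x + 0.2 * out

class RRDB(nn.Module):
    """Three dense blocks wrapped in an outer scaled residual connection."""
    def __init__(self, channels=64):
        super().__init__()
        self.blocks = nn.Sequential(*[DenseBlock(channels) for _ in range(3)])

    def forward(self, x):
        return x + 0.2 * self.blocks(x)

block = RRDB(64)
y = block(torch.randn(1, 64, 32, 32))
assert y.shape == (1, 64, 32, 32)  # spatial resolution is preserved
```

Because every block is residual around the identity, stacking many RRDBs (followed by upsampling layers in the full generator) recovers fine stroke detail without the training instability of plain deep stacks.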
The CRNN combines convolutional layers for feature extraction, bidirectional LSTM layers for capturing sequential dependencies, and CTC decoding for alignment-free sequence prediction. This integration eliminates the need for explicit segmentation, a difficult task in cursive or densely written scripts. Together, the two stages form a cohesive system in which image enhancement directly supports recognition accuracy.

To ensure robustness, the research incorporated extensive dataset preparation and preprocessing. Handwritten datasets for Arabic, Devanagari, and Telugu were selected to reflect structural diversity. Preprocessing included resizing, artificial noise simulation (Gaussian noise, blur, and compression artefacts), and augmentation (rotations, elastic distortions, and brightness adjustments). These techniques increased dataset variability and improved the model's ability to generalise to real-world handwriting.

Evaluation was conducted at both the image and recognition levels. Image quality was assessed using the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM), while recognition performance was measured using the Character Error Rate (CER) and the Word Error Rate (WER). This dual evaluation ensured that improvements in image clarity translated into tangible recognition gains.

The results confirm the effectiveness of the proposed pipeline. Real-ESRGAN outperformed SRCNN, VDSR, and ESRGAN, with higher PSNR and SSIM values across scripts. These gains reflect superior structural fidelity, particularly in preserving Arabic cursive flows, Devanagari's horizontal headstrokes, and Telugu's stacked ligatures. Recognition accuracy also improved over baseline low-resolution inputs. Script-specific analysis showed more precise word boundaries in Arabic, sharper conjuncts and diacritics in Devanagari, and more distinct glyph separations in Telugu.
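The CER and WER metrics used in the evaluation are both edit distances normalised by reference length, computed over characters and words respectively. A minimal pure-Python sketch (the function names are illustrative, not from the thesis):

```python
# CER and WER via Levenshtein edit distance, normalised by reference length.
def edit_distance(ref, hyp):
    """Levenshtein distance over any sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character Error Rate: edits per reference character."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

def wer(reference, hypothesis):
    """Word Error Rate: edits per reference word."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)

print(cer("kitten", "sitting"))          # 3 edits / 6 chars = 0.5
print(wer("the cat sat", "the cat sits"))  # 1 substitution / 3 words
```

Because both metrics are normalised by the reference, they can exceed 1.0 for very poor hypotheses, which is why papers typically report them alongside absolute accuracy.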
When benchmarked against traditional OCR systems such as Tesseract, the pipeline delivered clearer recognition outcomes, underscoring the critical role of task-specific super-resolution in OCR: enhancing input quality directly strengthens recognition performance.

The dissertation makes contributions across methodological, empirical, theoretical, and practical domains. Methodologically, it demonstrates the value of integrating enhancement and recognition stages in a fine-tuned pipeline, ensuring that improvements in image clarity yield measurable gains in recognition. Empirically, it validates the effectiveness of Real-ESRGAN for handwritten text, showing consistent improvements across structurally diverse scripts. Theoretically, it advances the understanding of script-sensitive OCR, emphasising the preservation of structural features such as diacritics and ligatures. Practically, the work highlights applications in archival preservation, e-governance, and education: by enabling more accurate digitisation of handwritten records, the system supports inclusive access to information and the preservation of linguistic heritage.

The study acknowledges several limitations. The scarcity of diverse annotated datasets constrains the model's generalisability to other scripts such as Amharic or Khmer. The computational expense of training Real-ESRGAN limits its feasibility in low-resource settings. Occasional GAN artefacts, where spurious strokes or distortions appear, pose risks in sensitive applications such as legal documents. Moreover, the pipeline has not been extensively tested on mixed-script texts, which are common in multilingual societies. These limitations suggest avenues for future work, including larger and more diverse datasets, lightweight models for real-time and mobile deployment, script identification for mixed-language documents, and explainable AI for greater transparency in recognition decisions.
In conclusion, the dissertation demonstrates that GAN-enhanced super-resolution is not merely a cosmetic tool but an essential step toward robust OCR for non-English handwritten texts. By aligning image enhancement with recognition objectives, the proposed pipeline reduces error rates and generalises across diverse scripts. Its implications extend beyond technical achievement to cultural preservation, digital inclusion, and the democratisation of access to information. At the same time, the identified limitations provide a roadmap for future research, ensuring that multilingual OCR evolves into a truly global and inclusive technology.

Item Restricted
Evaluating the Effectiveness of Existing AI Models in Energy Management for Smart Facilities and Buildings (Saudi Digital Library, 2025) Aldawsari, Abdulrahman; Morgan, Peter

This project evaluates the practical effectiveness of existing artificial intelligence (AI) models used in energy management systems for smart buildings and microgrids. While the academic literature is rich in high-performing algorithms, little is known about how these models function under real-world constraints such as data availability, system integration, and operator interpretability. The research focuses on four main AI model types: deep learning models such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU); tree-based models, including Random Forest (RF) and Gradient Boosted Trees (GBT); hybrid models combining convolutional neural networks (CNN) and support vector regression (SVR); and reinforcement learning approaches, particularly Proximal Policy Optimisation (PPO).

A structured evaluation framework was developed around three pillars: technical performance, operational feasibility, and deployment readiness. Each model was assessed against peer-reviewed results and case studies, with comparative analysis across forecasting accuracy, training demands, interpretability, and ease of integration.
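A three-pillar framework of this kind reduces, in its simplest form, to weighted multi-criteria scoring. The sketch below illustrates the mechanics only; the pillar names come from the project, but the scores, weights, and ranking are hypothetical placeholders, not the study's actual results.

```python
# Illustrative weighted scoring over the three evaluation pillars.
PILLARS = ("technical", "operational", "deployment")

def weighted_score(scores, weights):
    """Combine per-pillar scores in [0, 1] using weights that sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(scores[p] * weights[p] for p in PILLARS)

# Hypothetical scores for two model families (not from the study).
models = {
    "LSTM":         {"technical": 0.9, "operational": 0.5, "deployment": 0.4},
    "RandomForest": {"technical": 0.7, "operational": 0.8, "deployment": 0.8},
}

# A public-sector operator might weight feasibility and deployment
# readiness above raw forecasting accuracy.
weights = {"technical": 0.3, "operational": 0.35, "deployment": 0.35}

ranked = sorted(models,
                key=lambda m: weighted_score(models[m], weights),
                reverse=True)
print(ranked)  # with these weights, RandomForest outranks LSTM
```

Changing the weights flips the ranking, which is precisely the study's point: model "effectiveness" is a property of the deployment context, not of the algorithm alone.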
The findings revealed that deep learning models, particularly LSTM and GRU, excelled in forecasting accuracy but were resource-intensive and opaque to non-specialist users. Tree-based models such as RF offered greater transparency and were easier to deploy but had lower accuracy in complex, time-dependent scenarios. Hybrid models demonstrated the highest accuracy but required significant tuning and maintenance. PPO-based models were effective in dynamic systems such as microgrids but presented challenges with explainability and reward design. Federated learning approaches showed promise in decentralised or privacy-sensitive environments, although the results were mixed and highly context-dependent.

Key deployment barriers include data quality gaps, limited technical expertise, and poor interoperability with legacy building management systems. Case studies reinforce the view that no model is universally optimal; effectiveness depends on how well a model aligns with its operational environment. For example, interpretable models may be more suitable in public-sector buildings, while advanced reinforcement learning may be better suited to complex, high-investment infrastructure.

The study concludes that successful adoption of AI in energy management requires more than technical optimisation. It demands models that are accurate, explainable, and compatible with the real conditions of the buildings they serve. Recommendations include selecting models that balance accuracy and interpretability, planning for model retraining, addressing integration barriers early, and investing in region-specific validation to ensure broader applicability.

Item Restricted
Deep Learning based Cancer Classification and Segmentation in Medical Images (Saudi Digital Library, 2025) Alharbi, Afaf; Zhang, Qianni

Cancer has significantly threatened human life and health for many years.
In the clinic, medical image analysis is the gold standard for evaluating patient prognosis and treatment outcome. Manually labelling tumour regions in hundreds of medical images is time-consuming and expensive for pathologists, radiologists, and CT experts. Recently, advances in hardware and computer vision have allowed deep-learning-based methods to become mainstream for segmenting tumours automatically, significantly reducing the workload of healthcare professionals. However, many challenging tasks remain in medical imaging, such as automated cancer categorisation, tumour area segmentation, and the reliance on large-scale labelled images. This research therefore studies these challenging tasks, proposing novel deep-learning paradigms that can support healthcare professionals in cancer diagnosis and treatment planning.

Chapter 3 proposes an automated tissue classification framework based on Multiple Instance Learning (MIL) for whole-slide histology images. To overcome the limitations of weak supervision in tissue classification, we incorporate an attention mechanism into the MIL framework. This integration addresses the challenges associated with inadequately labelled training data and improves the accuracy and reliability of tissue classification.

Chapter 4 proposes a novel approach to histopathology image classification with a MIL model that combines an adaptive attention mechanism with an end-to-end deep CNN and transfer-learning pre-trained models (Trans-AMIL). Well-known transfer learning architectures (VGGNet [14], DenseNet [15], and ResNet [16]) are leveraged in our implementation. Experiments and in-depth analysis were conducted on a public histopathology breast cancer dataset. The results show that our proposed Trans-AMIL approach with a VGG pre-trained model demonstrates excellent improvement over the state of the art.
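Attention-based MIL pooling of the kind these chapters build on can be sketched in a few lines: a bag of instance embeddings (e.g. patches from one whole-slide image) is reduced to a single bag embedding via softmax attention weights, so the slide-level label supervises patch-level attention. The weight matrices below are random placeholders, not trained values from the thesis.

```python
# Attention-based MIL pooling over a bag of instance embeddings.
import numpy as np

rng = np.random.default_rng(0)

def attention_mil_pool(instances, V, w):
    """instances: (n, d) embeddings for one bag (e.g. patches of a slide).
    Returns the attention-weighted bag embedding and the attention weights."""
    scores = np.tanh(instances @ V.T) @ w      # (n,) unnormalised attention
    weights = np.exp(scores - scores.max())    # stable softmax
    weights /= weights.sum()                   # weights sum to 1 over the bag
    return weights @ instances, weights        # (d,) bag embedding, (n,) weights

d, hidden, n = 8, 4, 16
V = rng.standard_normal((hidden, d))   # placeholder projection
w = rng.standard_normal(hidden)        # placeholder scoring vector
bag = rng.standard_normal((n, d))      # one bag of n instance embeddings

embedding, attn = attention_mil_pool(bag, V, w)
assert embedding.shape == (d,)
assert np.isclose(attn.sum(), 1.0)
```

The attention weights are also interpretable as a by-product: high-weight patches indicate which tissue regions drove the bag-level prediction.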
Chapter 5 proposes self-supervised learning for Magnetic Resonance Imaging (MRI) tumour segmentation. A self-supervised cancer segmentation framework is proposed to reduce label dependency. An innovative Barlow Twins scheme combined with a Swin Transformer is developed to perform this self-supervised method on MRI brain images. Additionally, data augmentation is applied to improve the discriminability of tumour features. Experimental results show that the proposed method achieves better tumour segmentation performance than other popular self-supervised methods.

Chapter 6 proposes an innovative Barlow Twins self-supervised technique combined with a regularised variational auto-encoder for the segmentation of MRI tumour images and CT scan images. A self-supervised cancer segmentation framework is proposed to reduce label dependency, with the Barlow Twins scheme developed to represent tumour features from unlabelled images. Additionally, data augmentation is applied to improve the discriminability of tumour features. Experimental results show that the proposed method achieves better tumour segmentation performance than other existing state-of-the-art methods.

The thesis presents four approaches to classifying and segmenting cancer images from histology, MRI, and CT images, spanning unsupervised and weakly supervised methods. This research effectively classifies tumour regions in histopathology images based on histopathological annotations and well-designed modules, and comprehensively segments MRI and CT images. Our studies demonstrate label-efficient automation on various types of medical image classification and segmentation. Experimental results show that our works achieve state-of-the-art performance on both classification and segmentation tasks on real-world datasets.
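The Barlow Twins objective underlying Chapters 5 and 6 pushes the cross-correlation matrix of two augmented views' embeddings toward the identity: the diagonal term enforces invariance to augmentation, the off-diagonal term reduces redundancy between features. A minimal numpy sketch of the published loss, with random embeddings standing in for a real encoder's output:

```python
# Barlow Twins loss: cross-correlation of two views' normalised embeddings.
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """z1, z2: (batch, dim) embeddings of two augmentations of the same images."""
    # Standardise each feature dimension over the batch.
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    n = z1.shape[0]
    c = z1.T @ z2 / n                               # (dim, dim) cross-correlation
    on_diag = ((1.0 - np.diag(c)) ** 2).sum()       # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.standard_normal((32, 16))
print(barlow_twins_loss(z, z))                            # small: identical views
print(barlow_twins_loss(z, rng.standard_normal((32, 16))))  # much larger: unrelated views
```

Because the objective needs no negative pairs, it suits medical imaging, where "negative" tumour patches are expensive to define; in the thesis the two views come from augmentations of unlabelled MRI and CT images.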
