Xia, Panqiu; Shahbi, Bilqis
2026-01-25; 2025
Shahbi, B. (2025). GAN-Enhanced Super-Resolution Pipeline for Robust Word Recognition in Low-Quality Non-English Handwriting. Master's thesis, Cardiff University. https://hdl.handle.net/20.500.14154/78027

A deep learning framework integrating image enhancement and handwritten text recognition.

Executive summary

The dissertation tackles a critical gap in current optical character recognition (OCR) technology: correctly identifying handwritten non-English scripts in poor-quality, degraded material. While OCR has matured for printed English and other Latin-script languages, scripts such as Arabic, Devanagari, and Telugu remain underserved because of their structural complexity (cursive connections, diacritics, and ligatures) and the limited availability of annotated datasets. These challenges are amplified by real-world factors such as low-resolution scans, noisy archival documents, and mobile-phone captures, in which the fine details needed for recognition are lost.

The study proposes a two-stage deep learning pipeline that integrates super-resolution with recognition, explicitly designed to address these shortcomings. The first stage uses Real-ESRGAN, a generative adversarial network optimised for real-world image degradation. Unlike earlier models such as SRCNN, VDSR, and ESRGAN, which often prioritise aesthetics or hallucinate features, Real-ESRGAN reconstructs high-resolution images with sharper strokes, preserved ligatures, and clear diacritics. Its Residual-in-Residual Dense Block (RRDB) architecture combines residual learning with dense connections, enabling robust recovery of fine-grained textual detail. By preserving structural fidelity rather than mere visual appeal, Real-ESRGAN ensures that enhanced images retain the features recognition depends on.
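The residual-in-residual skip pattern behind RRDB can be illustrated with a minimal, framework-free sketch. This is a toy on scalar values, not the thesis's actual convolutional architecture: the `dense_block` and `rrdb` functions, the 0.2 residual scale, and the layer callables are all illustrative assumptions about the general ESRGAN-style design.

```python
# Toy sketch of the Residual-in-Residual Dense Block (RRDB) skip pattern.
# Hypothetical simplification: real RRDBs operate on convolutional feature
# maps; here "features" are plain floats and "layers" are arbitrary callables.
RESIDUAL_SCALE = 0.2  # ESRGAN-style scaling of residual branches (assumed)

def dense_block(x, layers):
    """Dense connectivity: each layer sees the block input plus all
    earlier layer outputs; the result is residually added to the input."""
    features = [x]
    for layer in layers:
        features.append(layer(sum(features)))
    return x + RESIDUAL_SCALE * features[-1]

def rrdb(x, blocks):
    """Residual-in-residual: chain several dense blocks, then add a
    scaled outer skip connection back to the original input."""
    y = x
    for layers in blocks:
        y = dense_block(y, layers)
    return x + RESIDUAL_SCALE * y
```

Because every block only adds a scaled correction on top of its input, gradients have a direct path back to the input, which is the property credited here for stable recovery of fine stroke detail.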
The second stage uses a Convolutional Recurrent Neural Network (CRNN) trained with the Connectionist Temporal Classification (CTC) loss. The CRNN combines convolutional layers for feature extraction, bidirectional LSTM layers for capturing sequential dependencies, and CTC decoding for alignment-free sequence prediction. This design eliminates the need for explicit character segmentation, a notoriously difficult task in cursive or densely written scripts. Together, the two stages form a cohesive system in which image enhancement directly supports recognition accuracy.

To ensure robustness, the research incorporated extensive dataset preparation and preprocessing. Handwritten datasets for Arabic, Devanagari, and Telugu were selected to reflect structural diversity. Preprocessing included resizing, artificial degradation (Gaussian noise, blur, and compression artefacts), and augmentation (rotations, elastic distortions, and brightness adjustments). These techniques increased dataset variability and improved the model's ability to generalise to real-world handwriting.

Evaluation was conducted at both the image and the recognition level. Image quality was assessed with the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM), while recognition performance was measured with the Character Error Rate (CER) and the Word Error Rate (WER). This dual evaluation ensured that improvements in image clarity translated into tangible recognition gains.

The results confirm the effectiveness of the proposed pipeline. Real-ESRGAN outperformed SRCNN, VDSR, and ESRGAN, achieving higher PSNR and SSIM values across all three scripts. These gains reflect superior structural fidelity, particularly in preserving Arabic cursive flow, Devanagari's horizontal headstrokes, and Telugu's stacked ligatures. Recognition accuracy also improved relative to baseline low-resolution inputs.
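The alignment-free prediction that CTC provides can be sketched concisely: the recurrent network emits one label distribution per timestep, and decoding collapses consecutive repeats and removes a reserved blank symbol, so no character segmentation is ever needed. A minimal greedy-decoding sketch, where the blank index and alphabet are illustrative assumptions rather than the thesis's actual label set:

```python
BLANK = 0  # assumed index of the CTC blank symbol

def ctc_greedy_decode(timestep_probs, alphabet):
    """Greedy CTC decoding: take the argmax label at each timestep,
    collapse consecutive repeated labels, then drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in timestep_probs]
    chars = []
    prev = None
    for label in best:
        if label != prev and label != BLANK:
            chars.append(alphabet[label])
        prev = label
    return "".join(chars)

# Six timesteps predicting a, a, blank, a, b, b -> "aab":
# the blank separates the two a's, so they are not collapsed together.
probs = [[0.1, 0.8, 0.1], [0.1, 0.7, 0.2], [0.9, 0.05, 0.05],
         [0.1, 0.8, 0.1], [0.1, 0.2, 0.7], [0.1, 0.1, 0.8]]
decoded = ctc_greedy_decode(probs, "-ab")  # "-" stands for blank
```

In practice beam-search decoding is often preferred over this greedy variant, but the collapse-and-drop-blank rule is the same.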
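Three of the four evaluation metrics above are easy to state in plain Python. A self-contained sketch of PSNR, CER, and WER (SSIM is omitted because it requires windowed local statistics; the `peak` default and flat pixel sequences are simplifying assumptions):

```python
import math

def psnr(ref, out, peak=255.0):
    """Peak Signal-to-Noise Ratio between two equal-length pixel sequences."""
    mse = sum((r - o) ** 2 for r, o in zip(ref, out)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cer(reference, hypothesis):
    """Character Error Rate: edit distance over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word Error Rate: the same metric computed over word tokens."""
    return edit_distance(reference.split(), hypothesis.split()) / len(reference.split())
```

Reporting both CER and WER matters here: a single misread diacritic may barely move CER yet invalidate a whole word, which WER captures.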
Script-specific analysis showed more precise word boundaries in Arabic, sharper conjuncts and diacritics in Devanagari, and cleaner glyph separation in Telugu. Benchmarked against a traditional OCR system, Tesseract, the pipeline produced markedly better recognition, underlining the role of task-specific super-resolution: improving input quality directly strengthens recognition performance.

The dissertation contributes across methodological, empirical, theoretical, and practical domains. Methodologically, it demonstrates the value of integrating enhancement and recognition in a single fine-tuned pipeline, ensuring that gains in image clarity yield measurable gains in recognition. Empirically, it validates Real-ESRGAN for handwritten text, showing consistent improvements across structurally diverse scripts. Theoretically, it advances script-sensitive OCR by emphasising the preservation of structural features such as diacritics and ligatures. Practically, the work highlights applications in archival preservation, e-governance, and education: by enabling more accurate digitisation of handwritten records, the system supports inclusive access to information and the preservation of linguistic heritage.

The study acknowledges several limitations. The scarcity of diverse annotated datasets constrains generalisability to other scripts such as Amharic or Khmer. The computational cost of training Real-ESRGAN limits feasibility in low-resource settings. Occasional GAN artefacts, where spurious strokes or distortions appear, pose risks in sensitive applications such as legal documents. Moreover, the pipeline has not been extensively tested on mixed-script texts, which are common in multilingual societies.
These limitations suggest avenues for future work: larger and more diverse datasets, lightweight models for real-time and mobile deployment, script identification for mixed-language documents, and explainable AI for greater transparency in recognition decisions.

In conclusion, the dissertation demonstrates that GAN-enhanced super-resolution is not merely a cosmetic tool but an essential step toward robust OCR for non-English handwriting. By aligning image enhancement with recognition objectives, the proposed pipeline reduces error rates and generalises across diverse scripts. Its implications extend beyond technical achievement to cultural preservation, digital inclusion, and the democratisation of access to information. At the same time, the identified limitations provide a roadmap for future research, helping multilingual OCR evolve into a truly global and inclusive technology.

Language: en
Keywords: Handwritten Text Recognition; Optical Character Recognition (OCR); Deep Learning; Convolutional Neural Networks (CNN); Convolutional Recurrent Neural Networks (CRNN); Text Image Super-Resolution; Generative Adversarial Networks (GAN); Multilingual Text Recognition; Digital Image Processing; Text Recognition Enhancement
Type: Thesis