Doctor, Faiyaz
Alghamdi, Tawfeeq
2025-02-13
2024
T. Alghamdi, Rasm: Arabic Handwritten Character Recognition: A Data Quality Approach, Master’s thesis, University of Essex, School of Computer Science and Electronic Engineering, 2024.
https://hdl.handle.net/20.500.14154/74869
This Master's thesis, Rasm: Arabic Handwritten Character Recognition: A Data Quality Approach, investigates how data quality influences deep learning systems for Arabic Handwritten Character Recognition. The study aims to improve CNN performance on noisy Arabic children's handwriting through advanced data preprocessing and augmentation methods. It examines how different data-centric techniques affect model accuracy using the Hijja dataset, which contains Arabic characters handwritten by children aged 7 to 12. The study reports an increase in model accuracy from 85% to 96% through specialized data augmentation and filtering strategies. The thesis contributes to Machine Learning, Optical Character Recognition (OCR), and Pattern Recognition by investigating the difficulties of processing Arabic script and by emphasizing the need for high-quality training data in AI handwriting recognition systems. The results are relevant to educational technology, where automated grading and digitization systems stand to benefit from improved handwriting recognition and the better accessibility it brings. The study also contributes to AI text recognition and model optimization, and it supports more reliable recognition systems for diverse scripts and demographic groups. Future work should apply these methods to other languages and recognition tasks to advance handwriting digitization technology.
Arabic Handwritten Character Recognition (AHCR) is a challenging problem because of the complexity of the Arabic script and the variability of handwriting, especially among children.
In this context, we present ‘Rasm’, a data quality approach that can significantly improve AHCR results through a combination of preprocessing, augmentation, and filtering techniques. We use the Hijja dataset, which consists of samples from children aged 7 to 12, and by applying advanced preprocessing steps and label-specific targeted augmentation, we improve the accuracy of a CNN from 85% to 96%. The key contribution of this work is to highlight the importance of data quality for handwriting recognition. Despite recent advances in deep learning, our results show that data quality remains critical for this task. The data-centric approach proposed in this work may be useful for other recognition tasks and other languages. We believe this work has important implications for improving AHCR systems in educational contexts, where handwriting variability is high. Future work can extend the proposed techniques to other scripts and recognition tasks to further advance the field of optical character recognition.
43
en
Arabic Handwritten Character Recognition (AHCR); Optical Character Recognition (OCR); Machine Learning; Deep Learning; Convolutional Neural Networks (CNNs); Data Quality in Machine Learning; Data Augmentation; Data Preprocessing; Arabic Script Processing; Handwriting Recognition; Educational Technology; Pattern Recognition; Artificial Intelligence (AI); Image Processing; Arabic Handwriting Analysis
Rasm: Arabic Handwritten Character Recognition: A Data Quality Approach
Thesis