Rasm: Arabic Handwritten Character Recognition: A Data Quality Approach

dc.contributor.advisor: Doctor, Faiyaz
dc.contributor.author: Alghamdi, Tawfeeq
dc.date.accessioned: 2025-02-13T05:56:28Z
dc.date.issued: 2024
dc.description: This Master's thesis, Rasm: Arabic Handwritten Character Recognition: A Data Quality Approach, investigates how data quality influences Arabic Handwritten Character Recognition (AHCR) systems built with deep learning. The study aims to improve CNN performance on noisy Arabic handwriting by children through data preprocessing and augmentation. It examines how different data-centric techniques affect model accuracy using the Hijja dataset, which contains Arabic characters handwritten by children aged 7 to 12. Specialized data augmentation and filtering strategies raise model accuracy from 85% to 96%. The thesis contributes to Machine Learning, Optical Character Recognition (OCR), and Pattern Recognition by investigating the difficulties of Arabic script processing and emphasizing the need for high-quality training data in AI handwriting recognition systems. The results are relevant to educational technology, where automated grading and digitization systems can benefit from improved handwriting recognition and better accessibility. The study also supports more reliable recognition systems for diverse scripts and demographic groups. Future work should apply these methods to other languages and recognition tasks to advance handwriting digitization.
dc.description.abstract: Arabic Handwritten Character Recognition (AHCR) is challenging because of the complexity of the Arabic script and the variability of handwriting, especially children's. We present ‘Rasm’, a data quality approach that significantly improves AHCR results through a combination of preprocessing, augmentation, and filtering techniques. We use the Hijja dataset, which consists of samples written by children aged 7 to 12. By applying advanced preprocessing steps and label-specific targeted augmentation, we improve the accuracy of a CNN from 85% to 96%. The key contribution of this work is to highlight the importance of data quality for handwriting recognition: despite recent advances in deep learning, our results show that data quality plays a critical role in this task. The proposed data-centric approach may be useful for other recognition tasks and other languages. We believe this work has important implications for improving AHCR systems in educational contexts, where handwriting variability is high. Future work can extend the proposed techniques to other scripts and recognition tasks to further advance optical character recognition.
dc.format.extent: 43
dc.identifier.citation: T. Alghamdi, Rasm: Arabic Handwritten Character Recognition: A Data Quality Approach, Master’s thesis, University of Essex, School of Computer Science and Electronic Engineering, 2024.
dc.identifier.uri: https://hdl.handle.net/20.500.14154/74869
dc.language.iso: en
dc.publisher: University of Essex
dc.subject: Arabic Handwritten Character Recognition (AHCR)
dc.subject: Optical Character Recognition (OCR)
dc.subject: Machine Learning
dc.subject: Deep Learning
dc.subject: Convolutional Neural Networks (CNNs)
dc.subject: Data Quality in Machine Learning
dc.subject: Data Augmentation
dc.subject: Data Preprocessing
dc.subject: Arabic Script Processing
dc.subject: Handwriting Recognition
dc.subject: Educational Technology
dc.subject: Pattern Recognition
dc.subject: Artificial Intelligence (AI)
dc.subject: Image Processing
dc.subject: Arabic Handwriting Analysis
dc.title: Rasm: Arabic Handwritten Character Recognition: A Data Quality Approach
dc.type: Thesis
sdl.degree.department: School of Computer Science and Electronic Engineering
sdl.degree.discipline: Computer Science, specifically in the fields of Machine Learning, Optical Character Recognition (OCR), and Data Quality Enhancement for Arabic Handwritten Character Recognition (AHCR)
sdl.degree.grantor: University of Essex
sdl.degree.name: Master of Science

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 1.69 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Copyright owned by the Saudi Digital Library (SDL) © 2025