Real-word error detection and correction in Arabic text
Publisher
Saudi Digital Library
Abstract
Spell checking is the process of finding misspelled words and, where possible, correcting them. Spell checkers are important tools for document preparation, word processing, searching, and document retrieval. Detecting and correcting misspelled words in text is challenging. Most modern commercial spell checkers work at the word level and can detect and correct non-word errors, but few of them handle real-word errors, in which a misspelling produces another valid word. This remains one of the challenging problems in text processing. Moreover, most of the techniques proposed so far target Latin-script languages; Arabic has received little attention, especially for real-word errors.
In this thesis we address the problem of real-word errors using context words and n-gram language models. We implemented an unsupervised model for real-word error detection and correction in Arabic text that uses n-gram language models. We also implemented supervised models that use confusion sets to detect and correct real-word errors. In the supervised models, a window-based technique is used to estimate the probabilities of the context words of the confusion sets. The n-gram language models detect real-word errors by examining sequences of n words, and the same models are used to choose the best correction for each detected error. The experimental results of the prototypes showed promising correction accuracy. However, our results cannot be compared with other published work because there is no benchmark dataset for real-word error correction in Arabic text. Conclusions and future directions are also presented.
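As a rough illustration of the general idea, and not the thesis's actual implementation, the sketch below scores each member of a confusion set with a bigram language model and replaces a word only when an alternative fits its left and right neighbours strictly better. Everything in it is an assumption made for illustration: the function names, the add-one smoothing, the bigram order, and the tiny English corpus and confusion set, which stand in for an Arabic corpus and Arabic confusion sets.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing (an assumed smoothing choice)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def context_score(tokens, i, candidate, unigrams, bigrams):
    """Score a candidate at position i by its left and right bigrams."""
    left = bigram_prob(tokens[i - 1], candidate, unigrams, bigrams)
    right = bigram_prob(candidate, tokens[i + 1], unigrams, bigrams)
    return left * right


def correct(sentence, confusion_sets, unigrams, bigrams):
    """Return (corrected sentence, list of (position, old, new) changes)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    changes = []
    for i in range(1, len(tokens) - 1):
        word = tokens[i]
        best, best_score = word, context_score(tokens, i, word, unigrams, bigrams)
        # Sorted iteration keeps the result deterministic.
        for candidate in sorted(confusion_sets.get(word, ())):
            score = context_score(tokens, i, candidate, unigrams, bigrams)
            if score > best_score:  # replace only on a strict improvement
                best, best_score = candidate, score
        if best != word:
            changes.append((i - 1, word, best))
            tokens[i] = best
    return " ".join(tokens[1:-1]), changes


# Toy training data and confusion set (hypothetical, for illustration only).
corpus = [
    "i want a piece of cake",
    "she ate a piece of bread",
    "they signed a peace treaty",
    "we hope for peace and quiet",
]
confusion_sets = {"piece": {"peace"}, "peace": {"piece"}}
unigrams, bigrams = train_bigrams(corpus)

fixed, changes = correct("they signed a piece treaty", confusion_sets, unigrams, bigrams)
print(fixed)
print(changes)
```

With this toy data, "piece" in "they signed a piece treaty" is flagged and replaced by "peace", while "i want a piece of cake" is left unchanged. A realistic system would need a much larger corpus, higher-order n-grams, and a more careful smoothing method.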