Automated Dictionary Construction from Arabic Quran Corpus

Several studies have shown that concordancer tools are beneficial for language learning and teaching, as well as for developing automated dictionaries. Although the availability of Arabic corpora has increased, Arabic still lacks sophisticated concordance tools and automated dictionary applications that support language learning and text analysis. Concordance search is a crucial task in Arabic Natural Language Processing and Computational Linguistics. It supports language learning and facilitates the study of collocations, vocabulary, word usage, and writing styles. However, Information Retrieval in Arabic is highly problematic because of the language's particular structural and morphological variation: inflected, derived, and irregular forms, polysemy, different ways of writing combinations of individual characters, and different spellings of certain words. This project combines several Natural Language Processing techniques to build a corpus-based application that provides concordance search for Arabic using the Quran corpus, which best represents the Classical Arabic form. In addition, it presents an overview of the possibilities and potential difficulties that should be considered when analyzing Arabic natural language input. The software includes functionalities that perform well-defined tasks: text pre-processing, stemming, building the words dictionary, building the roots dictionary, concordance search based on a specific word, concordance search based on the root of a word, finding all words derived from a root, displaying the most frequent words, generating a word cloud of the Quran, and constructing the concordance dataset. The software provides users with lexical information, namely the root of each word, its derived words, and the concordance verses in which they occur, together with the interpretation of those verses.
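The abstract lists the tool's functions without showing how they fit together. The Python sketch below illustrates one possible arrangement of normalization, the words dictionary, the roots dictionary, concordance search by word and by root, derived words, and word frequencies. It is not the authors' implementation. The normalization rules (stripping diacritics, unifying alef variants, mapping alef maqsura to yaa and taa marbuta to haa) are common choices in Arabic Information Retrieval, assumed here for illustration. `WORD_ROOTS` is a hypothetical hand-written stand-in for a real stemmer or morphological analyzer, and `VERSES` is a tiny undiacritized sample of four verses rather than the full corpus.

```python
import re
from collections import Counter, defaultdict

# Tashkeel (U+064B..U+0652), superscript alef, tatweel, and Quranic annotation marks.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640\u06D6-\u06ED]")
# Assumed normalization: alef variants -> bare alef, alef maqsura -> yaa, taa marbuta -> haa.
_CHAR_MAP = str.maketrans({
    "\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627", "\u0671": "\u0627",
    "\u0649": "\u064A", "\u0629": "\u0647",
})

def normalize(text: str) -> str:
    """Remove diacritics and unify spelling variants so that variants match."""
    return _DIACRITICS.sub("", text).translate(_CHAR_MAP)

def tokenize(text: str) -> list[str]:
    return normalize(text).split()

# Tiny illustrative sample keyed by (surah, ayah); a real tool would load the whole corpus.
VERSES = {
    (1, 1): "بسم الله الرحمن الرحيم",
    (1, 2): "الحمد لله رب العالمين",
    (1, 3): "الرحمن الرحيم",
    (96, 1): "اقرأ باسم ربك الذي خلق",
}

# Hypothetical word -> root mapping standing in for a stemmer or morphological analyzer.
WORD_ROOTS = {"الرحمن": "رحم", "الرحيم": "رحم", "الحمد": "حمد", "العالمين": "علم", "خلق": "خلق"}

def build_word_index(verses: dict) -> dict:
    """Words dictionary: normalized word -> verse references, without duplicates."""
    index = defaultdict(list)
    for ref, text in verses.items():
        for word in tokenize(text):
            if ref not in index[word]:
                index[word].append(ref)
    return dict(index)

def build_root_index(word_index: dict, word_roots: dict) -> dict:
    """Roots dictionary: root -> set of corpus words known to derive from it."""
    roots_by_word = {normalize(w): r for w, r in word_roots.items()}
    roots = defaultdict(set)
    for word in word_index:
        if word in roots_by_word:
            roots[roots_by_word[word]].add(word)
    return dict(roots)

def concordance_by_word(word: str, word_index: dict, verses: dict) -> list:
    return [(ref, verses[ref]) for ref in word_index.get(normalize(word), [])]

def derived_words(root: str, root_index: dict) -> list[str]:
    return sorted(root_index.get(root, set()))

def concordance_by_root(root: str, root_index: dict, word_index: dict, verses: dict) -> list:
    refs = sorted({ref for w in root_index.get(root, ()) for ref in word_index[w]})
    return [(ref, verses[ref]) for ref in refs]

def most_frequent(verses: dict, n: int) -> list:
    """Word frequencies, e.g. as input to a word cloud; ties keep first-seen order."""
    return Counter(w for text in verses.values() for w in tokenize(text)).most_common(n)

if __name__ == "__main__":
    words = build_word_index(VERSES)
    roots = build_root_index(words, WORD_ROOTS)
    for ref, text in concordance_by_root("رحم", roots, words, VERSES):
        print(ref, text)
```

Because queries pass through the same `normalize` function as the indexed text, a search for the hamza spelling "اقرأ" still finds verse 96:1. A root search expands to every indexed word that the analyzer assigns to that root, so its quality depends entirely on the stemmer or morphological resource that replaces the hypothetical `WORD_ROOTS`.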
In this study, the developed concordancer tool and the constructed dataset provide valuable help to language learners, researchers, and developers. This study helps address the lack of research in this field. Moreover, it identifies challenges in Arabic Natural Language Processing and could motivate future research in other related fields.