Document analysis and indexing of Arabic manuscripts
Publisher
Saudi Digital Library
Abstract
Arabic and Islamic manuscripts represent a rich source of knowledge that remains highly underutilized. Many Islamic manuscripts and scriptures of great historical value are still shelved in national libraries and universities in Saudi Arabia and across the Arab and Muslim worlds. These precious historical artifacts have yet to be typeset and published in book form. Although some Islamic manuscripts are available in digital form, they are generally not indexed for retrieval purposes. Given the vast content of these manuscripts, it is highly desirable to develop an indexing and retrieval system. Various approaches have been proposed for content-based retrieval of images, video, and audio using low-level, intermediate-level, or high-level features. The main focus of this thesis is to propose a framework for indexing and retrieving handwritten Islamic manuscripts that are stored as document images.
A computer-aided retrieval and indexing system for document images is proposed and implemented. First, preprocessing steps such as binarization, noise filtering, and segmentation are applied to enhance the quality of the document images. Second, a set of features is extracted from user-identified words for similarity matching. Finally, using the concept of a word book, the manuscript is indexed for efficient later retrieval. The prototype system has been implemented and thoroughly tested for performance, and it was found to be able to retrieve similar words stored in the database.
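The pipeline described above (binarize, extract features from a word image, index it in a word book, then retrieve by similarity) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: Otsu thresholding stands in for the binarization step, row and column ink counts stand in for the feature set, L1 distance with a fixed tolerance stands in for similarity matching, and the names `WordBook`, `profile_features`, and `tolerance` are hypothetical. Noise filtering and segmentation are omitted, so each input is assumed to be an already cropped grayscale word image of the same size.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_low, sum_low = 0, 0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue
        w_high = total - w_low
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var = w_low * w_high * (mean_low - mean_high) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows) to 1 = ink (dark), 0 = background."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]


def profile_features(binary):
    """Assumed feature set: ink count per row followed by ink count per column."""
    rows = [sum(r) for r in binary]
    cols = [sum(c) for c in zip(*binary)]
    return rows + cols


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


class WordBook:
    """Hypothetical word book: each entry is (features, list of locations)."""

    def __init__(self, tolerance):
        self.tolerance = tolerance  # assumed maximum L1 distance for a match
        self.entries = []

    def _nearest(self, feats):
        best = None  # (entry index, distance)
        for i, (f, _) in enumerate(self.entries):
            if len(f) != len(feats):
                continue  # this sketch only compares same-sized word images
            d = l1_distance(f, feats)
            if best is None or d < best[1]:
                best = (i, d)
        return best

    def add(self, word_image, location):
        """Index one user-identified word image under its manuscript location."""
        feats = profile_features(binarize(word_image))
        hit = self._nearest(feats)
        if hit is not None and hit[1] <= self.tolerance:
            self.entries[hit[0]][1].append(location)
        else:
            self.entries.append((feats, [location]))

    def lookup(self, word_image):
        """Return the locations of the closest indexed word, or [] if none match."""
        feats = profile_features(binarize(word_image))
        hit = self._nearest(feats)
        if hit is None or hit[1] > self.tolerance:
            return []
        return list(self.entries[hit[0]][1])


# Toy 3x3 "word images": a vertical stroke, a noisy copy of it, a horizontal stroke.
vertical = [[220, 20, 220], [220, 20, 220], [220, 20, 220]]
vertical_noisy = [[200, 30, 210], [215, 25, 220], [220, 35, 205]]
horizontal = [[220, 220, 220], [20, 20, 20], [220, 220, 220]]

book = WordBook(tolerance=2)
book.add(vertical, ("folio 1", 0))
book.add(horizontal, ("folio 1", 1))
book.add(vertical_noisy, ("folio 2", 5))

matches = book.lookup(vertical)
```

In this toy run the noisy vertical stroke binarizes to the same pattern as the clean one, so both locations end up under a single word book entry and a query with either image retrieves both. A real system would need features that tolerate differences in size, slant, and writing style, which is where the choice of feature set and matching rule matters most.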