Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10

Restricted
Transformer-based Semantic Similarity Exploration on the Holy Quran
(Saudi Digital Library, 2025-03) Alsaleh, Abdullah Nassir A; Atwell, Eric; Altahhan, Abdulrahman
This PhD research explores the application of modern Natural Language Processing (NLP) techniques to the study of the Holy Quran, with a focus on semantic understanding. It addresses the challenges of working with Classical Arabic and explores how Transformer-based Arabic language models can be used to better understand relationships between Quranic verses, answer questions, and retrieve relevant passages. The thesis makes four key contributions. First, it evaluates QurSim, a semantic similarity corpus for the Quran, and produces a cleaner version of the QurSim dataset to support more reliable experiments. Second, it applies Arabic pre-trained language models to three semantically related tasks (semantic similarity, question answering and passage retrieval) to demonstrate their potential and limitations in handling religious text. Third, it outlines the methods and strategies for tackling these tasks, identifying the most effective approaches to understanding the Quranic text. The central contribution of the thesis is the development of QuranRel, a newly annotated semantic similarity corpus of the Holy Quran. The corpus addresses key limitations in existing resources, including a lack of expert labellers and of multi-verse handling, as well as data quality issues. QuranRel provides 12,937 curated and labelled verses, paying attention to context and meaning. The findings of this PhD thesis support two main conclusions. First, Arabic pre-trained Transformer-based language models can be effectively applied to Quranic text semantic tasks, although their performance varies depending on the nature of the task. Second, there is a need for a new semantic similarity corpus of the Holy Quran, grounded in a Quranic exegesis that interprets the Quran through the Quran itself.
These contributions advance the field of NLP for the Holy Quran in particular and Classical Arabic in general, providing tools and resources that open new pathways for the computational linguistics of religious text.
Restricted
English-Arabic Cross-Language Plagiarism Detection
(2022) Alotaibi, Naif; Joy, Mike
The advancement of the information era and technology has contributed to the rapid growth of digital text libraries and automatic machine translation systems. Machine translation tools make it easy to translate text from one language into another. Together, these developments have increased the amount of content accessible in different languages, which in turn makes it easy to commit translated plagiarism, known as “cross-language plagiarism”. Identifying plagiarism between texts in different languages is more challenging than recognizing plagiarism within a corpus written in the same language. This research proposes a new framework for enhancing English-Arabic cross-language plagiarism detection at the sentence level. The framework comprises two phases: the first phase is feature extraction, while the second is plagiarism detection based on a supervised machine learning classification model. Phase one is concerned with extracting features from English-Arabic cross-language sentence pairs, where we propose approaches to extracting sets of features at the lexical, semantic and syntactic levels. This phase involves two components. The first relies on translation plus a monolingual, pre-trained word embedding model, integrated with term frequency-inverse document frequency (TF-IDF) and part-of-speech (POS) scheme methods, as well as word order information. The second component employs a pre-trained multilingual model for determining semantic relatedness between cross-language sentence pairs. In the second phase, we apply and examine various supervised machine learning classifiers, using the extracted features and combinations of those features, to classify sentences as either plagiarized or non-plagiarized. Each phase was assessed using different datasets.
The experimental results for phase one on different benchmark datasets, such as SemEval-2017, show that the proposed feature extraction methods improve on the baselines and other methods. Analysis of the experimental data for phase two demonstrates that using the extracted features and their combinations with various supervised machine learning classification methods achieves promising results. Ultimately, combining the extracted features with a supervised ensemble machine learning classifier achieves the best classification results.
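As a rough illustration of one lexical feature of the kind the abstract mentions, the following minimal sketch computes a TF-IDF weighted cosine similarity between tokenized sentences. It is not the thesis's implementation: the function names, the smoothed IDF formula and the toy English sentences are assumptions made for illustration, and the Arabic sentence is assumed to have already been machine-translated into English.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one TF-IDF weighted bag-of-words dict per tokenized sentence."""
    n = len(sentences)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for s in sentences for term in set(s))
    vectors = []
    for s in sentences:
        tf = Counter(s)
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vectors.append({
            t: (c / len(s)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy data: a source sentence, a suspicious sentence assumed
# to be already translated into English, and an unrelated sentence.
corpus = [
    "the student copied the report".split(),
    "the student copied the essay".split(),
    "rain fell over the city".split(),
]
vecs = tfidf_vectors(corpus)
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
```

In a pipeline like the one described, a score such as `sim_related` would be just one entry in a feature vector passed to a supervised classifier, alongside semantic and syntactic features.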
Restricted
Artificial intelligence for understanding the Hadith
(2023-01-30) Altammami, Shatha; Atwell, Eric
My research aims to utilize Artificial Intelligence to model the meanings of Classical Arabic Hadith, which are the reports of the life and teachings of the Prophet Muhammad. The goal is to find similarities and relatedness between Hadith and other religious texts, specifically the Quran. These findings can facilitate downstream tasks, such as Islamic question-answering systems, and enhance understanding of these texts to shed light on new interpretations. To achieve this goal, a well-structured Hadith corpus should be created, with the Matn (Hadith teaching) and Isnad (chain of narrators) segmented. Hence, a preliminary task is conducted to build a segmentation tool using machine learning models that automatically deconstruct the Hadith into Isnad and Matn with 92.5% accuracy. This tool is then used to create a well-structured corpus of the canonical Hadith books. After building the Hadith corpus, Matns are extracted to investigate different methods of representing their meanings. Two main methods are tested: a knowledge-based approach and a deep-learning-based approach. To apply the former, existing Islamic ontologies are enumerated, most of which are intended for the Quran. Since the Quran and the Hadith are in the same domain, the extent to which these ontologies cover the Hadith is examined using a corpus-based evaluation. Results show that the most comprehensive Quran ontology covers only 26.8% of Hadith concepts, and extending it is expensive. Therefore, the second approach is investigated by building and evaluating various deep-learning models for a binary classification task of detecting relatedness between the Hadith and the Quran. Results show that current models remain some way from a human-level understanding of such texts.
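The abstract does not describe how the corpus-based coverage figure was computed. One plausible reading, sketched below under that assumption, is the share of distinct concepts found in the Hadith corpus that also appear in the ontology. The function name and the placeholder concept labels are hypothetical.

```python
def ontology_coverage(corpus_concepts, ontology_concepts):
    """Fraction of distinct corpus concepts that also occur in the ontology."""
    corpus = set(corpus_concepts)
    if not corpus:
        return 0.0  # An empty corpus has nothing to cover.
    return len(corpus & set(ontology_concepts)) / len(corpus)


# Hypothetical placeholder labels, not real ontology entries.
hadith_concepts = ["concept_a", "concept_b", "concept_c", "concept_d"]
quran_ontology = ["concept_a", "concept_x"]
coverage = ontology_coverage(hadith_concepts, quran_ontology)
```

Under this definition, a low value means most concepts in the corpus have no counterpart in the ontology, which is the kind of gap the abstract reports.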
