Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10

  • English-Arabic Cross-Language Plagiarism Detection (restricted access)
    (2022) Alotaibi, Naif; Joy, Mike
    Advances in information technology have contributed to the rapid growth of digital text libraries and automatic machine translation systems. Machine translation tools make it easy to translate texts from one language into another. Together, these developments have increased the amount of content available in different languages, which makes it easy to commit translated plagiarism, referred to as “cross-language plagiarism”. Identifying plagiarism among texts in different languages is more challenging than recognizing plagiarism within a corpus written in a single language. This research proposes a new framework for enhancing English-Arabic cross-language plagiarism detection at the sentence level. The framework comprises two phases: the first is feature extraction, and the second is plagiarism detection based on a supervised machine learning classification model. Phase one is concerned with extracting features from English-Arabic cross-language sentence pairs, for which we propose approaches to extracting sets of features at the lexical, semantic and syntactic levels. This phase involves two components. The first relies on translation combined with a monolingual pretrained word embedding model, integrated with term frequency-inverse document frequency (TF-IDF) and part-of-speech (POS) schemes, as well as word order information. The second component employs a pretrained multilingual model to determine the semantic relatedness between cross-language sentence pairs. In the second phase, we apply and examine various supervised machine learning classifiers, using the extracted features individually and in combination, to classify sentences as either plagiarized or non-plagiarized. Each phase was assessed using different datasets.
    Experimental results for phase one on several benchmark datasets, such as SemEval-2017, show that the proposed feature extraction methods improve on the baselines and other methods. Analysis of the phase two experiments shows that the extracted features and their combinations achieve promising results with various supervised machine learning classifiers. Overall, combining the extracted features with a supervised ensemble machine learning classifier achieves the best classification results.
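    The abstract does not say exactly how the word embeddings are combined with TF-IDF. The sketch below is illustrative only and assumes one common option. One sentence of the pair is machine-translated, and each sentence is represented as the TF-IDF-weighted average of its word vectors. The cosine similarity of the two averages is then used as a single feature for the classifier. The toy embedding table, the sentences and all function names are hypothetical. The POS and word order features are omitted.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Smoothed inverse document frequency for every token in a tokenized corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1 for tok, df in doc_freq.items()}


def weighted_sentence_vector(tokens, embeddings, idf):
    """TF-IDF-weighted average of the word vectors for the tokens that have one."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in Counter(tokens).items():
        if tok not in embeddings:
            continue  # skip out-of-vocabulary tokens
        weight = (count / len(tokens)) * idf.get(tok, 1.0)
        total = [t + weight * x for t, x in zip(total, embeddings[tok])]
        weight_sum += weight
    return [t / weight_sum for t in total] if weight_sum else total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical 3-d vectors standing in for a real pretrained embedding model.
embeddings = {
    "student": [0.9, 0.1, 0.0],
    "pupil": [0.85, 0.15, 0.05],
    "wrote": [0.1, 0.9, 0.0],
    "essay": [0.0, 0.2, 0.95],
    "the": [0.3, 0.3, 0.3],
    "an": [0.3, 0.3, 0.3],
    "river": [-0.7, 0.0, 0.2],
    "flows": [-0.2, -0.8, 0.1],
}

# The Arabic source sentence is assumed to be already machine-translated into English.
translated_source = ["the", "student", "wrote", "an", "essay"]
suspicious = ["the", "pupil", "wrote", "an", "essay"]
unrelated = ["the", "river", "flows"]

idf = idf_weights([translated_source, suspicious, unrelated])
src_vec = weighted_sentence_vector(translated_source, embeddings, idf)
sim_suspicious = cosine(src_vec, weighted_sentence_vector(suspicious, embeddings, idf))
sim_unrelated = cosine(src_vec, weighted_sentence_vector(unrelated, embeddings, idf))
print(f"suspicious: {sim_suspicious:.3f}, unrelated: {sim_unrelated:.3f}")
```

    The smoothed IDF formula matches scikit-learn's default. Weighting by IDF keeps frequent function words such as "the" from dominating the sentence vector.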
  • Artificial intelligence for understanding the Hadith (restricted access)
    (2023-01-30) Altammami, Shatha; Atwell, Eric
    My research aims to use artificial intelligence to model the meanings of Classical Arabic Hadith, the reports of the life and teachings of the Prophet Muhammad. The goal is to find similarities and relatedness between the Hadith and other religious texts, specifically the Quran. These findings can facilitate downstream tasks, such as Islamic question-answering systems, and enhance understanding of these texts to shed light on new interpretations. To achieve this goal, a well-structured Hadith corpus must first be created, with the Matn (Hadith teaching) and Isnad (chain of narrators) segmented. Hence, as a preliminary task, a segmentation tool is built using machine learning models that automatically split the Hadith into Isnad and Matn with 92.5% accuracy. This tool is then used to create a well-structured corpus of the canonical Hadith books. After the Hadith corpus is built, the Matns are extracted to investigate different methods of representing their meanings. Two main methods are tested: a knowledge-based approach and a deep-learning-based approach. To apply the former, existing Islamic ontologies are surveyed, most of which are intended for the Quran. Since the Quran and the Hadith are in the same domain, the extent to which these ontologies cover the Hadith is examined using a corpus-based evaluation. Results show that the most comprehensive Quran ontology covers only 26.8% of Hadith concepts, and extending it is expensive. Therefore, the second approach is investigated by building and evaluating various deep-learning models for a binary classification task of detecting relatedness between the Hadith and the Quran. Results show that human-level understanding of such texts remains somewhat elusive for current models.
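    The abstract reports that the most comprehensive Quran ontology covers only 26.8% of Hadith concepts, but it does not spell out how coverage is measured. The sketch below assumes coverage is the share of distinct concepts found in the corpus that also appear as ontology concepts. The concept lists and the `normalize` helper are hypothetical. A real evaluation on Arabic text would need proper concept extraction and lemmatization.

```python
def normalize(term):
    """Hypothetical normalization: trim whitespace and lowercase (no Arabic lemmatization)."""
    return term.strip().lower()


def ontology_coverage(corpus_concepts, ontology_concepts):
    """Return (coverage ratio, sorted uncovered concepts) over distinct corpus concepts."""
    corpus_set = {normalize(c) for c in corpus_concepts}
    ontology_set = {normalize(c) for c in ontology_concepts}
    if not corpus_set:
        return 0.0, []
    covered = corpus_set & ontology_set
    return len(covered) / len(corpus_set), sorted(corpus_set - ontology_set)


# Toy inputs; real inputs would be concepts extracted from the Matn corpus
# and the concept labels of a Quran ontology.
hadith_concepts = ["Prayer", "fasting", "charity", "trade", "marriage", "prayer"]
quran_ontology = ["prayer", "fasting", "paradise"]

coverage, uncovered = ontology_coverage(hadith_concepts, quran_ontology)
print(f"coverage: {coverage:.1%}, uncovered: {uncovered}")
# coverage: 40.0%, uncovered: ['charity', 'marriage', 'trade']
```

    Returning the uncovered concepts alongside the ratio shows which terms an ontology would need to add. That list gives a rough sense of how costly extending the ontology would be.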

Copyright owned by the Saudi Digital Library (SDL) © 2025