Transformer-based Semantic Similarity Exploration on the Holy Quran

Date

2025-03

Publisher

Saudi Digital Library

Abstract

This PhD research applies modern Natural Language Processing (NLP) techniques to the study of the Holy Quran, with a focus on semantic understanding. It addresses the challenges of working with Classical Arabic and investigates how Transformer-based Arabic language models can be used to identify relationships between Quranic verses, answer questions, and retrieve relevant passages. The thesis makes four key contributions. First, it evaluates QurSim, a semantic similarity corpus of the Quran, and produces a cleaner version of the dataset to support more reliable experiments. Second, it applies Arabic pre-trained language models to three semantically related tasks (semantic similarity, question answering, and passage retrieval) to demonstrate their potential and limitations in handling religious text. Third, it outlines methods and strategies for tackling these tasks and identifies the most effective approaches to understanding the Quranic text. Fourth, and most centrally, it develops QuranRel, a newly annotated semantic similarity corpus of the Holy Quran. The corpus addresses key limitations of existing resources, including a lack of expert labellers, multi-verse handling, and data quality issues. QuranRel provides 12,937 curated and labelled verses, annotated with attention to context and meaning. The findings of the thesis support two main conclusions. First, Arabic pre-trained Transformer-based language models can be applied effectively to semantic tasks on Quranic text, although their performance varies with the nature of the task. Second, there is a need for a new semantic similarity corpus of the Holy Quran, grounded in Quranic exegesis that interprets the Quran through the Quran itself.
These contributions advance the field of NLP for the Holy Quran in particular and Classical Arabic in general, providing tools and resources that open new pathways for the computational linguistics of religious text.

Keywords

Quran, Semantic Similarity, Question Answering, Passage Retrieval, Classical Arabic

Copyright owned by the Saudi Digital Library (SDL) © 2025