Saudi Cultural Missions Theses & Dissertations
Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10
Search Results (8 items)
Item Restricted
AI-Driven Approaches for Privacy Compliance: Enhancing Adherence to Privacy Regulations (University of Warwick, 2024-02) Alamri, Hamad; Maple, Carsten

This thesis investigates inherent limitations within the current privacy policy landscape, provides recommendations, and proposes potential solutions to address these issues. The first contribution is a comprehensive study that addresses a significant gap in the literature: it provides a detailed overview of the current privacy policy landscape, covering both its limitations and proposed solutions, with the aim of identifying the most practical and applicable approaches for researchers in the field. Second, the thesis tackles the challenge of privacy policy accessibility in app stores by introducing the App Privacy Policy Extractor (APPE) system. The APPE pipeline consists of various components, each developed to perform a specific task and provide insightful information about apps' privacy policies. By analysing over two million apps in the iOS App Store, APPE offers unprecedented, store-wide insights into policy distribution and can act as a mechanism for automatically enforcing privacy policy requirements in app stores. Third, the thesis investigates the issue of privacy policy complexity. By establishing generalisability across app categories and drawing attention to the associated time and cost, the study demonstrates that the current situation requires immediate and effective solutions, and it suggests several recommendations and potential remedies. Finally, to enhance user engagement with privacy policies, a novel framework utilising a cost-effective unsupervised approach, based on the latest AI innovations, has been developed. Comparison of the findings with state-of-the-art methods suggests that this approach can produce outcomes on par with, or even surpassing, those of human experts, in a more efficient and automated manner.

Item Restricted
Evaluating CAMeL-BERT for Sentiment Analysis of Customer Satisfaction with STC (Saudi Telecom Company) Services (The University of Sussex, 2024-08-15) Alotaibi, Fahad; Pay, Jack

In the age of informatics, platforms such as Twitter (X) play a crucial role in measuring public sentiment, in both the private and public sectors. This study explores the application of machine learning, particularly deep learning, to sentiment analysis of tweets about Saudi Telecom Company (STC) services in Saudi Arabia. A comparative analysis was conducted between pre-trained sentiment analysis models in English and in Arabic to assess their effectiveness in classifying sentiment. The study also highlights a challenge in existing Arabic models: they are based on English model architectures but trained on varied datasets, such as Modern Standard Arabic and Classical Arabic (Al-Fus'ha), and often lack the capability to handle the diverse Arabic dialects commonly used on social media. To overcome this issue, a pre-trained Arabic model was fine-tuned on a dataset of tweets related to STC services, focusing specifically on the Saudi dialect. Data was collected from Twitter (X), focusing on mentions of STC; both English and Arabic models were applied to this data, and their sentiment analysis performance was evaluated. The fine-tuned Arabic model (CAMeL-BERT) demonstrated improved accuracy and a better understanding of local dialects compared to its initial version. The results highlight the importance of model adaptation for specific languages and contexts and underline the potential of CAMeL-BERT for sentiment analysis of Arabic-language content. The findings offer practical implications for enhancing customer service and engagement through more accurate sentiment analysis of social media content in the service-provider sector.
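To make the sentiment-classification step concrete, the following is a minimal sketch of applying a CAMeL-BERT sentiment checkpoint to sample tweets with Hugging Face transformers. The checkpoint identifier and example tweets are illustrative assumptions, not the exact model or data produced in the study.

```python
# Minimal sketch: scoring Arabic tweets with a CAMeL-BERT sentiment checkpoint.
# The checkpoint name below is an assumption for illustration, not necessarily
# the fine-tuned model described in the dissertation.
from transformers import pipeline

MODEL_NAME = "CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment"  # assumed checkpoint

classifier = pipeline("text-classification", model=MODEL_NAME)

tweets = [
    "الخدمة ممتازة والشبكة سريعة",       # "The service is excellent and the network is fast"
    "النت بطيء من اسبوع وما أحد يرد",     # "The internet has been slow for a week and no one responds"
]

for tweet, result in zip(tweets, classifier(tweets)):
    print(f"{result['label']:>9}  {result['score']:.2f}  {tweet}")
```

Fine-tuning such a checkpoint on a labelled Saudi-dialect tweet dataset, as the study describes, would follow the standard sequence-classification training recipe on top of this inference setup.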
Item Restricted
Exploring Malnutrition in Residential Aged Care: A Study on Nursing Notes using Natural Language Processing and Large Language Models (University of Wollongong, 2024-03-21) Alkhalaf, Mohammad; Yu, Ping

Population ageing has led to an increasing demand for services for older people. Residential aged care facilities (RACFs) in Australia provide a range of services for older people who can no longer live independently at home, including accommodation, personal care, health care services, and social and emotional support. Despite efforts to provide comprehensive care, managing nutrition for older people in RACFs has proven complex. Malnutrition has emerged as a prevalent issue within these facilities, raising serious health concerns; understanding and addressing it is therefore a critical concern for the Australian government. To date, there has been a reliance on nutrition screening tools to assess older people's nutritional care needs. Conducting these assessments requires adequate healthcare training and is time-consuming, so they are not performed as frequently as needed to uncover the risk of malnutrition in a timely manner. In Australia, the majority of RACFs have established electronic health record (EHR) systems to capture and record care recipients' information, including medical diagnoses, regular nursing assessments, weight charts, care plans, periodic reviews, incident and infection reviews, and nursing progress reports. RAC EHRs therefore contain a wealth of information that can be mined to support aged care services.

Advances in natural language processing (NLP), in particular large language models (LLMs), provide an opportunity to uncover useful insights from RAC EHRs. This PhD research is therefore dedicated to extending NLP technology to the under-studied RAC area and to designing, implementing and evaluating LLM applications for nutrition management among older people living in RACFs. It aims to design and develop a machine learning framework capable of analysing both structured and unstructured EHR data to gain comprehensive insights into the malnutrition issue. Drawing on insights from the literature, the study begins by employing word embedding techniques, integrated with cosine similarity and the UMLS ontology, to extract nutrition-related terms from nursing notes in RACFs. This uncovered the language style and terminology used by practising nurses and aged care workers in managing nutrition for the older people under their care. Subsequent development of 13 extraction rules identified relevant notes indicative of malnutrition, forming the basis for a training dataset of 2,278 relevant nursing notes used in the LLM implementation. To enhance the LLM's understanding of nursing notes, we randomly selected 500,000 notes for pre-training a domain-specific LLM based on the established RoBERTa model, followed by fine-tuning the LLM specifically for malnutrition note detection. Achieving an impressive F1-score of 0.96, our model significantly surpassed previous models, ensuring more accurate classification of notes documenting malnutrition. Furthermore, we developed a framework integrating a generative LLM, Llama 2, with a retrieval-augmented generation (RAG) system to extract comprehensive summary information from malnutrition-related notes. This framework demonstrates high accuracy (90%) in identifying malnutrition risk factors from 1,399 notes and generates detailed summaries of nutrition status from EHRs with 99% accuracy. Our study reveals a malnutrition prevalence rate of approximately 33% in the studied RACFs, with 15 main categories and 43 subcategories of malnutrition risk factors. For the first time, this research identified the primary risk factors of malnutrition in RACFs, led by poor appetite, which affects 17% of older people, followed by insufficient oral intake and dementia progression. To enhance malnutrition prediction, we fine-tuned the RAC domain-specific model to address the 512-token sequence length limitation of the RoBERTa model, extending the supported sequence length to 1,536 tokens. Augmented with risk factors, our model achieved an F1-score of 0.687, demonstrating its effectiveness in predicting malnutrition risk one month before onset.

In conclusion, this research designs, develops and evaluates an innovative AI framework that leverages advanced AI technologies, particularly NLP and domain-specific LLMs, to tackle malnutrition among older people in residential aged care facilities. By analysing text data in EHRs, the AI framework identifies risk factors, summarises nutrition information, and predicts malnutrition one month before onset. After thorough evaluation by domain experts, the AI framework can be implemented as an automated assessment tool. Its implementation in aged care services will alleviate the time burden of nutrition care for health and aged care practitioners, supporting them in identifying malnutrition risk factors for the older people under their care and in managing malnutrition efficiently. The framework's scalability extends beyond residential aged care facilities: it can be further extended to other healthcare settings to improve the effectiveness of nutrition care and consumers' quality of life.
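As an illustration of the first step described above (word embeddings combined with cosine similarity to extract nutrition-related terms), the sketch below expands a small seed list against a word-embedding model. The embedding file path and seed terms are assumptions for illustration, not the thesis's actual resources.

```python
# Minimal sketch: expanding seed nutrition terms by cosine similarity in a
# word-embedding space trained on nursing notes. Path and seeds are
# illustrative assumptions.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("nursing_notes_w2v.txt")  # assumed embedding file

seed_terms = ["malnutrition", "appetite", "weight", "supplement"]  # illustrative seeds

def expand_terms(seeds, topn=20, threshold=0.6):
    """Return vocabulary terms whose cosine similarity to any seed exceeds the threshold."""
    candidates = {}
    for seed in seeds:
        if seed not in vectors:
            continue
        for term, score in vectors.most_similar(seed, topn=topn):
            if score >= threshold:
                candidates[term] = max(score, candidates.get(term, 0.0))
    return sorted(candidates.items(), key=lambda kv: -kv[1])

for term, score in expand_terms(seed_terms):
    print(f"{score:.2f}  {term}")
```

In the study, candidate terms are additionally checked against the UMLS ontology before the extraction rules are applied; that filtering step is omitted from this sketch.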
Item Restricted
Synonym-based Adversarial Attacks in Arabic Text Classification Systems (Clarkson University, 2024-05-21) Alshahrani, Norah Falah S; Matthews, Jeanna

Text classification systems have been proven vulnerable to adversarial text examples: modified versions of original texts that often go unnoticed by human eyes, yet can force text classification models to alter their classifications. Research quantifying the impact of adversarial text attacks has, however, mostly been applied to models trained on English. In this thesis, we introduce the first word-level study of adversarial attacks in Arabic. Specifically, we use a synonym (word-level) attack based on a Masked Language Modeling (MLM) task with a BERT model in a black-box setting to assess the robustness of state-of-the-art Arabic text classification models to adversarial attacks. To evaluate the grammatical and semantic similarity of the adversarial examples produced by our BERT-based synonym attack, we invite four human evaluators to assess and compare them with their original examples. We also study the transferability of these Arabic adversarial examples to various models and investigate the effectiveness of defense mechanisms against them on BERT models. We find that fine-tuned BERT models are more susceptible to our synonym attacks than the other deep neural network (DNN) models we trained, such as WordCNN and WordLSTM, and that they are also more susceptible to transferred attacks. Lastly, we find that fine-tuned BERT models regain at least 2% in accuracy after adversarial training is applied as an initial defense mechanism. We share our code scripts and trained models on GitHub at https://github.com/NorahAlshahrani/bert_synonym_attack.
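The core of such a word-level attack is the masked-language-model substitution step: mask one word and let an Arabic BERT propose in-context replacements. Below is a minimal sketch of that step only; it is not the authors' released implementation (available at the GitHub link above), and the checkpoint name is an assumption.

```python
# Minimal sketch of the masked-LM substitution step behind a word-level synonym
# attack. Illustrative only; a real attack would also filter candidates by
# semantic similarity and query the victim classifier.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="aubmindlab/bert-base-arabertv2")  # assumed checkpoint

def candidate_substitutions(tokens, index, top_k=10):
    """Mask tokens[index] and return the model's top in-context replacement words."""
    masked = tokens.copy()
    masked[index] = fill_mask.tokenizer.mask_token
    predictions = fill_mask(" ".join(masked), top_k=top_k)
    return [p["token_str"] for p in predictions if p["token_str"] != tokens[index]]

sentence = "الخدمة كانت ممتازة جدا".split()   # "The service was very excellent"
print(candidate_substitutions(sentence, index=2))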
Item Restricted
Exploring Language Models and Question Answering in Biomedical and Arabic Domains (University of Delaware, 2024-05-10) Alrowili, Sultan; Shanker, K. Vijay

Despite the success of the Transformer model and its variations (e.g., BERT, ALBERT, ELECTRA, T5) in addressing NLP tasks, similar success is not achieved when these models are applied to specific domains (e.g., biomedical) and limited-resource languages (e.g., Arabic). This research addresses some of the challenges in applying Transformer models to specialized domains and to languages that lack language processing resources. One reason for reduced performance in such domains may be the lack of quality contextual representations. We address this issue by adapting different types of language models, introducing five BioM-Transformer models for the biomedical domain and Funnel Transformer and T5 models for the Arabic language. For each of our models, we present experiments studying the impact of design factors (e.g., corpus and vocabulary domain, model scale, architecture design) on performance and efficiency. Our evaluation of the BioM-Transformer models shows that we obtain state-of-the-art results on several biomedical NLP tasks and the top-performing models on the BLURB leaderboard. The evaluation of our small-scale Arabic Funnel and T5 models shows that we achieve comparable performance while using less computation than the fine-tuning cost of existing Arabic models. Further, our base-scale Arabic language models extend state-of-the-art results on several Arabic NLP tasks while maintaining a fine-tuning cost comparable to existing base-scale models.

Next, we focus on the question-answering task, specifically tackling issues in specialized domains and low-resource languages such as the limited size of question-answering datasets and their limited topic coverage. We employ several methods to address these issues in the biomedical domain, including the use of domain-adapted models and task-to-task transfer learning. We evaluate the effectiveness of these methods at the BioASQ10 (2022) challenge, achieving the top-performing system on several batches of the challenge. In Arabic, we address similar issues by introducing a novel approach to creating question-answer-passage triplets and proposing a pipeline, Pair2Passage, for building large QA datasets. Using this method and pipeline, we create the ArTrivia dataset, a new Arabic question-answering dataset comprising more than 10,000 high-quality question-answer-passage triplets. We present a quantitative and qualitative analysis of ArTrivia that shows the importance of often overlooked components, such as answer normalization, in enhancing the quality of question-answering datasets and future annotation. In addition, our evaluation shows that ArTrivia can be used to build a question-answering model that addresses the out-of-distribution issue in existing Arabic QA datasets.
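As a minimal illustration of the extractive question-answering setting targeted by the biomedical experiments, the sketch below runs a reader model over a short context. The checkpoint is an assumption for illustration, not one of the thesis's BioM-Transformer releases.

```python
# Minimal sketch of extractive question answering with a pretrained reader.
# Checkpoint and example are illustrative assumptions.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")  # assumed checkpoint

context = (
    "Metformin is a first-line medication for the treatment of type 2 diabetes, "
    "particularly in people who are overweight."
)
result = qa(question="What is metformin used to treat?", context=context)
print(result["answer"], f"(score={result['score']:.2f})")
```

A domain-adapted encoder would be dropped into the same pipeline in place of the general-purpose checkpoint shown here.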
Item Restricted
Unsupervised Semantic Change Detection in Arabic (Queen Mary University of London, 2023-10-23) Sindi, Kenan; Dubossarsky, Haim

This study employs pretrained BERT models (AraBERT, CAMeLBERT-CA, and CAMeLBERT-MSA) to investigate semantic change in Arabic across distinct time periods. Analyzing word embeddings and cosine distance scores reveals variation in how well the models capture semantic shifts. The research highlights the significance of training data quality and diversity, while acknowledging limitations in data scope. The project's outcome, a list of the most stable and most changed words, contributes to Arabic NLP by shedding light on semantic change detection and suggesting potential model selection strategies and areas for future exploration.
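One common way to operationalize this kind of analysis is to average a target word's contextual embeddings over sentences from each period and measure the cosine distance between the two averages. The sketch below follows that recipe; the checkpoint, target word, and example sentences are assumptions for illustration, not the study's data or exact procedure.

```python
# Minimal sketch: cosine distance between a word's average contextual
# embeddings in two time periods. Checkpoint, word, and sentences are
# illustrative assumptions; subword matching is deliberately crude.
import torch
from scipy.spatial.distance import cosine
from transformers import AutoModel, AutoTokenizer

MODEL = "aubmindlab/bert-base-arabertv2"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)

def mean_embedding(word, sentences):
    """Average the embedding of `word`'s first subword across sentences containing it."""
    first_subword = tokenizer(word, add_special_tokens=False)["input_ids"][0]
    vectors = []
    for sentence in sentences:
        enc = tokenizer(sentence, return_tensors="pt")
        positions = (enc["input_ids"][0] == first_subword).nonzero(as_tuple=True)[0]
        if len(positions) == 0:
            continue
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]
        vectors.append(hidden[positions[0]])
    return torch.stack(vectors).mean(dim=0).numpy()

word = "جوال"                                    # illustrative target word
period_1 = ["اشترى جوال جديد من السوق"]           # illustrative earlier-period sentence
period_2 = ["يتصفح الإنترنت عبر جوال حديث"]        # illustrative later-period sentence

change = cosine(mean_embedding(word, period_1), mean_embedding(word, period_2))
print(f"semantic change score for '{word}': {change:.3f}")
```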
Item Restricted
Improving Vulnerability Description Using Natural Language Generation (Saudi Digital Library, 2023-10-25) Althebeiti, Hattan; Mohaisen, David

Software plays an integral role in powering numerous everyday computing gadgets. As our reliance on software continues to grow, so does the prevalence of software vulnerabilities, with significant implications for organizations and users; documenting vulnerabilities and tracking their development therefore becomes crucial. Vulnerability databases address this need by storing a record with various attributes for each discovered vulnerability. However, their contents suffer from several drawbacks, which we address in our work. In this dissertation, we investigate the weaknesses associated with vulnerability descriptions in public repositories and alleviate them through Natural Language Processing (NLP) approaches. The first contribution examines vulnerability descriptions in those databases and approaches to improving them. We propose a new automated method that leverages external sources to enrich the scope and context of a vulnerability description, and we exploit fine-tuned pretrained language models to normalize the resulting description. The second contribution investigates the need for a uniform and normalized structure in vulnerability descriptions. We address this need by breaking the description of a vulnerability into multiple constituents and developing a multi-task model that uses the extracted features to create a uniform, normalized summary retaining the vulnerability's necessary attributes. Our method proved effective in generating new summaries with the same structure across a collection of varied vulnerability descriptions and types. Our final contribution investigates the feasibility of assigning the Common Weakness Enumeration (CWE) attribute to a vulnerability based on its description. CWE offers a comprehensive framework that categorizes similar exposures into classes representing the types of exploitation associated with such vulnerabilities. Our approach, utilizing pre-trained language models, is shown to outperform Large Language Models (LLMs) on this task. Overall, this dissertation provides various technical approaches that exploit advances in NLP to improve publicly available vulnerability databases.

Item Restricted
Hate Speech Detection for the Arabic Language (Saudi Digital Library, 2023-11-03) Alhejaili, Abrar; Moosavi, Nafise

As online social networks grow and communication technologies become more available, people can exercise their freedom of expression more than ever before. Even though interaction between users on these platforms can be constructive, the platforms are increasingly used to spread hateful content, mainly due to the anonymity they afford. Hate speech can induce cyber conflict, negatively impacting social life at both the individual and national levels. Despite this, social network providers are unable to monitor all the content posted by their users, so there is a need to detect hate speech automatically. This need is greater when the text is written in a language like Arabic, which is known for its challenges, complexity, and resource scarcity. This project uses transfer learning methods to adapt and evaluate several pretrained models for detecting hate speech in Arabic. Experiments were conducted to assess transferring models from the BERT and sequence-to-sequence families (e.g., DehateBERT, MARBERT, T5, and Flan-T5), as well as transferring preprocessing functions from a pretrained model (AraBERT). The experiments show that transfer learning by fine-tuning monolingual models yields promising results to varying degrees, and that the additional preprocessing can improve performance. Nevertheless, dealing independently with low-frequency labels, such as our dataset's hate class, remains challenging. Warning: this paper may include instances of offensive language.
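To make the transfer-learning setup concrete, the sketch below fine-tunes an Arabic encoder for hate-speech classification and adds a class-weighted loss, one common remedy for a low-frequency hate class. It is an illustrative sketch under assumed checkpoint, label, and weight choices, not necessarily the configuration evaluated in this project.

```python
# Minimal sketch: fine-tuning an Arabic encoder (MARBERT; checkpoint assumed)
# for hate-speech classification with a class-weighted cross-entropy loss.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "UBC-NLP/MARBERT"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)  # e.g., neutral / offensive / hate

class WeightedTrainer(Trainer):
    """Trainer with a class-weighted loss to upweight the rare hate class."""
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        weights = torch.tensor([1.0, 2.0, 5.0], device=outputs.logits.device)  # illustrative weights
        loss = torch.nn.functional.cross_entropy(outputs.logits, labels, weight=weights)
        return (loss, outputs) if return_outputs else loss

# train_dataset / eval_dataset would be tokenized, labelled tweet datasets (not shown):
# trainer = WeightedTrainer(
#     model=model,
#     args=TrainingArguments(output_dir="out", num_train_epochs=3),
#     train_dataset=train_dataset,
#     eval_dataset=eval_dataset,
# )
# trainer.train()
```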