Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10

Search Results

Now showing 1 - 4 of 4
  • Item: Restricted
    Early Prediction of Cancer Using Supervised Machine Learning: A Study of Electronic Health Records From The Ministry of National Guard Health Affairs
    (University College London (UCL), 2024-08) Alfayez, Asma; Lai, Alvina; Kunz, Holger
    Early detection and treatment of cancer can save lives; however, identifying those most at risk of developing cancer remains challenging. Electronic health records (EHR) provide a rich source of "big" data on large patient numbers. I hypothesised that in the period preceding a definitive cancer diagnosis, there exist healthcare events, such as a history of disease, captured within EHR data that characterise cancer progression and can be exploited to predict future cancer occurrence. Using longitudinal phenotype data from the EHR of the Ministry of National Guard Health Affairs, a large healthcare provider in Saudi Arabia, I aimed to discover health event patterns present in EHR data that predict cancer development in periods prior to diagnosis by developing predictive models using supervised machine learning (ML) algorithms. I used two different prediction periods: six months and one year prior to cancer diagnosis. Initially, the thesis focused on the prediction of both malignant and benign neoplasms, before moving on to predicting the future risk of malignant neoplasms (cancer), since predicting life-threatening illness remains the most important clinical challenge. To refine the approach for specific cancer types, predictive models were built for the top three malignancies in this population: breast, colon, and thyroid cancers. ML predictive models were developed using the following algorithms: (1) logistic regression; (2) penalised logistic regression; (3) decision trees; (4) random forests; (5) gradient boosting; (6) extreme gradient boosting; (7) k-nearest neighbours; and (8) support vector machine. Model performance was assessed using k-fold cross-validation and the area under the receiver operating characteristic curve (AUC-ROC). After developing different models, their performance was compared with and without hyperparameter tuning using the Tree-based Pipeline Optimization Tool (TPOT) and GridSearch.
    This study provides a novel proof of principle that ML algorithms can be applied to EHR data to develop models that can be used to predict future cancer occurrence.
  • Item: Restricted
    Evaluating CAMeL-BERT for Sentiment Analysis of Customer Satisfaction with STC (Saudi Telecom Company) Services
    (The University of Sussex, 2024-08-15) Alotaibi, Fahad; Pay, Jack
    In the age of informatics, platforms such as Twitter (X) play a crucial role in measuring public sentiment in both the private and public sectors. This study explores the application of machine learning, particularly deep learning, to perform sentiment analysis on tweets about Saudi Telecom Company (STC) services in Saudi Arabia. A comparative analysis was conducted between pre-trained sentiment analysis models in English and in Arabic to assess their effectiveness in classifying sentiments. In addition, the study highlights a challenge in existing Arabic models, which are based on English model architectures but trained on varied datasets, such as Modern Standard Arabic and Classical Arabic (Al-Fus’ha). These models often lack the capability to handle the diverse Arabic dialects commonly used on social media. To overcome this issue, the study involved fine-tuning a pre-trained Arabic model using a dataset of tweets related to STC services, specifically focusing on the Saudi dialect. Data was collected from Twitter (X), focusing on mentions of the Saudi Telecom Company (STC). Both English and Arabic models were applied to this data, and their performance in sentiment analysis was evaluated. The fine-tuned Arabic model (CAMeL-BERT) demonstrated improved accuracy and a better understanding of local dialects compared to its initial version. The results highlight the importance of model adaptation for specific languages and contexts and underline the potential of CAMeL-BERT in sentiment analysis for Arabic-language content. The findings offer practical implications for enhancing customer service and engagement through more accurate sentiment analysis of social media content in the service-provider sector.
  • Item: Restricted
    USER MODELLING AND ADAPTIVE INTERACTION ON INTERACTIVE DASHBOARDS
    (University of Manchester, 2024-06-06) Alhamadi, Mohammed; Vigo, Markel
    Interactive information dashboards are data visualisation tools that enable interaction with complex underlying datasets using visualisations such as charts and maps, typically on a single display. The popularity of dashboards has grown across key sectors such as healthcare, education and energy, driven by the abundance of available data. Still, users face various challenges when interacting with dashboards, ranging from insufficient support for essential functionalities such as data-detail adjustment to problems with data presentation such as information overload. These problems subject users to high cognitive demands, complicate information retrieval and increase the risk of arriving at incorrect conclusions, ultimately leading to erroneous decision-making. Dashboard issues are sometimes due to developers prioritising aesthetics over functionality. At other times, they arise from a mismatch between the visual literacy level that dashboard developers expect of users and users' actual level. When dashboard users encounter interaction problems, they exhibit certain interaction strategies as workarounds to overcome the problems. Modelling user behaviour on dashboards can shed light on these workarounds, especially when applied in problematic situations. Strategies employed by users in response to interaction problems have, to a large extent, not been thoroughly explored. This thesis addresses this gap by identifying the interaction and information presentation problems faced by dashboard users, adaptation techniques that could address these problems and user strategies applied in response to problems. Results of a literature review and an interview study highlighted various problems faced by users, and at times, a disconnect between problems, adaptations and strategies.
    Subsequently, an experiment was conducted to identify user strategies indicative of problems when users encounter four established interaction and information presentation problems: information overload, inappropriate data order and grouping, ineffective data presentation and misaligned visual literacy expectations. These problems were prioritised based on their severity and the limited understanding of user strategies when encountering them. We found clear distinctions between the strategies applied on problematic and adapted dashboards. Then, we incorporated the strategies, along with graph literacy, in user models to predict usability. In a final user study, we ecologically validated the effect of the majority of the influential user strategies on usability in real-world dashboards. While filtering data was linked to negative outcomes, customisation made users more effective. Encouragingly, usability predictions were more accurate on problematic dashboards and challenging tasks. These promising results open up avenues for tailored interventions to address the problems in real time.
  • Item: Restricted
    An Exploration of Word Embedding Models for Phishing Email Detection
    (University of Southampton, 2023-09-21) Alghamdi, Rawan; Hewitt, Sarah
    Phishing emails are dangerous cyberattacks that attackers use to steal information. Manual solutions such as blacklists can be used to detect phishing emails. However, the emergence of machine learning solutions has made phishing email detection faster and easier. This study explored and compared the performance of three deep learning models for detecting text-based phishing emails. The models used different word embedding techniques: Word2Vec, FastText, and GloVe. All three models used a Long Short-Term Memory (LSTM) classifier. Two publicly available datasets were merged to create a balanced dataset of phishing and legitimate emails using only the body text of the emails, excluding the header. The first dataset is the Fraudulent E-mail Corpus - Nigerian Letter or "419" Fraud, which contains phishing emails. The second dataset is the Enron Email Dataset, which contains legitimate emails. The Word2Vec-LSTM model achieved the best performance, with an F1 score of 98.62% and an accuracy of 98.62%. The FastText-LSTM model also performed well, though slightly below the Word2Vec-LSTM model, with an F1 score of 95.73% and an accuracy of 95.73%. The GloVe-LSTM model performed poorly, with an F1 score of 55.79% and an accuracy of 60.53%. We therefore conclude that using different embedding techniques with the same classifier can result in different performances for detecting and classifying phishing and legitimate emails.

Copyright owned by the Saudi Digital Library (SDL) © 2025