Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10

Search Results

Now showing 1 - 6 of 6

Restricted
Understanding and Predicting the Behavioural Evolution of Promotional Spambots on Social Media
(University of Birmingham, 2025-05-15) Alzahrani, Ohoud; Hendley, Bob
Social media bots are rapidly evolving, rendering traditional detection tools increasingly ineffective as these bots adapt their strategies. This research introduces a dynamic and predictive framework for modelling the behavioural evolution of online promotional spambots. Inspired by biological DNA, bot activities are encoded into behavioural sequences, with each block capturing seven distinct post-level features. Techniques such as sequence alignment, cosine similarity, and hierarchical clustering are used to group bots into behaviourally similar “families.” These families serve as the foundation for identifying behavioural mutations—insertions, deletions, substitutions, and alterations—that signal adaptive strategy changes. The model evaluates how these mutations propagate within and across bot families and investigates their predictive power through mutation transfer analysis and an event-driven case study. Results show that bots within the same family are significantly more likely to share and adopt behavioural mutations than those from different families. Closely related bots achieved high precision and F1 scores (up to 0.97) in mutation transfer prediction. These findings support the feasibility of a behavioural evolution model as a scalable, interpretable, and adaptive tool for anticipating future bot activity and offering a proactive approach to combating evolving threats on social media platforms.
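The grouping step named in the abstract above (cosine similarity followed by hierarchical clustering) can be illustrated with a minimal sketch. It is not the thesis's implementation: the bot names, the seven feature counts per bot, the average-linkage rule, and the 0.9 similarity threshold are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_families(profiles, threshold=0.9):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two most similar clusters and stops when no
    pair of clusters has an average similarity of at least `threshold`.
    """
    clusters = [[name] for name in profiles]

    def average_similarity(c1, c2):
        sims = [cosine_similarity(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best_pair, best_sim = None, threshold
        for i, j in combinations(range(len(clusters)), 2):
            sim = average_similarity(clusters[i], clusters[j])
            if sim >= best_sim:
                best_pair, best_sim = (i, j), sim
        if best_pair is None:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical behavioural profiles: one count per post-level feature (7 features).
profiles = {
    "bot_a": [5, 0, 1, 9, 0, 2, 0],
    "bot_b": [6, 0, 1, 8, 0, 2, 0],
    "bot_c": [0, 7, 0, 0, 6, 0, 3],
    "bot_d": [0, 8, 0, 1, 5, 0, 3],
}
families = cluster_families(profiles)
print(families)
```

With these toy profiles the two pairs of near-identical bots end up in separate families. A sequence-alignment step, which the abstract also mentions, is not covered by this sketch.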
Restricted
Reducing Type 1 Childhood Diabetes in Saudi Arabia by Identifying and Modelling Its Key Performance Indicators
(Royal Melbourne Institute of Technology, 2024-06) Alazwari, Ahood; Johnstone, Alice; Abdollahian, Mali; Tafakori, Laleh
The increasing incidence of type 1 diabetes (T1D) in children is a growing global health concern. Reducing the incidence of diabetes generally is one of the goals in the World Health Organisation’s (WHO) 2030 Agenda for Sustainable Development Goals. With an incidence rate of 31.4 cases per 100,000 children and an estimated 3,800 new cases per year, Saudi Arabia is ranked 8th in the world for number of T1D cases and 5th for incidence rate. Despite the remarkable increase in the incidence of childhood T1D in Saudi Arabia, there is a lack of meticulously carried out research on T1D in children when compared with developed countries. In addition, it is crucial to recognise the critical gaps in current understanding of diabetes in children, adolescents, and young adults, as recent research indicates significant global and sub-national variations in disease incidence. Better knowledge of the development of T1D in children and its associated factors would aid medical practitioners in developing intervention plans to prevent complications and address the incidence of T1D. This study employed statistical, machine learning and classification approaches to analyse and model different aspects of childhood T1D using local case and control data. In this study, secondary data from 1,142 individual medical records (359-377 cases and 765 controls) collected from three cities located in different regions of Saudi Arabia have been used in the analysis to represent the country’s diverse population. Case and control data matched by birth year, gender and location were used to control confounders and create a more robust and clinically relevant model. It is well documented that genetic and environmental factors contribute to childhood T1D, so a wide range of potential key performance indicators (KPIs) from the literature were included in this study.
The collected data included information on socioeconomic status, potential genetic and environmental factors, and demographic data such as city of residence, gender and birth year. Several techniques, such as cross-validation, hyperparameter tuning and bootstrapping, were used in this study to develop models. Common statistical metrics (coefficient of determination, R-squared, root mean squared error, mean absolute error) were used to evaluate performance for the regression models while for the classification models accuracy, sensitivity, precision, F score and area under the curve were utilised as performance measures. Multiple linear regression (MLR), artificial neural network (ANN) and random forest (RF) models were developed to predict the age at onset of T1D for all children 0-14 years old, as well as for the most common age group for onset, the 5-9 year olds. To improve the performance of the MLR models, interactions between variables were considered. Additionally, risk factors associated with the age at onset of T1D were identified. The results showed that MLR and RF outperformed ANN. The logarithm of age at onset was the most suitable dependent variable. RF outperformed the others for the 5-9 years age group. Birth weight, current weight and current height influenced the age at onset in both age groups. However, preterm birth was significant only in the 0-14 years cohort, while consanguineous parents and gender were significant in the 5-9 age group. Logistic regression (LR), random forest (RF), support vector machine (SVM), Naive Bayes (NB) and artificial neural network (ANN) models were utilised with case and control data to model the development of childhood T1D and to identify its key performance indicators. Full and reduced models were developed to determine the best model. The reduced models were built using the significant factors identified by the individual full model. The study found that full LR had the highest accuracy. 
Full RF and SVM with a linear kernel also performed well. Significant risk factors identified as being associated with developing childhood T1D include early exposure to cow’s milk, high birth weight, positive family history of T1D and maternal age over 25 years. Poisson regression (PR), RF, SVM and K-nearest neighbor (KNN) were then used to model the incidence of childhood T1D, taking in the identified significant risk factors. The interactions between variables were also considered to enhance the performance of the models. Both full and reduced models were created and compared to find the best models with the minimum number of variables. The full Poisson regression and machine learning models outperformed all other models, but reduced models with a combination of only two out of three independent variables (early exposure to cow’s milk, high birth weight and maternal age over 25 years) also performed relatively well. This study also deployed optimisation procedures with the reduced incidence models to develop upper and lower yearly profile limits for childhood T1D incidence to achieve the United Nations (UN) and Saudi recommended levels of 264 and 339 cases by 2030. The profile limits for childhood T1D then allowed us to model optimal yearly values for the number of children weighing more than 3.5kg at birth, the number of deliveries by older mothers and the number of children introduced early to cow’s milk. The results presented in this thesis will guide healthcare providers to collect data to monitor the most influential KPIs. This would enable the initiation of suitable intervention strategies to reduce the disease burden and potentially slow the incidence rate of childhood T1D in Saudi Arabia. The research outcomes lead to recommendations to establish early intervention strategies, such as educational campaigns and healthy lifestyle programs for mothers along with child health mentoring during and after pregnancy to reduce the incidence of childhood T1D. 
This thesis has contributed to new knowledge on childhood T1D in Saudi Arabia by:
* developing a predictive model for age at onset of childhood T1D using statistical and machine learning models.
* predicting the development of T1D in children using matched case-control data and identifying its KPIs using statistical and machine learning approaches.
* modeling the incidence of childhood T1D using its associated significant KPIs.
* developing three optimal profile limits for monitoring the yearly incidence of childhood T1D and its associated significant KPIs.
* providing a list of recommendations to establish early intervention strategies to reduce the incidence of childhood T1D.
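The abstract above reports that the logarithm of age at onset was the most suitable dependent variable and that regression models were scored with R-squared, root mean squared error and mean absolute error. The sketch below shows how those three metrics are computed for a one-predictor least-squares fit on a log-transformed target. The six data points are invented toy values, not study data, and the single-predictor model is an assumption made to keep the example short.

```python
import math


def fit_simple_ols(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def regression_metrics(y_true, y_pred):
    """R-squared, RMSE and MAE for paired observed and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Toy values for illustration only (NOT data from the thesis).
birth_weight_kg = [2.5, 3.0, 3.2, 3.6, 4.0, 4.2]
age_at_onset_years = [9.0, 8.0, 7.5, 6.0, 6.5, 5.0]

# Model log(age at onset) rather than age itself.
log_age = [math.log(a) for a in age_at_onset_years]
intercept, slope = fit_simple_ols(birth_weight_kg, log_age)
predicted_log_age = [intercept + slope * w for w in birth_weight_kg]

# Metrics are reported on the log scale, where the model was fitted.
metrics = regression_metrics(log_age, predicted_log_age)
print(metrics)

# Back-transform to years when a prediction needs to be interpreted.
predicted_age_years = [math.exp(p) for p in predicted_log_age]
```

Metrics computed on the log scale are not directly comparable with metrics computed on the original scale, so a comparison across dependent variables would need a common scale.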
Restricted
Risk and Uncertainty in Cryptocurrency Markets
(University of East Anglia, 2024-04-23) Alsamaani, Abdulrahman; Kourtis, Apostolos; Markellos, Raphael
This dissertation consists of three studies, each with its own aim. The first study seeks the most effective approach for forecasting the volatility of cryptocurrency returns using high-frequency data, for both dominant and less notable cryptocurrencies. The GARCH, IGARCH, EGARCH, GJR-GARCH, HAR, and LRE models were investigated, and univariate and comprehensive regressions were used. In the univariate regression results, the HAR model beat the other models when forecasting one day ahead, while the EGARCH model outperformed the other models when forecasting seven and thirty days ahead. In addition, the HAR + EGARCH pair beat the other model pairs when forecasting one, seven, and thirty days ahead. Aside from the primary study, the out-of-sample analysis yielded conflicting results. These results will benefit investors, portfolio managers, and other financial professionals. The second study investigates the relationship between cryptocurrency returns and uncertainty indices. It also assesses the impact of the Covid-19 pandemic period on both the indices and cryptocurrency returns, and determines which index and which pair of indices have the most significant influence on cryptocurrency market returns. Ten cryptocurrency returns, as well as eight uncertainty indices, were investigated. The Quantile Regression, Multivariate-Quantile Regression, and Granger Causality tests were used. According to the Quantile Regression results, the Cryptocurrency Policy Uncertainty index and the Cryptocurrency Price Uncertainty index considerably impact cryptocurrency returns. On the other hand, the other indices have no influence on cryptocurrency returns.
The Multivariate-Quantile Regression findings demonstrated that when the cryptocurrency market experiences a bull wave, the UCRY Policy Index + Central Bank Digital Currency Attention Index combination strongly impacts cryptocurrency returns. Nonetheless, when the cryptocurrency market has a bull run, the combination of the UCRY Policy Index and the Cryptocurrency Environmental Attention (ICEA) index considerably impacts cryptocurrency gains. During the crisis, most of the overall sample findings were verified. These insights will benefit investors, portfolio managers, and policymakers. The third study seeks the best model for forecasting the covariance matrix of cryptocurrency returns. To achieve this purpose, five models were thoroughly examined: BEKK, Diagonal BEKK, DCC, Asymmetric DCC, and LRE. To assess prediction accuracy, three criteria were used: Euclidean distance (LE), Frobenius distance (LF), and the multivariate quasi-likelihood loss function (LQ). The LRE model outperformed the other models, forecasting more accurately at daily and weekly frequencies. Furthermore, the Mean Squared Error (MSE) and Mean Absolute Error (MAE) loss functions were used for validation. Except for LQ, the findings were in line with the forecasting criteria. These findings have significant implications for investors and portfolio managers aiming to enhance their risk management techniques. By utilizing the knowledge provided, they may be able to make better-informed decisions to lower portfolio risk.
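Two of the covariance-forecast criteria named in the abstract above, the Euclidean distance (LE) and the Frobenius distance (LF), can be written in a few lines. The sketch uses one common definition of each: LE sums squared errors over the unique elements of the symmetric matrix (lower triangle plus diagonal), while LF sums squared errors over every element. Whether the thesis used exactly these forms is an assumption, and the 2x2 matrices are made-up values. The quasi-likelihood loss (LQ) is left out because it needs a matrix inverse and determinant.

```python
def euclidean_loss(sigma, h):
    """L_E: squared error over the lower triangle including the diagonal.

    `sigma` is the realised (proxy) covariance matrix and `h` the forecast,
    both given as square lists of lists.
    """
    size = len(sigma)
    return sum(
        (sigma[i][j] - h[i][j]) ** 2
        for i in range(size)
        for j in range(i + 1)
    )


def frobenius_loss(sigma, h):
    """L_F: squared error over all elements (squared Frobenius norm of sigma - h)."""
    return sum(
        (s - f) ** 2
        for row_sigma, row_h in zip(sigma, h)
        for s, f in zip(row_sigma, row_h)
    )


# Hypothetical 2x2 realised covariance and forecast (illustrative values only).
realised = [[4.0, 1.0],
            [1.0, 9.0]]
forecast = [[3.0, 2.0],
            [2.0, 7.0]]

loss_e = euclidean_loss(realised, forecast)
loss_f = frobenius_loss(realised, forecast)
print(loss_e, loss_f)
```

The only difference between the two is that LF counts each off-diagonal error twice, so it weights covariance errors more heavily relative to variance errors than LE does.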
Restricted
Predicting the Need for Urgent Instructor Intervention in MOOC Environments
(Durham University, 2024-03-27) Alrajhi, Laila; Cristea, Alexandra I.
In recent years, massive open online courses (MOOCs) have become universal knowledge resources and arguably one of the most exciting innovations in e-learning environments. MOOC platforms comprise numerous courses covering a wide range of subjects and domains. Thousands of learners around the world enrol on these online platforms to satisfy their learning needs (mostly) free of charge. However, the retention rates of MOOC courses (i.e., the share of learners who successfully complete a course of study) are low (around 10% on average); dropout rates tend to be very high (around 90%). The principal channel via which MOOC learners can communicate their difficulties with the learning content and ask for assistance from instructors is by posting in a dedicated MOOC forum. Importantly, in the case of learners who are suffering from burnout or stress, some of these posts require urgent intervention. Given the above, urgent instructor intervention regarding learner requests for assistance via posts made on MOOC forums has become an important research topic. Timely intervention by MOOC instructors may mitigate dropout issues and make the difference between a learner dropping out or staying on a course. However, due to the typically extremely high learner-to-instructor ratio in MOOCs and the often-huge numbers of posts on forums, while truly urgent posts are rare, managing them can be very challenging, if not sometimes impossible. Instructors can find it challenging to monitor all existing posts and identify which posts require immediate intervention to help learners, encourage retention, and reduce the current high dropout rates. The main objective of this research project, therefore, was to mine and analyse learners’ MOOC posts as a fundamental step towards understanding their need for instructor intervention. To achieve this, the researcher proposed and built comprehensive classification models to predict the need for instructor intervention.
The ultimate goal is to help instructors by guiding them to posts, topics, and learners that require immediate interventions. Given the above research aim, the researcher conducted different experiments to fill the gap in the literature based on different platform datasets (the FutureLearn platform and the Stanford MOOCPosts dataset). In terms of the former, three MOOC corpora were prepared: two of them gold-standard MOOC corpora to identify urgent posts, annotated by selected experts in the field; the third is a corpus detailing learner dropout. Based on these datasets, different architectures and classification models based on traditional machine learning and deep learning approaches were proposed. In this thesis, the task of determining the need for instructor intervention was tackled from three perspectives: (i) identifying relevant posts, (ii) identifying relevant topics, and (iii) identifying relevant learners. Posts written by learners were classified into two categories: (i) (urgent) intervention and (ii) (non-urgent) intervention. Also, learners were classified into: (i) requiring instructor intervention (at risk of dropout) and (ii) no need for instructor intervention (completer). In identifying posts, two experiments were used to contribute to this field. The first is a novel classifier based on a deep learning model that integrates novel MOOC post dimensions such as numerical data in addition to textual data; this represents a novel contribution to the literature as all available models at the time of writing were text-only. The results demonstrate that the combined, multidimensional features model proposed in this project is more effective than the text-only model.
The second contribution relates to creating various simple and hybrid deep learning models by applying plug & play techniques with different types of inputs (word-based or word-character-based) and different ways of representing input words as vectors. According to the experimental findings, employing Bidirectional Encoder Representations from Transformers (BERT) for word embedding rather than word2vec is more effective at the intervention task across all models. Interestingly, adding word-character inputs with BERT does not improve performance as it does for word2vec. Additionally, on the task of identifying topics, this is the first time in the literature that specific language terms to identify the need for urgent intervention in MOOCs were obtained. This was achieved by analysing learner MOOC posts using latent Dirichlet allocation (LDA), which also provides a visualisation tool for instructors or learners that may assist them and improve instructor intervention. In addition, this thesis contributes to the literature by creating mechanisms for identifying MOOC learners who may need instructor intervention in a new context, i.e., by using their historical online forum posts as a multi-input approach for other deep learning architectures and Transformer models. The findings demonstrate that using the Transformer model is more effective at identifying MOOC learners who require instructor intervention. Next, the thesis sought to expand its methodology to identify posts that relate to learner behaviour, which is also a novel contribution, by proposing a novel priority model to identify the urgency of intervention based on learner histories. This model can classify learners into three groups: low risk, mid risk, and high risk. The results show that the completion rates of high-risk learners are very low, which confirms the importance of this model.
Next, as MOOC data in terms of urgent posts tend to be highly unbalanced, the thesis contributes by examining various data balancing methods to spot situations in which MOOC posts urgently require instructor assistance. This included developing learner and instructor models to assist instructors to respond to urgent MOOC posts. The results show that models with undersampling can predict the most urgent cases; 3x augmentation + undersampling usually attains the best performance. Finally, for the first time, this thesis contributes to the literature by applying text classification explainability (eXplainable Artificial Intelligence (XAI)) to an instructor intervention model, demonstrating how using a reliable predictor in combination with XAI and colour-coded visualisation could be utilised to assist instructors in deciding when posts require urgent intervention, as well as supporting annotators to create high-quality, gold-standard datasets to determine cases where urgent intervention is required.
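The undersampling idea mentioned in the abstract above can be sketched as follows: every class is randomly reduced to the size of the rarest class before training. This is a generic random undersampler written under stated assumptions, not the thesis's pipeline. The function name, the post identifiers, the label strings and the fixed seed are hypothetical, and the augmentation step that the abstract pairs with undersampling is not shown.

```python
import random
from collections import Counter


def undersample(posts, labels, seed=0):
    """Randomly downsample every class to the size of the smallest class.

    Returns a shuffled list of (post, label) pairs with equal class counts.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Group posts by their label.
    by_label = {}
    for post, label in zip(posts, labels):
        by_label.setdefault(label, []).append(post)

    minority_size = min(len(items) for items in by_label.values())

    balanced = []
    for label, items in by_label.items():
        kept = rng.sample(items, minority_size)  # sample without replacement
        balanced.extend((post, label) for post in kept)

    rng.shuffle(balanced)
    return balanced


# Hypothetical forum posts: 8 non-urgent and 2 urgent (illustrative only).
posts = [f"post_{i}" for i in range(10)]
labels = ["non-urgent"] * 8 + ["urgent"] * 2

balanced = undersample(posts, labels)
class_counts = Counter(label for _, label in balanced)
print(class_counts)
```

Undersampling discards majority-class examples, so it would normally be applied to the training split only, leaving the evaluation data at its natural class ratio.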
Restricted
Predicting Customer Attrition in B2B SaaS Using Machine Learning Classification
(Saudi Digital Library, 2023-09-15) Alalawi, Zainab; Fiaschetti, Maurizio
Customer retention and customer loss are crucial metrics in subscription-based industries like SaaS companies. Customer attrition is a significant concern for this type of business, as clients have the flexibility to terminate the service at any time. This can lead to adverse effects on the company’s revenue stream. If SaaS businesses can accurately predict the number of customers who will cancel their subscriptions and those who will continue using their services within a specific timeframe, they can more effectively forecast their revenue and cash flow and plan future growth accordingly. Predicting subscription renewals and cancelations remains a challenging problem for any SaaS company. However, with the ongoing advancement in machine learning and artificial intelligence, the potential for accurately forecasting this issue has significantly improved. The study examines customer attrition and customer retention prediction using a quantitative method, applying several machine learning algorithms in Python, namely logistic regression, Naïve Bayes, and random forest. Data was collected from the case company’s database and manipulated to fit the algorithms. The dataset includes the customers' business data such as spend, customer platform usage data, customer service history data, and the date of the next payment. To identify the best hyperparameters for each machine-learning algorithm, a tuning technique, in particular Grid Search, was employed. Subsequently, the models were trained and assessed using the optimized hyperparameters on the fitted data. Once the models were trained, they were applied to test data to obtain the analysis results. Model performance was measured using quantitative metrics, including F1-score, area under the curve (AUC), and accuracy.
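The three evaluation metrics listed in the abstract above (accuracy, F1-score and AUC) can be computed from labels and predicted scores as in the sketch below. The labels, scores and the 0.5 decision threshold are hypothetical, and the AUC is computed with the pairwise-ranking definition (the probability that a randomly chosen churner is scored above a randomly chosen non-churner), which may differ from the routine the study used.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical test set: 1 = customer churned, 0 = customer renewed.
y_true = [1, 0, 1, 0, 0, 1]
churn_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]  # made-up model probabilities

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in churn_scores]

accuracy, f1 = accuracy_and_f1(y_true, y_pred)
auc = roc_auc(y_true, churn_scores)
print(accuracy, f1, auc)
```

Accuracy and F1 depend on the chosen threshold, while AUC depends only on how the scores rank the customers, which is one reason for reporting all three together.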
Restricted
Using phages to Treat Urinary Tract Infections: Predicting phage susceptibility using bacterial genome and MALDI-TOF data
(Saudi Digital Library, 2023-09-07) Alghamdi, Sara; Clokie, Martha
Antimicrobial resistance (AMR) and multidrug resistance (MDR) present substantial challenges for individuals and have become a global concern. As a result, these infections are gaining increasing attention. Bacteriophages have become a leading option for dealing with bacterial resistance and reducing mortality. In this project, the bacterial genome sequence and MALDI-TOF data are used to predict phage susceptibility and serotypes. A group of 16 phages was collected in the lab with at least one manufactured host. This project obtained 70 clinical strains from the Bristol University Hospital. Two techniques were employed in this project: the spot test and the plaque assay. Both methods seek to measure the concentration of the bacteriophage and evaluate the virus’s effectiveness. The serotypes included in this study are ST131, ST69, ST73, and ST95. The project concluded that the gene pattern of ST131 responds weakly to most phages at all concentrations. ST73_35 was the most sensitive, at 10^8 = 114, 10^6 = 87, and 10^4 = 51. Some strains, ST73 and ST95, were more sensitive than the others, which may allow predictions to be made in terms of family species or sequencing. On the other hand, ST131 was the most resistant strain, followed by ST69, which would make the work of a phage predictor more challenging. JK08 performed the best against the strains. On the other hand, the worst-performing phage was UP15, to which strains showed more resistance at 1×10^4. Further studies with whole genome sequencing (WGS) and MALDI-TOF could confirm this mechanism and make it possible to predict some genes responsible for susceptibility or resistance. The outcome of this project will demonstrate a platform of a broad collection of E. coli strains that might find the correlation of sequence types with MALDI-TOF and WGS data so that predictions can be made on host range.
