SACM - United Kingdom
Permanent URI for this collection: https://drepo.sdl.edu.sa/handle/20.500.14154/9667
26 results
Search Results
Item Restricted
Exploring Nonlinear Associations and Interactions of Risk Factors for Breast Cancer Incidence Using Machine Learning Approaches (Imperial College London, 2024-08)
Alqarni, Lina; Heath, Alicia; Berrington, Amy
BACKGROUND: Breast cancer is influenced by a complex array of risk factors. This study aimed to identify nonlinear associations and interactions between various risk factors and breast cancer incidence using computationally efficient, interpretable methods. METHODS: Data from the Generations Study, a long-term prospective cohort of 104,423 women, were analysed. Risk factors evaluated included demographic, medical, reproductive, hormonal, and lifestyle variables. We compared the performance of traditional Cox proportional hazards models with tree-based methods, including Classification and Regression Trees (CART) and random forests, using the C-statistic. SHapley Additive exPlanations (SHAP) values were extracted to interpret random forest outputs, highlighting key risk factors and interactions. Stability selection was applied to enhance computational efficiency and identify the most stable and important variables. RESULTS: The multivariable Cox model achieved the highest predictive accuracy, with a C-index of 0.657, slightly outperforming the random forest model (C-index of 0.650). However, the random forest model revealed nonlinear associations and interactions not captured by the Cox model. Age, family history of breast cancer, and benign breast disease were among the most critical factors identified, with complex interactions noted between age, body mass index at entry, and family history with other risk factors such as hormone replacement therapy duration, oral contraceptive duration, and smoking pack-years. Stability selection effectively reduced the number of variables without compromising model performance.
CONCLUSIONS: While linear models capture dominant associations, tree-based models like random forests offer additional insights into complex, nonlinear relationships among breast cancer risk factors, highlighting the potential for more personalised screening and prevention strategies.

Item Restricted
Exploring Nonlinear Associations and Interactions of Risk Factors for Breast Cancer Incidence Using Machine Learning Approaches (Imperial College London, 2024)
Alqarni, Lina; Heath, Alicia
BACKGROUND: Breast cancer is influenced by a complex array of risk factors. This study aimed to identify nonlinear associations and interactions between various risk factors and breast cancer incidence using computationally efficient, interpretable methods. METHODS: Data from the Generations Study, a long-term prospective cohort of 104,423 women, were analysed. Risk factors evaluated included demographic, medical, reproductive, hormonal, and lifestyle variables. We compared the performance of traditional Cox proportional hazards models with tree-based methods, including Classification and Regression Trees (CART) and random forests, using the C-statistic. SHapley Additive exPlanations (SHAP) values were extracted to interpret random forest outputs, highlighting key risk factors and interactions. Stability selection was applied to enhance computational efficiency and identify the most stable and important variables. RESULTS: The multivariable Cox model achieved the highest predictive accuracy, with a C-index of 0.657, slightly outperforming the random forest model (C-index of 0.650). However, the random forest model revealed nonlinear associations and interactions not captured by the Cox model.
Age, family history of breast cancer, and benign breast disease were among the most critical factors identified, with complex interactions noted between age, body mass index at entry, and family history with other risk factors such as hormone replacement therapy duration, oral contraceptive duration, and smoking pack-years. Stability selection effectively reduced the number of variables without compromising model performance. CONCLUSIONS: While linear models capture dominant associations, tree-based models like random forests offer additional insights into complex, nonlinear relationships among breast cancer risk factors, highlighting the potential for more personalised screening and prevention strategies.

Item Restricted
Intelligent Diabetes Screening with Advanced Analytics (University of Birmingham, 2024)
Aldossary, Soha; Smith, Phillip
Diabetes mellitus is a prevalent chronic disease with significant health implications worldwide. This project aimed to mitigate this pressing public health concern by using machine learning techniques and deep learning algorithms. I also established an online platform at which patients can enter their test results and health information and receive real-time diabetes detection and dietary recommendations based on their health profiles. Research has illustrated that models such as Gradient Boosting, Random Forest and Decision Trees perform well in diabetes prediction due to their ability to capture complex nonlinear relationships and handle diverse input features. Therefore, this project incorporated these models with others, such as the Support Vector Classifier and AdaBoost. Additionally, deep learning models, including Neural Networks, were utilised to explore intricate relationships within diabetes-related indicators. Notably, the Gradient Boosting model achieved an impressive accuracy of 99%, with 99% precision, 97% recall and 97% F1-score.
To implement these solutions, I used Python as the programming language, employing libraries such as scikit-learn, NumPy, Pandas and Matplotlib, while Streamlit served as the app’s framework.

Item Restricted
Developing a medical robot for MR guided cardiac catheterization (University College London, 2024)
Almutairi, Abdullah; Muthurangu, Vivek
Cardiac catheterization involves the insertion of a needle into the veins, enabling physicians to obtain images of the heart without invasive surgery. This procedure, therefore, plays a key role in the diagnosis and treatment of various heart diseases. In recent years, there has been widespread adoption of robotics in surgical procedures, whereby some of the benefits include efficiency, a faster operational speed, and a high rate of action reproducibility. The primary objective of this study was to evaluate the application of behavioural cloning in training robotic systems to perform robotic magnetic resonance–guided catheterization on 3D-printed heart models. Six 3D heart models were printed, and the time taken to perform the catheterization process was measured. The data collection process consisted of manual catheterization, catheterization using a joystick, and simulations of both processes. The results indicated that the manual catheterization process was faster than the robotic one. Nevertheless, the success of the robotic-assisted simulation indicates that it is possible to use behavioural cloning to train the robotic systems to perform catheterization.
This study demonstrates that behavioural cloning can be effectively adopted in the catheterization process, whereby learning models can be developed for conducting catheterization procedures.

Item Restricted
Exploring Ridgeless Regression in High-Dimensional Data: A Numerical Investigation into Predictive Accuracy (University of Nottingham, 2024)
Alderaan, Saad; Preston, Simon
The rise of high-dimensional datasets, where the number of predictors p exceeds the number of observations n, comes with significant challenges for linear models with the Ordinary Least Squares (OLS) method. This report investigates the application of ridgeless regression, an OLS method with a minimum-norm solution, in such high-dimensional settings, particularly when p ≫ n. The minimum-norm OLS is compared against ridge regression in terms of predictive accuracy in high-dimensional settings. Using simulation studies on the spiked covariance model, this report shows that the minimum-norm OLS can outperform ridge regression under certain high-dimensional datasets where p ≫ n, contradicting the traditional assumptions that regularization techniques are necessary in high-dimensional settings. Moreover, this report shows that the optimal regularization parameter λ in ridge regression can be negative in such cases, challenging the conventional belief that the regularization parameter λ is always positive. This is due to the inherent structure of the data, which may provide sufficient implicit regularization, making additional penalization unnecessary or even counterproductive. The implications of these findings extend to practical applications in fields such as genomics and finance, where high-dimensional data is common. The conclusions drawn from this work highlight the potential of ridgeless regression as a viable alternative to ridge regression in high-dimensional data, especially when traditional methods encounter issues like overfitting.
The report contributes to the ongoing discussion in statistical machine learning by providing new insights into when and why ridgeless regression may be preferred.

Item Restricted
Early Prediction of Cancer Using Supervised Machine Learning: A Study of Electronic Health Records From The Ministry of National Gurad Health Affairs (University College London (UCL), 2024-08)
Alfayez, Asma; Lai, Alvina; Kunz, Holger
Early detection and treatment of cancer can save lives; however, identifying those most at risk of developing cancer remains challenging. Electronic health records (EHR) provide a rich source of "big" data on large patient numbers. I hypothesised that in the period preceding a definitive cancer diagnosis, there exist healthcare events, such as a history of disease, captured within EHR data that characterise cancer progression and can be exploited to predict future cancer occurrence. Using longitudinal phenotype data from the EHR of the Ministry of National Guard Health Affairs, a large healthcare provider in Saudi Arabia, I aimed to discover health event patterns present in EHR data that predict cancer development in periods prior to diagnosis by developing predictive models using supervised machine learning (ML) algorithms. I used two different prediction periods: six months and one year prior to cancer diagnosis. Initially, the thesis focused on the prediction of both malignant and benign neoplasms, before moving on to predicting the future risk of malignant neoplasms (cancer), since predicting life-threatening illness remains the most important clinical challenge. To refine the approach for specific cancer types, predictive models were built for the top three malignancies in this population: breast, colon, and thyroid cancers.
ML predictive models were developed using the following algorithms: (1) logistic regression; (2) penalised logistic regression; (3) decision trees; (4) random forests; (5) gradient boosting; (6) extreme gradient boosting; (7) k-nearest neighbours; and (8) support vector machine. Model performance was assessed using k-fold cross-validation and the area under the receiver operating characteristic curve (AUC-ROC). After developing different models, their performance was compared with and without hyperparameter tuning using tree-based pipeline optimization (TPOT) and GridSearch. This study provides novel proof-of-principle that ML algorithms can be applied to EHR data to develop models that can be used to predict future cancer occurrence.

Item Restricted
Evaluating CAMeL-BERT for Sentiment Analysis of Customer Satisfaction with STC (Saudi Telecom Company) Services (The University of Sussex, 2024-08-15)
Alotaibi, Fahad; Pay, Jack
In the age of informatics, platforms such as Twitter (X) play a crucial role in measuring public sentiment in both the private and public sectors. This study explores the application of machine learning, particularly deep learning, to perform sentiment analysis on tweets about Saudi Telecom Company (STC) services in Saudi Arabia. A comparative analysis was conducted between pre-trained sentiment analysis models in English and in Arabic to assess their effectiveness in classifying sentiments. In addition, the study highlights a challenge in existing Arabic models, which are based on English model architectures but trained on varied datasets, such as Modern Standard Arabic and Classical Arabic (Al-Fus’ha). These models often lack the capability to handle the diverse Arabic dialects commonly used on social media. To overcome this issue, the study involved fine-tuning a pre-trained Arabic model using a dataset of tweets related to STC services, specifically focusing on the Saudi dialect.
Data was collected from Twitter (X), focusing on mentions of the Saudi Telecom Company (STC). Both English and Arabic models were applied to this data, and their performance in sentiment analysis was evaluated. The fine-tuned Arabic model (CAMeL-BERT) demonstrated improved accuracy and a better understanding of local dialects compared to its initial version. The results highlight the importance of model adaptation for specific languages and contexts and underline the potential of CAMeL-BERT in sentiment analysis for Arabic-language content. The findings offer practical implications for enhancing customer service and engagement through more accurate sentiment analysis of social media content in the service-provider sector.

Item Restricted
Integrating Sentiment and Technical Analysis with Machine Learning for Improved Stock Market Predictions (University of Dundee, 2024-07-30)
Almubarak, Maha Sofyan A; Mazibas, Murat; Kwiatkowski, Andrzej
This thesis advances stock forecasting by integrating sentiment analysis from Twitter as a social media platform with traditional technical indicators, employing machine learning (ML) techniques. The research identifies gaps in existing literature, particularly in the use of appropriate validation methods and the balance of statistical metrics with financial benchmarks. It proposes a comprehensive methodology that incorporates Time Series Cross-Validation and hyperparameter tuning to enhance the adaptability and economic robustness of forecasting models. The empirical analysis unfolds in three chapters:
1. Technical Analysis within LSTM models to predict movements of the SPY ETF, validated through Time Series Cross-Validation to ensure robustness, focusing on both accuracy and financial performance.
2. Integration of Sentiment Analysis to assess its impact on model responsiveness and financial outcomes, demonstrating improved predictive accuracy.
3. Application to a Diverse Stock Portfolio, where models are applied to 10 different stocks across various sectors, confirming the models’ effectiveness and practical utility in real-world trading strategies.
Key findings suggest that incorporating sentiment analysis significantly enhances the predictive precision of models, particularly in volatile market conditions. This synergy between technical indicators and sentiment data not only boosts accuracy but also enriches the models’ economic performance, offering valuable insights for traders and academic researchers exploring complex financial markets.

Item Restricted
The Potential of Radiomic Analysis for Enhancing the Diagnostic Ability of PET and CMR in Cardiac Sarcoidosis (University of Leeds, 2024)
Mushari, Nouf; Tsoumpas, Charalampos
Cardiac sarcoidosis (CS) is a granulomatous inflammatory disease of unknown aetiology, characterised by non-caseating granulomas. This thesis addresses the challenge of accurately diagnosing CS by enhancing the diagnostic capabilities of [18F]fluorodeoxyglucose positron emission tomography ([18F]FDG PET) and late gadolinium-enhanced cardiac magnetic resonance imaging (LGE-CMR). Independently, these modalities face limitations in isolating CS with high specificity and sensitivity. The thesis aimed to improve the diagnostic efficiency by integrating [18F]FDG PET and LGE-CMR through advanced radiomic feature analysis. Radiomic analysis was conducted across various scenarios, encompassing comparisons between positive and negative CS groups, distinguishing between active and inactive disease states, and differentiating CS patients from those experiencing myocardial inflammation due to another cause (post-COVID-19 patients). The thesis concludes that radiomic analysis can enhance the objectivity and complementarity of PET and CMR in identifying cardiac sarcoidosis.
While PET-based analyses demonstrate high performance, the project underscores the essential role of CMR-based analysis in mitigating challenges associated with PET image preparation variability.

Item Restricted
An Exploration of Word Embedding Models for Phishing Email Detection (University of Southampton, 2023-09-21)
Alghamdi, Rawan; Hewitt, Sarah
Phishing emails are dangerous cyberattacks that attackers use to steal information. Manual solutions such as blacklists can be used to detect phishing emails. However, the emergence of machine learning solutions has made phishing email detection faster and easier. This study explored and compared the performance of three deep learning models for detecting text-based phishing emails. The models used different word embedding techniques: Word2Vec, FastText, and GloVe. All three models used a Long Short-Term Memory (LSTM) classifier. Two publicly available datasets were merged to create a balanced dataset of phishing and legitimate emails using only the body text of the emails, excluding the header. The first dataset is the Fraudulent E-mail Corpus - Nigerian Letter or "419" Fraud, which contains phishing emails. The second dataset is the Enron Email Dataset, which contains legitimate emails. The Word2Vec-LSTM model achieved the best performance, with an F1 score of 98.62% and an accuracy of 98.62%. The FastText-LSTM also performed well, but its performance was slightly lower than the Word2Vec-LSTM model, with an F1 score of 95.73% and an accuracy of 95.73%. The GloVe-LSTM model performed poorly, with an F1 score of 55.79% and an accuracy of 60.53%. We therefore conclude that using different embedding techniques with the same classifier can result in different performances for detecting and classifying phishing and legitimate emails.
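
The last abstract above reports both accuracy and F1 score, which coincide for two of its models and diverge for the third. As an illustration only, here is a minimal Python sketch of how the two metrics are computed from confusion-matrix counts. The function name `binary_metrics` and the counts are hypothetical, chosen for the example; they are not taken from the thesis or its data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Hypothetical balanced test set: 100 phishing emails (positive class)
# and 100 legitimate emails. These counts are illustrative assumptions.
acc, prec, rec, f1 = binary_metrics(tp=40, fp=20, fn=60, tn=80)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

With these assumed counts the classifier misses many phishing emails, so recall is low and F1 falls below accuracy even on a balanced dataset. This is one way a gap between the two metrics can arise.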