Early Prediction of Cancer Using Supervised Machine Learning: A Study of Electronic Health Records From The Ministry of National Guard Health Affairs

Alfayez, Asma

dc.contributor.advisor	Lai, Alvina
dc.contributor.advisor	Kunz, Holger
dc.contributor.author	Alfayez, Asma
dc.date.accessioned	2024-10-29T16:31:41Z
dc.date.issued	2024-08
dc.description	PhD thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy from University College London (UCL)
dc.description.abstract	Early detection and treatment of cancer can save lives; however, identifying those most at risk of developing cancer remains challenging. Electronic health records (EHR) provide a rich source of "big" data on large numbers of patients. I hypothesised that in the period preceding a definitive cancer diagnosis, there exist healthcare events, such as a history of disease, captured within EHR data that characterise cancer progression and can be exploited to predict future cancer occurrence. Using longitudinal phenotype data from the EHR of the Ministry of National Guard Health Affairs, a large healthcare provider in Saudi Arabia, I aimed to discover health event patterns in EHR data that predict cancer development in the period before diagnosis by developing predictive models using supervised machine learning (ML) algorithms. I used two prediction periods: six months and one year prior to cancer diagnosis. Initially, the thesis focused on the prediction of both malignant and benign neoplasms, before moving on to predicting the future risk of malignant neoplasms (cancer), since predicting life-threatening illness remains the most important clinical challenge. To refine the approach for specific cancer types, predictive models were built for the three most common malignancies in this population: breast, colon, and thyroid cancers. ML predictive models were developed using the following algorithms: (1) logistic regression; (2) penalised logistic regression; (3) decision trees; (4) random forests; (5) gradient boosting; (6) extreme gradient boosting; (7) k-nearest neighbours; and (8) support vector machine. Model performance was assessed using k-fold cross-validation and the area under the receiver operating characteristic curve (AUC-ROC). After the models were developed, their performance was compared with and without hyperparameter tuning using the Tree-based Pipeline Optimization Tool (TPOT) and grid search (GridSearch).
This study provides novel proof-of-principle that ML algorithms can be applied to EHR data to develop models that can be used to predict future cancer occurrence.
dc.format.extent	71
dc.identifier.uri	https://hdl.handle.net/20.500.14154/73372
dc.language.iso	en
dc.publisher	University College London (UCL)
dc.subject	artificial intelligence
dc.subject	machine learning
dc.subject	data science
dc.subject	big data
dc.subject	algorithms
dc.subject	prediction
dc.subject	data mining
dc.title	Early Prediction of Cancer Using Supervised Machine Learning: A Study of Electronic Health Records From The Ministry of National Guard Health Affairs
dc.type	Thesis
sdl.degree.department	Institute of Health Informatics
sdl.degree.discipline	Artificial Intelligence and Big Data Science
sdl.degree.grantor	University College London (UCL)
sdl.degree.name	Doctor of Philosophy
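
The evaluation scheme named in the abstract (k-fold cross-validation scored by AUC-ROC) can be illustrated with a minimal, standard-library-only Python sketch. This is not the thesis code. The stratified split, the function names, the toy scorer `fit_event_count`, and the synthetic rows are all assumptions made for illustration; the thesis itself may have used a library implementation and different settings.

```python
import random


def auc_roc(labels, scores):
    """AUC-ROC via the rank (Mann-Whitney U) formulation; ties share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def stratified_k_fold(labels, k, seed=0):
    """Yield (train, test) index lists; each class is dealt round-robin across folds.

    Stratification is an assumption of this sketch, used so that every test
    fold contains both classes (requires at least k samples per class).
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


def cross_validated_auc(rows, labels, fit, k=5, seed=0):
    """Mean AUC-ROC over k folds; `fit(train_rows, train_labels)` returns a scoring function."""
    aucs = []
    for train, test in stratified_k_fold(labels, k, seed):
        score = fit([rows[i] for i in train], [labels[i] for i in train])
        aucs.append(auc_roc([labels[i] for i in test], [score(rows[i]) for i in test]))
    return sum(aucs) / len(aucs)


def fit_event_count(train_rows, train_labels):
    """Hypothetical baseline: ignores training data and scores a row by its event count."""
    return lambda row: sum(row)


# Synthetic binary event indicators (not real patient data): 4 controls, 4 cases.
rows = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
        [1, 1, 0], [1, 1, 1], [0, 1, 1], [1, 0, 1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
mean_auc = cross_validated_auc(rows, labels, fit_event_count, k=2)
```

In practice any of the eight algorithms listed in the abstract could be passed in place of `fit_event_count`, provided it is wrapped to return a function that maps a row to a risk score.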

Collections

SACM - United Kingdom
