Exploring Nonlinear Associations and Interactions of Risk Factors for Breast Cancer Incidence Using Machine Learning Approaches

dc.contributor.advisor: Heath, Alicia
dc.contributor.author: Alqarni, Lina
dc.date.accessioned: 2025-06-15T06:22:05Z
dc.date.issued: 2024
dc.description: Proof that the degree was awarded has been attached; please accept the request.
dc.description.abstract: BACKGROUND: Breast cancer is influenced by a complex array of risk factors. This study aimed to identify nonlinear associations and interactions between various risk factors and breast cancer incidence using computationally efficient, interpretable methods. METHODS: Data from the Generations Study, a long-term prospective cohort of 104,423 women, were analysed. Risk factors evaluated included demographic, medical, reproductive, hormonal, and lifestyle variables. We compared the performance of traditional Cox proportional hazards models with tree-based methods, including Classification and Regression Trees (CART) and random forests, using the C-statistic. SHapley Additive exPlanations (SHAP) values were extracted to interpret random forest outputs, highlighting key risk factors and interactions. Stability selection was applied to enhance computational efficiency and identify the most stable and important variables. RESULTS: The multivariable Cox model achieved the highest predictive accuracy, with a C-index of 0.657, slightly outperforming the random forest model (C-index of 0.650). However, the random forest model revealed nonlinear associations and interactions not captured by the Cox model. Age, family history of breast cancer, and benign breast disease were among the most critical factors identified, with complex interactions noted between age, body mass index at entry, and family history and other risk factors such as hormone replacement therapy duration, oral contraceptive duration, and smoking pack-years. Stability selection effectively reduced the number of variables without compromising model performance. CONCLUSIONS: While linear models capture dominant associations, tree-based models like random forests offer additional insights into complex, nonlinear relationships among breast cancer risk factors, highlighting the potential for more personalised screening and prevention strategies.
dc.format.extent: 45
dc.identifier.uri: https://hdl.handle.net/20.500.14154/75522
dc.language.iso: en
dc.publisher: Imperial College London
dc.subject: breast cancer risk
dc.subject: machine learning
dc.subject: nonlinear associations
dc.subject: risk factor interactions
dc.subject: random survival forest
dc.subject: stability selection
dc.subject: Cox proportional hazards
dc.subject: decision trees
dc.subject: personalised screening
dc.subject: feature selection
dc.title: Exploring Nonlinear Associations and Interactions of Risk Factors for Breast Cancer Incidence Using Machine Learning Approaches
dc.type: Thesis
sdl.degree.department: School of Public Health
sdl.degree.discipline: Health Data Analytics and Machine Learning
sdl.degree.grantor: Imperial College London
sdl.degree.name: Master of Science (MSc)
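
The abstract compares the Cox and random forest models by their C-statistic (0.657 versus 0.650). As a rough illustration of what that number measures, the sketch below computes Harrell's concordance index in plain Python. It is not the thesis's code: the function name `harrell_c_index`, the toy follow-up times, event flags, and risk scores are all made up for this example, and pairs with tied follow-up times are simply skipped, which is a simplifying assumption.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i had the event and did so
    strictly before subject j's follow-up ended. The pair is concordant
    when i also has the higher predicted risk; tied risks count as 0.5.
    Pairs with tied times are ignored (simplifying assumption).
    """
    n = len(times)
    if not (n == len(events) == len(risk_scores)):
        raise ValueError("times, events and risk_scores must have equal length")

    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5

    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Hypothetical toy data: 4 subjects, the third one censored (event = 0).
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so values around 0.65, as reported in the abstract, indicate modest discrimination. In practice a tested library implementation would normally be preferred over a quadratic loop like this one.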

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 1.71 MB
Format: Adobe Portable Document Format

Copyright owned by the Saudi Digital Library (SDL) © 2025