Exploring Nonlinear Associations and Interactions of Risk Factors for Breast Cancer Incidence Using Machine Learning Approaches
Date
2024
Publisher
Imperial College London
Abstract
BACKGROUND: Breast cancer is influenced by a complex array of risk factors. This study aimed
to identify nonlinear associations and interactions between various risk factors and breast cancer
incidence using computationally efficient, interpretable methods.
METHODS: Data from the Generations Study, a long-term prospective cohort of 104,423 women,
were analysed. Risk factors evaluated included demographic, medical, reproductive, hormonal, and
lifestyle variables. We compared the performance of traditional Cox proportional hazards models
with tree-based methods, including Classification and Regression Trees (CART) and random forests,
using the C-statistic (C-index). SHapley Additive exPlanations (SHAP) values were extracted to interpret
random forest outputs, highlighting key risk factors and interactions. Stability selection was applied
to enhance computational efficiency and identify the most stable and important variables.
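The abstract names stability selection but does not give its base learner, subsample size or selection threshold. The sketch below shows only the general procedure: run a base selector on many half-size subsamples drawn without replacement, record how often each variable is chosen, and keep the variables whose selection frequency reaches a threshold. The base selector here (`top_k_selector`, picking the columns most correlated with the outcome) is a hypothetical stand-in chosen to keep the example dependency-free. It ignores censoring, so a penalised, lasso-type Cox model would be a more usual base learner for survival data. The defaults of 100 subsamples and a 0.6 threshold are illustrative assumptions, not the study's settings.

```python
import random
from typing import Callable, Dict, List, Sequence, Set, Tuple

Row = Sequence[float]
# A base selector takes (rows, outcome) for one subsample and returns chosen column indices.
Selector = Callable[[List[Row], List[float]], Set[int]]


def abs_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def top_k_selector(k: int) -> Selector:
    """Hypothetical base selector (illustration only): the k columns with the
    largest absolute correlation with the outcome in the given subsample."""
    def select(rows: List[Row], outcome: List[float]) -> Set[int]:
        n_cols = len(rows[0])
        scores = [abs_correlation([r[c] for r in rows], outcome) for c in range(n_cols)]
        ranked = sorted(range(n_cols), key=lambda c: scores[c], reverse=True)
        return set(ranked[:k])
    return select


def stability_selection(
    rows: List[Row],
    outcome: List[float],
    selector: Selector,
    n_subsamples: int = 100,
    threshold: float = 0.6,
    seed: int = 0,
) -> Tuple[Dict[int, float], List[int]]:
    """Run the base selector on repeated half-size subsamples (drawn without
    replacement) and keep the columns selected in at least `threshold` of them.
    Returns (selection frequency per column, sorted list of stable columns)."""
    if n_subsamples <= 0 or not 0.0 < threshold <= 1.0:
        raise ValueError("need n_subsamples > 0 and 0 < threshold <= 1")
    rng = random.Random(seed)  # seeded so the subsamples are reproducible
    n, n_cols = len(rows), len(rows[0])
    counts = [0] * n_cols
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # half-size subsample, no replacement
        chosen = selector([rows[i] for i in idx], [outcome[i] for i in idx])
        for c in chosen:
            counts[c] += 1
    freqs = {c: counts[c] / n_subsamples for c in range(n_cols)}
    stable = sorted(c for c, f in freqs.items() if f >= threshold)
    return freqs, stable
```

On data where one column drives the outcome and the others are noise, only that column should clear the threshold. Raising the threshold trades fewer retained variables for a lower risk of keeping noise variables.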
RESULTS: The multivariable Cox model achieved the highest discrimination, with a C-index of
0.657, slightly outperforming the random forest model (C-index 0.650). However, the random
forest model revealed nonlinear associations and interactions not captured by the Cox model. Age,
family history of breast cancer, and benign breast disease were among the most important factors
identified, and complex interactions were observed between age, body mass index at entry, and family
history on the one hand and other risk factors, such as hormone replacement therapy duration, oral
contraceptive duration, and smoking pack-years, on the other. Stability selection effectively reduced
the number of variables without compromising model performance.
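The abstract reports C-index values but does not say which concordance estimator was used. As a hedged illustration of what such a value measures, the sketch below assumes Harrell's C for right-censored data. A pair of subjects is usable when the subject with the shorter follow-up had an observed event, and the pair is concordant when the model gave that subject the higher risk score (for a Cox model, the linear predictor). The function name `harrell_c_index` is hypothetical, and this simplified version skips pairs with tied follow-up times.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored survival data (sketch).

    Pair (i, j) is usable when subject i had an observed event (events[i] == 1)
    strictly before subject j's follow-up time. It is concordant when
    risk_scores[i] > risk_scores[j]; tied risk scores count as 0.5.
    Pairs with tied follow-up times are skipped in this simplified version.
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("times, events and risk_scores must have equal length")

    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs: need an observed event before another follow-up time")
    return concordant / usable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. For example, follow-up times `[2, 4, 6, 8]` with events `[1, 1, 0, 1]` give 1.0 for risk scores `[0.9, 0.7, 0.3, 0.1]` and 0.6 for `[0.9, 0.1, 0.3, 0.7]`. Read against that scale, the reported values of 0.657 and 0.650 indicate modest discrimination for both models.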
CONCLUSIONS: While linear models capture the dominant associations, tree-based models such as
random forests offer additional insights into complex, nonlinear relationships among breast cancer
risk factors, highlighting the potential for more personalised screening and prevention strategies.
Description
Proof of the award of the degree has been attached; please accept the request.
Keywords
breast cancer risk, machine learning, nonlinear associations, risk factor interactions, random survival forest, stability selection, cox proportional hazards, decision trees, personalised screening, feature selection