Predicting Biochemical Recurrence of Prostate Cancer Using Genetic Sequence Data and Clinical Variables Using High Dimensional Multivariate Models

Abstract

Prostate cancer is a major cause of death in men worldwide. Although well-established predictors of disease progression and death have been identified, predicting outcomes accurately remains a challenge because of disease heterogeneity. Biochemical recurrence (BCR) is an early surrogate endpoint, and identifying key predictors of BCR may lead to better and earlier treatment of patients who are at high risk. The overall aim of this study was to identify clinical and genetic predictors of BCR in prostate cancer patients after radical prostatectomy using statistical techniques, and to develop signatures to predict time to a BCR event. This study used prostate cancer data (n=495 subjects) from The Cancer Genome Atlas (TCGA); patients' demographic and clinical characteristics and high-dimensional gene expression variables were used for the analyses. Principal component analysis (PCA) was applied to the high-dimensional gene expression data (n=57,251 genes) to understand the overall pattern in post-treatment patients. Shrinkage approaches (Lasso and Elastic-net) were also proposed to predict BCR from gene expression. Before these prediction methods were applied to the real dataset, their performance was assessed using simulated data. Overall, both techniques performed similarly. However, the ratio of events to controls was markedly imbalanced (19.2% events vs. 80.8% controls), which had a negative impact on sensitivity/specificity but not on the overall classification accuracy of the methods. These methods were also applied to predict BCR (as a binary outcome) using TCGA gene expression; both techniques performed similarly, although the genes selected by Lasso were a subset of those selected by Elastic-net. Furthermore, a Cox survival model (using a shrinkage approach) was applied to predict time to a BCR event from the most significant genes (n=743 genes), followed by stepwise variable selection.
Eight genes were selected, and the corresponding genetic signature scores (with and without clinical characteristics) were generated. The novel score had a strong positive association with high risk of a BCR event. Finally, the 31-gene Cell-Cycle Progression (CCP) scores (with and without clinical characteristics) were validated in TCGA and showed strong positive associations with high risk of a BCR event. The score newly developed in this thesis has several advantages over the popular CCP score in terms of identifying high-risk patients earlier for better management, shorter processing time, and reduced expense.
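
The observation that class imbalance hurt sensitivity/specificity but not overall accuracy can be illustrated with a small sketch. It is not the thesis's analysis: the counts of 95 events and 400 controls are an assumption chosen only because they reproduce the reported 19.2% / 80.8% split for n=495, and the "classifier" is a hypothetical one that labels every patient as a control.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 = BCR event."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity from binary labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity: share of true events that were predicted as events.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of true controls that were predicted as controls.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Assumed counts (not stated in the abstract): 95 events, 400 controls.
N_EVENTS = 95
N_CONTROLS = 400
y_true = [1] * N_EVENTS + [0] * N_CONTROLS

# Hypothetical degenerate classifier: predicts "control" for everyone.
y_pred_all_control = [0] * len(y_true)

metrics = classification_metrics(y_true, y_pred_all_control)
print(metrics)
```

Under these assumptions the degenerate classifier reaches roughly 80.8% accuracy while detecting no events at all, which is why accuracy alone can look acceptable on imbalanced data and sensitivity and specificity are worth reporting alongside it.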

Copyright owned by the Saudi Digital Library (SDL) © 2025