SACM - United Kingdom

Permanent URI for this collection: https://drepo.sdl.edu.sa/handle/20.500.14154/9667

  • Restricted item
    MULTI-TARGET REGRESSION APPLICATIONS FOR PREDICTING GENE EXPRESSION LEVELS
    (Saudi Digital Library, 2023-12-12) Altaweraqi, Nada; King, Ross
    Cancer progression depends on the activity of cellular networks, which are governed by the dynamics of various factors both inside and outside the cell. Although the mechanisms of these networks remain poorly understood, they can be explored by studying gene expression levels, which are challenging to model and predict. Predictions of gene expression levels can follow two approaches: mechanistic models, which simulate some aspects of biological systems, and machine learning models, which are built from empirical data. Both approaches are widely deployed, and both have limitations. This thesis outlines a novel framework for integrating models that represent mechanistic knowledge of signalling pathways into machine learning models. The latter are multi-target regression models that predict gene expression levels. The study proposes multiple representations of signalling pathways, each transformed into features describing different genes. The first is a graph-based representation, which encodes interaction knowledge using graph heuristics and embedding methods. Applying multi-target regression stacking aided by common-neighbour features noticeably improved predictions: the significance test yielded a p-value of 4.4e-244, which is strong evidence of an improvement from the proposed model. Our frameworks also outperformed the baseline when the graph-based algorithm was changed, with DeepWalk-based models performing best: they outperformed the baseline in 208 of 300 genes. Furthermore, under the significance test, all methods that integrate pathway knowledge significantly outperformed the baseline. We also investigated the utility of the machine learning models for developing sound hypotheses of gene associations.
    Some of the knowledge retrieved from these models is reported in the literature. The second representation is a stochastic simulation model of signalling pathways, which reflects pathway activity over time. As hypothesised, the model built using this representation surpassed both the baseline and the DeepWalk-based model built using graph modelling techniques, outperforming the baseline in 123 of 200 genes (p-value of 0.01). Finally, we present a surrogate modelling approach for reducing the impact of noise in gene expression data. The surrogate data were tested using association measures and yielded more accurate results than the raw data.
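The common-neighbour features mentioned in the abstract can be illustrated with a minimal sketch. It assumes a pathway is given as an undirected edge list and that the feature for a gene pair is simply the number of neighbours the two genes share. The function name `common_neighbour_counts` and the toy gene labels are hypothetical and are not taken from the thesis, which may define or use these features differently.

```python
from itertools import combinations


def common_neighbour_counts(edges):
    """Map each gene pair (u, v) to the number of neighbours they share.

    `edges` is an iterable of undirected (gene, gene) interactions.
    This is an illustrative heuristic, not the thesis implementation.
    """
    neighbours = {}
    for a, b in edges:
        # Record the interaction in both directions (undirected graph).
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {
        (u, v): len(neighbours[u] & neighbours[v])
        for u, v in combinations(sorted(neighbours), 2)
    }


# Hypothetical toy pathway; the labels are placeholders, not thesis data.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
features = common_neighbour_counts(toy_edges)
```

In a stacking setup, counts like these could be supplied as extra inputs alongside the first-stage predictions of related genes; how the thesis combines them is not stated in the abstract.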

Copyright owned by the Saudi Digital Library (SDL) © 2024