Computational Approaches to Drug Repurposing Through Probabilistic Functional Integration of Disease-Gene Networks and Graph Neural Networks
dc.contributor.advisor | Wipat, Anil | |
dc.contributor.author | Alsobhe, Aoesha Gaed | |
dc.date.accessioned | 2025-06-24T09:53:12Z | |
dc.date.issued | 2025-06-11 | |
dc.description.abstract | Drug discovery is a time-consuming, costly, high-risk, and complex process. An alternative to traditional drug development is drug repurposing, which aims to find new uses for existing drugs. This approach significantly reduces time and cost, as much of the safety evaluation has already been completed. Computational approaches to drug repurposing help generate hypotheses about potential drug-disease indications, which can later be validated experimentally in the lab. Network integration is a common computational technique in drug repurposing applications. These approaches combine multiple diverse data sources into a single heterogeneous biomedical integrated network. Such networks combine various types of biological data, including drugs, diseases, genes, and proteins, into a unified framework where biomedical entities are represented as nodes and their interactions as edges. Integrating diverse data sources is essential to gain a comprehensive picture of interconnected biological entities, which can then be mined to infer new hypotheses about drug repurposing opportunities. The quality of these integrated networks is highly dependent on the experimental data they include. However, biomedical data is often noisy and incomplete, leading to a high rate of false results in existing networks. Therefore, there is an important need for methods to reduce noise during network integration. One proposed technique to produce accurate integrated networks is Probabilistic Functional Integrated Networks (PFINs), which assess data quality and generate confidence scores to filter out low-quality data before mining these networks for drug repurposing opportunities. Disease-Gene Association (DGA) networks, where nodes represent diseases and genes and edges represent their associations, are the major building blocks for most biomedical integrated networks used in drug repurposing applications. 
Unfortunately, many available DGA networks contain a high rate of false results due to the quality of the biomedical data, which faces numerous challenges, including incorrect entries, missing values, inconsistencies, duplication, and various forms of bias. For instance, high-throughput experimental studies, which are commonly used to generate biological data, often produce incomplete and noisy data containing both false positives and false negatives. Although methods exist to score the confidence of DGAs, they are often unreliable. Many of these scoring approaches rely on heuristic strategies that do not assess data quality prior to integration. For example, they often overlook the impact of duplicated data, which can artificially inflate confidence scores and distort the strength of associations. To address this gap, we investigated the applicability of PFINs to DGA networks by researching and developing novel strategies to build and evaluate DGA PFINs. These accurate integrated DGA networks can be employed in various computational drug repurposing applications, including deep learning techniques. Deep learning has become the leading technique in most in silico applications for drug repurposing. Among deep learning methods, Graph Neural Networks (GNNs) have gained considerable attention due to their ability to learn complex relationships between drugs and related biological entities from heterogeneous biomedical integrated networks. Existing GNN applications in drug repurposing often overlook important aspects of data quality, such as noise and incompleteness. Given that the performance of GNNs is highly dependent on the quality of the integrated networks used for training, incorporating PFINs with GNNs could enhance their performance by reducing noise during network integration. To address these issues, we investigated the impact of incorporating the PFINs approach within GNNs on their performance. 
The constructed DGA PFIN was integrated with an existing network and used to train GNN models. Another factor impacting the performance of GNNs, beyond data quality, is the lack of diverse data types in the integrated networks. Most existing GNN approaches are trained on networks with a limited number of node and edge types, often ignoring node features in the training process. We explored how adding various types of nodes and edges to the integrated networks, and incorporating node features in the training process, affects GNN performance. The results showed that GNN performance improved when these additional node and edge types were incorporated into the training networks. Furthermore, the proposed GNN models improved significantly when node features were incorporated. Finally, the proposed GNN models were employed to predict drug-disease indications, and these predictions were validated and supported by the literature. | |
dc.format.extent | 268 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14154/75653 | |
dc.language.iso | en | |
dc.publisher | Newcastle University | |
dc.subject | Computational Approaches to Drug Repurposing Through Probabilistic Functional Integration of Disease-Gene Networks and Graph Neural Networks | |
dc.title | Computational Approaches to Drug Repurposing Through Probabilistic Functional Integration of Disease-Gene Networks and Graph Neural Networks | |
dc.type | Thesis | |
sdl.degree.department | School of Computing | |
sdl.degree.discipline | Interdisciplinary Computing and Complex BioSystems (ICOS) research group | |
sdl.degree.grantor | Newcastle University | |
sdl.degree.name | Doctor of Philosophy |