Graph Neural Network Architectures for Multi-Omics-Based Cancer Classification with Emphasis on Interpretability and Biomarker Discovery

Alharbi, Fadi

dc.contributor.advisor	Vakanski, Aleksandar
dc.contributor.author	Alharbi, Fadi
dc.date.accessioned	2025-09-07T04:41:22Z
dc.date.issued	2025
dc.description	This work presents an extensive study of Graph Neural Networks (GNNs) for cancer classification using integrated multi-omics data, with a focus on enhancing interpretability and detecting biomarkers. We used multi-omics data comprising gene expression, miRNA expression, and DNA methylation profiles to improve classification performance and to gain a deeper understanding of cancer mechanisms. Integrating these data enabled us to develop a comprehensive molecular description of cancer across 31 tumor types. We proposed and evaluated several graph-based architectures, starting with GCN, GAT, and GTN and continuing with our own models, LASSO-MOGAT and GKAN. LASSO-MOGAT applies LASSO-based feature selection to the multi-omics data before applying graph attention networks; performing LASSO regression before graph construction gives the model high performance together with interpretability and sparsity. GKAN, on the other hand, is a novel graph-based architecture designed to enhance model performance and interpretability by using learnable activation functions. Our proposed graph-based methods use multi-omics data preprocessed with differential expression analysis, which selects statistically significant and biologically relevant features; LASSO regression is then used to reduce the dimensionality of the preprocessed multi-omics data. Our proposed approach achieved performance that surpassed existing results in the literature on multiple cancer types. We found that GKAN and similar models showed potential for discovering important biological signatures, as supported by functional GO and KEGG pathway validations. While our work offers significant contributions, it opens several avenues for future work:
• Expanding the dataset beyond what TCGA provides by incorporating broader and larger patient cohorts to evaluate model generalization. The analysis can benefit from other datasets, including those in GEO and other databases, which would enhance clinical relevance and application scope.
• Enhancing performance by adding other omics types, such as proteomics, metabolomics, and copy number variation, which can provide a richer representation of cellular states.
• Our study used correlation-based and PPI-based graphs; discovering causal relationships within the multi-omics data could enable the construction of graphs that enhance model performance and interpretability.
• Collaborating with clinicians and biologists to translate predicted outcomes into medically testable hypotheses, leading to laboratory confirmation of newly identified disease biomarkers and potential therapeutic options.
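The preprocessing described above (LASSO regression to reduce dimensionality, followed by graph construction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it fits a single-response LASSO by cyclic coordinate descent on centered features with no intercept (a 31-class task would need a multi-class variant, for example one-vs-rest fits), and it builds an undirected graph by thresholding the absolute Pearson correlation. All function names and the `threshold` parameter are hypothetical.

```python
# Sketch only: LASSO feature selection followed by a correlation graph.
# Assumptions (not from the thesis): single numeric response y, centered
# features, no intercept, plain coordinate-descent solver, |r| threshold edges.


def soft_threshold(z: float, gamma: float) -> float:
    """Proximal operator of gamma * |w| (shrinks z toward zero)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/2n)||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - Xw, starting from w = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no signal
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def select_features(X, y, alpha):
    """Indices of features whose LASSO coefficient is non-zero."""
    w = lasso_coordinate_descent(X, y, alpha)
    return [j for j, wj in enumerate(w) if wj != 0.0]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / (va * vb) ** 0.5


def correlation_edges(vectors, threshold):
    """Undirected edge (i, j) when |Pearson r| between vectors i and j >= threshold."""
    edges = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if abs(pearson(vectors[i], vectors[j])) >= threshold:
                edges.append((i, j))
    return edges
```

Whether graph nodes are patients or molecular features is a design choice this record does not specify; `correlation_edges` accepts either, depending on whether you pass the rows or the columns of the selected-feature matrix.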
dc.description.abstract	Cancer describes a class of diseases in which malignant cells form inside the human body due to genetic changes. Once formed, these cells divide uncontrollably, spread throughout the organs, and in many cases cause death. Cancer is the second leading cause of death globally after cardiovascular disease. Recent studies on integrating multiple omics data have highlighted the potential to advance our understanding of the cancer disease process. Graph neural networks (GNNs) have emerged as powerful computational models for cancer classification tasks, particularly when applied to high-dimensional and heterogeneous multi-omics datasets. GNNs differ from classic neural models such as MLPs, CNNs, and RNNs in their ability to handle complex biological network relationships: they represent biological entities as graph nodes and analyze them using information from the network structure. This makes them well suited to PPI networks and gene regulatory networks, because they can effectively capture the natural biological interactions present in these networks. GNNs address key challenges in multi-omics data analysis, including data sparsity and complexity, by learning node embeddings that integrate both omics features and topological information. Attention-based GNNs have advanced both model interpretability and predictive accuracy, leading to more precise biomarker identification and cancer type classification. These advantages make GNNs effective approaches for advancing precision oncology, especially when they use integrated omics data as input features. Graph Attention Networks (GATs) assign dynamic weights to neighboring nodes according to their relevance to the learning task. This selective attention mechanism is highly effective for multi-omics data because different biological relationships carry varying degrees of information.
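The neighbor weighting that GATs perform can be illustrated with a single attention head. This sketch follows the standard GAT formulation: a shared linear projection `W`, raw scores e_ij = LeakyReLU(aᵀ[Wh_i ‖ Wh_j]), and a softmax over each node's neighborhood including itself. It is not the thesis code; the helper names are hypothetical, `W` and `a` would be learned rather than fixed, and the output nonlinearity and multi-head concatenation are omitted.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (a list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_attention(h, neighbors, W, a, slope=0.2):
    """Single-head GAT attention: alpha[i][j] over j in N(i) plus the self-loop.

    Hypothetical helper: W and a are fixed here but learned in a real model.
    """
    z = [matvec(W, hi) for hi in h]  # shared linear projection W h_i
    alpha = {}
    for i in range(len(h)):
        cand = [i] + [j for j in neighbors.get(i, []) if j != i]
        # e_ij = LeakyReLU(a^T [z_i || z_j]); z[i] + z[j] concatenates the lists
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])), slope) for j in cand]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha[i] = {j: e / total for j, e in zip(cand, exps)}
    return alpha, z


def gat_layer(h, neighbors, W, a):
    """Attention-weighted sum of projected neighbor features (output nonlinearity omitted)."""
    alpha, z = gat_attention(h, neighbors, W, a)
    dim = len(z[0])
    return [[sum(w * z[j][k] for j, w in alpha[i].items()) for k in range(dim)]
            for i in range(len(h))]
```

The learned coefficients `alpha[i][j]` are what attention-based interpretability analyses typically inspect: a neighbor with a larger coefficient contributed more to node `i`'s updated representation.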
Building on the strengths of GATs in emphasizing important interactions, the Graph Kolmogorov–Arnold Network (GKAN) offers a new route to interpretability by combining the Kolmogorov–Arnold representation theorem with graph structures. GKAN's learnable univariate functions provide effective non-linear modeling capacity for multi-omics data while preserving their network connections. Our work introduces three key innovations: (1) LASSO-MOGAT, a novel Graph Attention Network that integrates LASSO-based feature selection with multi-omics graph learning and demonstrates superior performance in classifying 31 cancer types; (2) an interpretable Graph Kolmogorov–Arnold Network (GKAN) that identifies pan-omics biomarker signatures through learnable activation functions; and (3) a systematic comparison of graph construction methods, showing that multi-omics correlation networks outperform single-omics approaches.
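GKAN's learnable univariate functions can be sketched in the same style. The Kolmogorov–Arnold idea is that each output is a sum of univariate functions of individual inputs, out_q = Σ_p φ_{q,p}(x_p). In this sketch, which is an assumption and not the thesis architecture, each φ is a weighted sum of Gaussian radial basis functions (the original KAN formulation uses B-splines), and every node averages its own and its neighbors' features before the KAN transform. The class and function names are hypothetical, and the coefficients would be learned by gradient descent rather than set by hand.

```python
import math


class RBFKANLayer:
    """KAN-style layer (hypothetical): out_q = sum_p phi_{q,p}(x_p).

    phi_{q,p}(x) = sum_k coeffs[q][p][k] * exp(-((x - centers[k]) / width) ** 2).
    Gaussian RBF bases stand in for the B-splines of the original KAN.
    """

    def __init__(self, coeffs, centers, width):
        self.coeffs = coeffs  # shape [out_dim][in_dim][len(centers)]; learned in practice
        self.centers = centers
        self.width = width

    def phi(self, q, p, x):
        """Learnable univariate function linking input p to output q."""
        return sum(c * math.exp(-((x - mu) / self.width) ** 2)
                   for c, mu in zip(self.coeffs[q][p], self.centers))

    def __call__(self, x):
        return [sum(self.phi(q, p, xp) for p, xp in enumerate(x))
                for q in range(len(self.coeffs))]


def mean_aggregate(h, neighbors):
    """Average each node's features with its neighbors' (self-loop included)."""
    out = []
    for i, hi in enumerate(h):
        group = [i] + [j for j in neighbors.get(i, []) if j != i]
        out.append([sum(h[j][k] for j in group) / len(group) for k in range(len(hi))])
    return out


def gkan_layer(h, neighbors, kan):
    """Assumed ordering: aggregate over the graph first, then apply the KAN transform."""
    return [kan(v) for v in mean_aggregate(h, neighbors)]
```

Because each φ_{q,p} is a one-dimensional curve, it can be evaluated on a grid and plotted to show how a single input feature shapes a given output; this kind of inspection is what makes KAN-style layers attractive for interpretability.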
dc.format.extent	141
dc.identifier.uri	https://hdl.handle.net/20.500.14154/76339
dc.language.iso	en_US
dc.publisher	Saudi Digital Library
dc.subject	Multi-Cancer Classification
dc.subject	Multi-omics Integration
dc.subject	Graph Neural Networks (GNNs)
dc.subject	Graph Attention Networks (GATs)
dc.subject	Graph Kolmogorov-Arnold Network (GKAN)
dc.subject	Protein-Protein Interaction (PPI)
dc.title	Graph Neural Network Architectures for Multi-Omics-Based Cancer Classification with Emphasis on Interpretability and Biomarker Discovery
dc.type	Thesis
sdl.degree.department	Computer Science
sdl.degree.discipline	Artificial Intelligence
sdl.degree.grantor	University of Idaho
sdl.degree.name	Doctor of Philosophy

Files

Original bundle

Name:: SACM-Dissertation.pdf
Size:: 10.46 MB
Format:: Adobe Portable Document Format

Collections

SACM - United States of America