Deep Multi-Modality Fusion for Integrative Healthcare

Date

2025

Publisher

Queen Mary University of London

Abstract

The healthcare industry generates vast amounts of data, driving advancements in patient diagnosis, treatment, and therapeutic discovery. A single patient’s electronic healthcare record often includes multiple modalities, each providing unique insights into their condition. Yet integrating these diverse, complementary sources to gain deeper insights remains a challenge. While deep learning has transformed single-modality analysis, many clinical scenarios, particularly in cancer care, require integrating complementary data sources for a holistic understanding. In cancer care, two key modalities provide complementary perspectives: histopathology whole-slide images (WSIs) and omics data (genomic, transcriptomic, epigenomic). WSIs deliver high-resolution views of tissue morphology and cellular structures, while omics data reveal molecular-level details of disease mechanisms. In this domain, single-modality approaches fall short: histopathology misses molecular heterogeneity, and traditional bulk or non-spatial omics data lack spatial context. Although recent advances in spatial omics technologies aim to bridge this gap by capturing molecular data within spatially resolved tissue architecture, such approaches are still emerging and are not explored in this thesis. Consequently, integrating conventional WSIs and non-spatial omics data through effective fusion strategies becomes essential for uncovering their joint potential, as effective fusion of these modalities can reveal rich, cross-modal patterns that help identify signals associated with tumor behavior.

Key questions arise: How can we effectively align these heterogeneous modalities (high-resolution images and diverse molecular data) into a unified framework? How can we leverage their interactions to maximize complementary insights? And how can we tailor fusion strategies to exploit the strengths of dominant modalities across diverse clinical tasks?

This thesis tackles these questions head-on, advancing integrative healthcare by developing novel deep multi-modal fusion methods. Our primary focus is on integrating the aforementioned key modalities, proposing innovative approaches to enhance omics–WSI fusion in cancer research. While the downstream applications of these methods span diagnosis, prognosis, and treatment stratification, the core contribution lies in the design and evaluation of fusion strategies that effectively harness the complementary strengths of each modality. Our research develops a multi-modal fusion method to enhance cross-modality interactions between WSIs and omics data, using advanced architectures to integrate their heterogeneous feature spaces and produce discriminative representations that improve cancer grading accuracy. These methods are flexibly designed and can be applied to fuse data from diverse sources across various application domains; however, this thesis focuses primarily on cancer-related tasks. We also introduce cross-modal attention mechanisms to refine feature representation and interpretability, functioning effectively in both single-modality and bimodal settings, with applications in breast cancer classification (using mammography, MRI, and clinical metadata) and brain tumor grading (using WSIs and gene expression data).
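
As an illustration of the bimodal setting described above, the following minimal sketch (in PyTorch, with hypothetical names and dimensions) shows a generic cross-modal attention step in which omics-derived tokens query WSI patch embeddings. This is a standard cross-attention formulation used here only for orientation, not the thesis' exact attention block; the returned attention weights are what make patch-level interpretation possible.

import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Generic cross-attention: omics-derived tokens query WSI patch embeddings."""

    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, omics_tokens, wsi_tokens):
        # omics_tokens: (B, N_omics, dim); wsi_tokens: (B, N_patches, dim)
        refined, weights = self.attn(query=omics_tokens, key=wsi_tokens, value=wsi_tokens)
        # `weights` (B, N_omics, N_patches) shows which patches each omics token
        # attends to, which is one route to interpretability in the bimodal setting.
        return refined, weights

# Usage with dummy tensors (illustrative sizes)
attn = CrossModalAttention(dim=64)
refined, weights = attn(torch.randn(2, 10, 64), torch.randn(2, 100, 64))
print(refined.shape, weights.shape)  # torch.Size([2, 10, 64]) torch.Size([2, 10, 100])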
Additionally, we propose dual fusion strategies that combine early and late fusion to address challenges in omics–WSI integration such as explainability and high-dimensional omics data: omics features are aligned with localized WSI regions to identify tumor subtypes without patch-level labels, while global interactions are captured for a holistic perspective. We deliver three key contributions: the Multi-modal Outer Arithmetic Block (MOAB), a novel fusion method integrating latent representations from WSIs and omics data using arithmetic operations and a channel fusion technique, achieving state-of-the-art brain cancer grading performance with publicly available code; the Flattened Outer Arithmetic Attention (FOAA), an attention-based framework extending MOAB to single- and bimodal tasks, surpassing existing methods in breast and brain tumor classification; and the Multi-modal Outer Arithmetic Block Dual Fusion Network (MOAD-FNet), combining early and late fusion for explainable omics–WSI integration, outperforming benchmarks on The Cancer Genome Atlas (TCGA) and NHNN BRAIN UK datasets with interpretable WSI heatmaps aligned with expert diagnoses. Together, these contributions provide reliable, interpretable, and adaptable solutions for the multi-modal fusion domain, with a specific focus on advancing diagnostics, prognosis, and personalized healthcare strategies while addressing the critical questions driving this field forward.
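
For concreteness, below is a minimal sketch of the outer arithmetic fusion idea behind MOAB, under the assumption stated in the abstract that the two modality embeddings are combined through element-pairwise arithmetic operations followed by a channel fusion step. The module name, projection sizes, and the choice of a 1x1 convolution for channel fusion are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn

class OuterArithmeticFusion(nn.Module):
    """Sketch of outer-arithmetic fusion of a WSI embedding and an omics embedding.

    Each pairwise outer operation (+, -, *, /) between the two projected vectors
    yields a 2-D interaction map; the four maps are stacked as channels and
    combined with a 1x1 convolution (channel fusion).
    """

    def __init__(self, wsi_dim: int, omics_dim: int, latent_dim: int = 64):
        super().__init__()
        self.proj_wsi = nn.Linear(wsi_dim, latent_dim)        # project WSI features
        self.proj_omics = nn.Linear(omics_dim, latent_dim)    # project omics features
        self.channel_fusion = nn.Conv2d(4, 1, kernel_size=1)  # fuse the 4 interaction maps

    def forward(self, wsi_feat, omics_feat):
        a = self.proj_wsi(wsi_feat).unsqueeze(2)      # (B, d, 1)
        b = self.proj_omics(omics_feat).unsqueeze(1)  # (B, 1, d)
        eps = 1e-6                                    # guard against division by zero
        maps = torch.stack([a + b, a - b, a * b, a / (b + eps)], dim=1)  # (B, 4, d, d)
        fused = self.channel_fusion(maps)             # (B, 1, d, d)
        return fused.flatten(1)                       # (B, d*d) fused representation

# Usage with dummy inputs (illustrative sizes)
fusion = OuterArithmeticFusion(wsi_dim=512, omics_dim=200)
out = fusion(torch.randn(2, 512), torch.randn(2, 200))
print(out.shape)  # torch.Size([2, 4096])

Because each outer operation produces a d x d interaction map, stacking the maps as channels lets a lightweight convolution weigh additive, subtractive, multiplicative, and divisive cross-modal interactions before the fused representation is passed to a classifier.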

Keywords

Multimodal Fusion, Deep Learning, Whole-Slide Images, Omics Data, DNA Methylation, Digital Pathology, Multi-omics Integration, Cancer Diagnosis, Cancer Prognosis, Precision Oncology, Attention-based Models, Representation Learning.

Citation

https://qmro.qmul.ac.uk/xmlui/handle/123456789/113377
