Features Selection Strategies for Classifying Heterogeneous Cardiovascular Disease Data

Date

2025-05

Publisher

University of Liverpool

Abstract

The exponential growth of data across diverse domains presents significant challenges, particularly in integrating and analyzing multi-modal datasets. This thesis addresses these challenges by proposing a Homogeneous Feature Vector Representation (HFVR) framework, designed to unify disparate data formats and facilitate holistic machine learning. The primary application is the classification of Cardiovascular Disease (CVD) using data from multiple sources, including time series, images, video, and clinical records. The main research question guiding this thesis is: How can multiple heterogeneous data sources be effectively combined to support comprehensive, integrated machine learning analysis? The contributions of this work are twofold: (i) the development of technical methodologies for feature extraction, and (ii) the application of these methods in a medical context to enhance CVD diagnosis and prognosis.

Five feature extraction techniques are presented to address the complexities of multi-modal data integration:

1. 1D Motifs and Discords (1D-MD): uses matrix profiles to extract recurring patterns (motifs) and anomalies (discords) from ECG time series data. It serves as a benchmark for the classification models.
2. 2D Motifs and Discords (2D-MD): operates directly on ECG images to extract spatial motifs and discords, improving classification performance over 1D-MD.
3. 2D Convolutional Neural Networks (2D-CNN): uses pre-trained CNNs such as ResNet-50 and VGG16 to extract hierarchical features from ECG images, significantly improving classification accuracy when integrated into the HFVR framework.
4. Multi-Frame 2D CNN (MF2D-CNN): designed for video data, this method processes individual frames with CNNs and applies temporal aggregation to capture dynamic patterns while maintaining computational efficiency.
5. Spatio-Temporal 3D CNN (ST3D-CNN): building on MF2D-CNN, this approach uses 3D convolutions to jointly analyze spatial and temporal dynamics in Echo video data.

The research also includes the development of a bespoke multi-modal dataset, created in collaboration with Liverpool Heart and Chest Hospital (LHCH), that combines ECG, Echo, and clinical data. This dataset was used to evaluate the proposed methods and demonstrate their real-world applicability. The HFVR framework outperformed single-modality approaches by integrating features from multiple data sources. Extensive experiments were conducted on public datasets (CPSC, GHS, GAF, eCAN, and dCAN) and on the LHCH dataset. Performance was measured with Accuracy, Precision, Recall, F1-score, and AUC, using ten-fold cross-validation with stratified sampling to ensure robustness. Traditional classifiers, such as Support Vector Machines (SVMs) and k-Nearest Neighbour (kNN), were also used to validate the HFVR framework. Results showed that combining MF2D-CNN features with clinical and 2D-CNN features achieved the highest AUC of 93.3%, significantly outperforming baseline methods. Statistical analysis confirmed the robustness and scalability of the techniques.

Overall, this thesis advances multi-modal data integration by presenting a unified framework for feature extraction and fusion. The HFVR framework paves the way for holistic machine learning and improved predictive accuracy. Although the work focuses on CVD classification, the techniques are generalizable to other domains. Future work will explore real-time implementation, enhanced extraction methods, and expansion to additional data types and domains, as steps toward scalable, integrated machine learning systems.
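As a rough illustration of the matrix-profile idea behind 1D-MD and of fusing features into a single vector, the Python sketch below computes a brute-force matrix profile and reads off the motif (smallest nearest-neighbour distance) and discord (largest). It is a minimal sketch under stated assumptions, not the thesis's implementation: it uses a naive O(n²m) self-join rather than an optimized algorithm or a library such as STUMPY, assumes an exclusion zone of m//2 samples, and the helpers `motif_discord_features` and `hfvr` (with their four-element and concatenated layouts) are hypothetical. Plain concatenation is only one assumed way of producing a homogeneous feature vector.

```python
import math
from typing import List, Sequence, Tuple


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalise one window; a constant window maps to all zeros."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sd == 0.0:
        return [0.0] * len(x)
    return [(v - mu) / sd for v in x]


def matrix_profile(ts: Sequence[float], m: int) -> Tuple[List[float], List[int]]:
    """Brute-force O(n^2 * m) self-join matrix profile.

    profile[i] is the z-normalised Euclidean distance from window i to its
    nearest non-trivial neighbour; index[i] is that neighbour's start index.
    """
    n = len(ts) - m + 1
    excl = max(1, m // 2)  # assumed exclusion zone: windows starting < m//2 apart are trivial matches
    if m < 2 or n - 1 < 2 * excl:
        raise ValueError("series too short for window length m")
    windows = [znorm(ts[i:i + m]) for i in range(n)]
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) < excl:
                continue  # skip the exclusion zone around window i
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(windows[i], windows[j])))
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index


def motif_discord_features(ts: Sequence[float], m: int) -> List[float]:
    """Hypothetical 1D-MD-style vector: [motif dist, discord dist, motif start, discord start]."""
    profile, _ = matrix_profile(ts, m)
    motif = min(range(len(profile)), key=profile.__getitem__)  # best-conserved pattern
    discord = max(range(len(profile)), key=profile.__getitem__)  # most anomalous window
    return [profile[motif], profile[discord], float(motif), float(discord)]


def hfvr(*modality_features: Sequence[float]) -> List[float]:
    """Hypothetical fusion step: concatenate per-modality vectors into one flat vector."""
    fused: List[float] = []
    for feats in modality_features:
        fused.extend(float(v) for v in feats)
    return fused


if __name__ == "__main__":
    # Synthetic stand-in for an ECG trace: a period-8 sine with one spike at sample 40.
    signal = [math.sin(2 * math.pi * k / 8) for k in range(64)]
    signal[40] += 5.0
    ecg_feats = motif_discord_features(signal, m=8)
    clinical = [63.0, 1.0]  # hypothetical clinical features (e.g. age, sex flag)
    print(hfvr(ecg_feats, clinical))
```

On this synthetic signal the motif distance is close to zero (every clean window repeats one period later), while the discord is a window that contains the injected spike. In a real pipeline, the image, video, and clinical vectors would come from their own extractors; only the final concatenation step is shown here.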

Keywords

multi-modal integration, feature extraction, HFVR, CVD classification, deep learning, ECG, Echo video, 1D motifs, 2D CNN, 3D CNN

Copyright owned by the Saudi Digital Library (SDL) © 2025