Exploring Advanced Deep Learning, Foundation and Hybrid Models for Medical Image Classification
Date
2024-09
Publisher
University of Surrey
Abstract
This dissertation explores the use of advanced deep learning architectures, foundation models, and
hybrid models for medical image classification. Medical imaging plays a critical role in the healthcare
industry, and deep learning models have demonstrated significant potential in improving the
accuracy and efficiency of diagnostic processes. This work focuses on three datasets from the
MedMNIST v2 collection: RetinaMNIST, BreastMNIST, and FractureMNIST3D, each representing a
different imaging modality and classification task. The significance of this work lies in its comprehensive
evaluation of state-of-the-art models, including ResNet, Vision Transformers (ViT),
ConvNeXt, and Swin Transformers, and their effectiveness in handling complex medical images.
The primary contributions of this research are the implementation and benchmarking of modern
architectures on these datasets, as well as the investigation of hyperparameter optimization
using Optuna. Pretrained models and hybrid architectures such as CNN-ViT, SwinConvNeXt,
and CNN-LSTM were explored to enhance performance. Key results demonstrate that models
such as ConvNeXt-tiny (pretrained) and CLIP achieved high accuracy and AUC scores, particularly
on the BreastMNIST and RetinaMNIST datasets, setting new performance benchmarks. The combination
of Swin and ConvNeXt using feature fusion was shown to improve model robustness,
especially when handling multi-class and 3D data.
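
The abstract does not say how the Swin and ConvNeXt features are fused. As a minimal sketch, assuming fusion by simple concatenation followed by a linear classification head, the idea can be illustrated in plain Python. The function names, embedding sizes, class count, and random values below are all hypothetical stand-ins chosen for illustration; they are not taken from the dissertation.

```python
import math
import random


def fuse_features(swin_feat, convnext_feat):
    """Fuse two embeddings by concatenation (assumed fusion strategy)."""
    return list(swin_feat) + list(convnext_feat)


def linear_head(features, weights, bias):
    """Compute one logit per class: dot(weights[c], features) + bias[c]."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits):
    """Convert logits to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)

# Hypothetical stand-ins for the embeddings each backbone would produce
# for one image; real backbones output much larger vectors.
swin_feat = [rng.uniform(-1.0, 1.0) for _ in range(4)]
convnext_feat = [rng.uniform(-1.0, 1.0) for _ in range(6)]

num_classes = 5  # illustrative class count for a multi-class task

fused = fuse_features(swin_feat, convnext_feat)

# Randomly initialised head; in practice these weights would be trained.
weights = [[rng.uniform(-0.1, 0.1) for _ in fused] for _ in range(num_classes)]
bias = [0.0] * num_classes

probs = softmax(linear_head(fused, weights, bias))
predicted_class = probs.index(max(probs))
```

Concatenation keeps both feature sets intact and lets the classification head learn how to weight them; other fusion schemes (weighted sums, attention) are possible and may be what the dissertation actually used.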
Keywords
Medical image classification, ViT, CLIP, Fine-tuning, CNN, LSTM, ConvNeXt, Swin, Optuna, RetinaMNIST, BreastMNIST, FractureMNIST3D, MedMNIST