Carneiro, Gustavo
Kutbi, Jad
2024-12-10
2024-09
https://hdl.handle.net/20.500.14154/74073

This dissertation explores the use of advanced deep learning architectures, foundation models, and hybrid models for medical image classification. Medical imaging plays a critical role in healthcare, and deep learning models have shown significant potential to improve the accuracy and efficiency of diagnostic processes. This work focuses on three datasets from the MedMNIST v2 collection: RetinaMNIST, BreastMNIST, and FractureMNIST3D, each representing a different imaging modality and classification task. Its significance lies in a comprehensive evaluation of state-of-the-art models, including ResNet, Vision Transformers (ViT), ConvNeXt, and Swin Transformers, and of their effectiveness in handling complex medical images. The primary contributions are the implementation and benchmarking of modern architectures on these datasets and an investigation of hyperparameter optimization using Optuna. Pretrained models and hybrid architectures such as CNN-ViT, SwinConvNeXt, and CNN-LSTM were explored to enhance performance. Key results show that models such as ConvNeXt-tiny (pretrained) and CLIP achieved high accuracy and AUC scores, particularly on the BreastMNIST and RetinaMNIST datasets, setting new performance benchmarks. Combining Swin and ConvNeXt through feature fusion was shown to improve model robustness, especially on multi-class and 3D data.

48
en
Medical image classification, ViT, CLIP, Fine-tuning, CNN, LSTM, ConvNeXt, Swin, Optuna, RetinaMNIST, BreastMNIST, FractureMNIST3D, MedMNIST
Exploring Advanced Deep Learning, foundation and Hybrid models for Medical Image Classification
Thesis