Early Detection of Pleuropulmonary Blastoma Using Transformer Models
Date
2024
Publisher
Bowie State University
Abstract
Childhood cancer is the second leading cause of death among children under the age of fifteen, according to the American Cancer Society. The number of diagnosed cancer cases in children continues to rise each year, leading to many tragic fatalities. One specific type of cancer, pleuropulmonary blastoma (PPB), affects children from newborns to those as old as six years. The most common way to diagnose PPB is through imaging; this method is quick, cost-effective, and does not require specialized equipment or laboratory tests. However, relying solely on imaging for early detection of PPB is challenging because of its lower accuracy and sensitivity; it is also time-consuming and susceptible to error because of the numerous potential differential diagnoses. A more accurate diagnosis of PPB depends on identifying mutations in the DICER1 gene. Recent advances in biological analysis and machine learning are transforming cancer treatment, and deep learning (DL) methods for diagnosing PPB are becoming increasingly popular. Despite facing some challenges, DL shows significant promise in supporting oncologists. However, some advanced models possess a limited local receptive field, which can restrict their ability to capture global context. This research employs the vision transformer (ViT) model to address these limitations. ViT reduces computation time and yields better results than existing models; it uses self-attention among image patches to process visual information effectively. The experiments in this study are conducted on two types of datasets, medical images and genomic data, using two different methodologies. One approach applies the ViT model combined with an explainability framework to large medical image datasets spanning various modalities. The other develops a new hybrid model that integrates the vision transformer with bidirectional long short-term memory (ViT-BiLSTM) for genomic datasets. The results demonstrate that the ViT model and the new hybrid ViT-BiLSTM model significantly outperform established models, as validated by multiple performance metrics. Consequently, this research holds great promise for the early diagnosis of PPB, reducing misdiagnoses and facilitating timely intervention and treatment. These findings could revolutionize medical diagnosis and shape the future of healthcare.
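To make the patch-based self-attention described above concrete, the following minimal PyTorch sketch shows how an image is split into fixed-size patches, linearly embedded, and processed with patch-to-patch self-attention. All shapes and hyperparameters are illustrative (a ViT-B/16-style configuration), not the dissertation's exact settings.

```python
# Illustrative sketch of the ViT idea: an image becomes a sequence of patch
# tokens, and self-attention lets every patch attend to every other patch.
import torch
import torch.nn as nn

patch_size, embed_dim, num_heads = 16, 768, 12

# Patch embedding: a strided convolution maps each 16x16 patch to one vector.
patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)
attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

image = torch.randn(1, 3, 224, 224)             # one RGB image
patches = patch_embed(image)                    # (1, 768, 14, 14)
tokens = patches.flatten(2).transpose(1, 2)     # (1, 196, 768): one token per patch
attended, weights = attention(tokens, tokens, tokens)  # global patch-to-patch attention
print(attended.shape, weights.shape)            # (1, 196, 768), (1, 196, 196)
```

Because every patch attends to every other patch, the receptive field is global from the first attention layer, which is exactly the limitation of local receptive fields that the abstract highlights.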
Description
This dissertation's research addresses the late-stage detection of pleuropulmonary blastoma (PPB). The study uses imaging datasets with rich radiomic features and genomic data to explore the potential of the ViT architecture to classify medical images effectively, specifically for detecting PPB. The process involves several essential steps that demonstrate ViT models' adaptability and ability to perform well in specialized tasks. Intelligent data preprocessing converts raw DICOM medical images into standardized JPG files, enabling the ViT model to process and interpret the data effectively. The model implementation uses a pre-trained ViT network, underscoring the effectiveness of transfer learning in medical applications with limited training data. The model demonstrates outstanding performance, achieving a test accuracy exceeding 99% after 50 epochs, with an F1 score of 0.9967, sensitivity of 99.9%, specificity of 97.18%, overall accuracy of 99.85%, precision of 99.94%, balanced accuracy of 98.54%, and an overall AUC of 0.99. This specialized diagnostic tool excels at the crucial task of identifying positive PPB diagnoses, offering substantial value. The study emphasizes how modern deep learning approaches such as ViT can deliver robust performance on specific medical challenges through meticulous implementation. The ViT architecture shows promise as the basis for a decision support system to assist clinicians in PPB detection, with the potential for significant clinical impact and improved patient outcomes through further refinement and additional data. The manuscript also explores the application of saliency analysis techniques to evaluate how effectively advanced deep learning networks classify lung neoplasms into binary categories.
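As an illustration of the two steps just described, the sketch below converts a DICOM file to an 8-bit JPG and prepares a pre-trained ViT-B/16 for fine-tuning on the binary PPB task. It assumes pydicom, Pillow, and torchvision are available; the file paths, the simple min-max windowing, and the two-class head are assumptions for illustration, not the dissertation's exact pipeline.

```python
# Hedged sketch: DICOM-to-JPG preprocessing plus transfer learning with a
# pre-trained ViT, under assumed libraries and preprocessing choices.
import numpy as np
import pydicom
from PIL import Image
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

def dicom_to_jpg(dicom_path: str, jpg_path: str) -> None:
    """Rescale the raw pixel array to 8-bit and save it as JPG."""
    ds = pydicom.dcmread(dicom_path)
    pixels = ds.pixel_array.astype(np.float32)
    rng = pixels.max() - pixels.min()                    # naive windowing; real
    pixels = (pixels - pixels.min()) / (rng + 1e-8) * 255.0  # pipelines may differ
    Image.fromarray(pixels.astype(np.uint8)).convert("RGB").save(jpg_path)

# Transfer learning: load ImageNet weights and replace the classification
# head with a two-class output (PPB-positive vs. PPB-negative).
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
model.heads = nn.Linear(model.hidden_dim, 2)
```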
The ViT model's intricate computations can make its predictions difficult for doctors, radiologists, and scientists to understand. To address this, explainable artificial intelligence (XAI) techniques offer a promising way to enhance the model's interpretability. One such method uses saliency maps to highlight the most influential features in the input data, thereby clarifying the model's predictions. These visualizations can improve patient treatment and expedite the detection of cancerous areas. Given the critical importance of XAI in interpreting opaque deep learning models, particularly in medical imaging, this work quantitatively evaluates widely used saliency-based XAI methods applied to ViT networks for PPB detection in diverse medical images. The study comprehensively scrutinizes the most widely used advanced deep learning networks, focusing specifically on explaining the ViT's decisions. An additional network is introduced to analyze the acquired saliency regions; it plays a crucial role in interpreting the saliency maps produced by the XAI methods, providing a clearer understanding of the model's decision-making process. Saliency XAI methods, including Gradient-weighted Class Activation Mapping (Grad-CAM) and Eigen-CAM, produce visual saliency maps that pinpoint the regions of test images on which the models concentrate when predicting cancer subtypes. These maps can simplify and clarify cancer detection results, aiding pathologists in crafting more effective treatment plans. Using this method for fully automated identification of lung cancer from medical images in clinical environments could offer transparent, easy-to-understand outcomes, fostering trust in the procedure.
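A hedged sketch of that saliency step follows, using the pytorch-grad-cam library and reusing the fine-tuned `model` from the previous sketch. The chosen target layer, the 14x14 token grid, and the class index are assumptions for a ViT-B/16 on 224x224 inputs, not the dissertation's exact configuration.

```python
# Sketch: generating Grad-CAM and Eigen-CAM saliency maps for a ViT. The
# reshape transform (discussed further below) rearranges the ViT's token
# sequence back into a 2D feature map so the CAM methods can operate on it.
import torch
from pytorch_grad_cam import GradCAM, EigenCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

def reshape_transform(tensor, height=14, width=14):
    # Drop the [CLS] token, then reshape the 196 patch tokens to (B, C, 14, 14).
    result = tensor[:, 1:, :].reshape(tensor.size(0), height, width, tensor.size(2))
    return result.permute(0, 3, 1, 2)

# Assumed target layer: the last encoder block's first layer norm
# (torchvision ViT naming); `model` is the ViT from the preceding sketch.
target_layers = [model.encoder.layers[-1].ln_1]
inputs = torch.randn(1, 3, 224, 224)  # placeholder for a preprocessed image

for CamMethod in (GradCAM, EigenCAM):
    cam = CamMethod(model=model, target_layers=target_layers,
                    reshape_transform=reshape_transform)
    saliency = cam(input_tensor=inputs,
                   targets=[ClassifierOutputTarget(1)])  # class 1 = PPB-positive
    print(CamMethod.__name__, saliency.shape)            # (1, 224, 224) heatmap
```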
This work used the two visualization techniques, Grad-CAM and Eigen-CAM, to evaluate the ViT's performance for binary mask prediction. Their performance was assessed using metrics such as Intersection over Union (IoU) and the Confusion-Averaged Dice (CAD) score. A pivotal aspect of the model evaluation process was carefully defining a reshape transform function, which is crucial for correctly initializing Grad-CAM and Eigen-CAM with the trained model and target layer. While IoU measures the overlap between predicted binary masks and ground truth masks, the CAD score evaluates the inclusiveness and exclusivity of the XAI method's outputs. The CAD score provides a balanced view of an XAI method's positive and negative classification capabilities, making it a valuable metric for ensuring precise and reliable visualizations. The IoU and CAD score for each technique, Grad-CAM and Eigen-CAM, were calculated for every image, mask, and label in the data loader. Grad-CAM performed poorly, with an average CAD score of 54% and an IoU of only 6.7%. These scores suggest that Grad-CAM does not effectively identify relevant regions in the images; its results resemble random guessing in distinguishing relevant from irrelevant areas. In contrast, Eigen-CAM significantly outperformed Grad-CAM, with a higher average IoU of 12% and a CAD score of 62%. These results indicate that Eigen-CAM provides better overlap with ground truth regions and more accurately highlights the areas relevant to the model's decisions.
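The two metrics can be sketched as follows. Since the exact CAD formula is not given here, the sketch assumes, as a labeled assumption, that the Confusion-Averaged Dice score is the Dice coefficient averaged over the foreground mask and the inverted background mask, which matches the description of balancing positive and negative classification capabilities.

```python
# Sketch of IoU and an assumed CAD score on binary masks (e.g. a thresholded
# saliency map vs. a ground-truth lesion mask).
import numpy as np

def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection over Union between two boolean masks."""
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union if union else 1.0

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient between two boolean masks."""
    inter = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 2 * inter / total if total else 1.0

def cad_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """ASSUMED definition of Confusion-Averaged Dice: the mean of the Dice
    scores on the foreground masks and on the inverted (background) masks."""
    return 0.5 * (dice(pred, truth) + dice(~pred, ~truth))

# Usage on placeholder masks:
pred_mask = np.random.rand(224, 224) > 0.5
true_mask = np.random.rand(224, 224) > 0.5
print(iou(pred_mask, true_mask), cad_score(pred_mask, true_mask))
```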
This study developed a hybrid deep learning model that combines the vision transformer (ViT) with bidirectional long short-term memory (Bi-LSTM) networks to classify mutations in the DICER1 gene. A comprehensive preprocessing pipeline, including data augmentation and feature engineering, was implemented, and the model was trained on a diverse and representative dataset. The results demonstrate the model's success: an accuracy of 0.99 and precision and F1 scores of 0.99. A ROC-AUC score of 0.99 further underscores the model's effectiveness in the multi-class classification of DICER1 mutations, and a Matthews correlation coefficient (MCC) of 0.999 indicates a strong correlation between the predicted and true labels. Visualizations such as the confusion matrix and ROC-AUC plot supported the numerical results and confirmed the model's reliability and performance.
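One plausible wiring of such a hybrid is sketched below: a transformer encoder produces a sequence of token features that a bidirectional LSTM then summarizes for classification. The use of a generic transformer encoder as a stand-in for the ViT backbone, the nucleotide-style tokenization of the genomic input, and all dimensions and class counts are assumptions, not the dissertation's exact architecture.

```python
# Hedged sketch of a ViT-BiLSTM-style hybrid for multi-class DICER1
# mutation classification, under assumed tokenization and dimensions.
import torch
import torch.nn as nn

class ViTBiLSTM(nn.Module):
    def __init__(self, vocab_size=5, embed_dim=128, num_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # e.g. A/C/G/T/N tokens
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
        self.bilstm = nn.LSTM(embed_dim, 64, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * 64, num_classes)

    def forward(self, tokens):                  # tokens: (batch, seq_len) int64
        x = self.encoder(self.embed(tokens))    # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.bilstm(x)            # h_n: (2, batch, 64)
        h = torch.cat([h_n[0], h_n[1]], dim=1)  # concat forward/backward states
        return self.classifier(h)               # (batch, num_classes) logits

model_hybrid = ViTBiLSTM()
logits = model_hybrid(torch.randint(0, 5, (2, 256)))
print(logits.shape)  # torch.Size([2, 4])
```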
Furthermore, predictions for individual patient cases showcased the model's potential for real-world applications, particularly in genetic risk assessment and personalized medicine. These results indicate that the model could be a valuable clinical tool for identifying and treating genetic disorders related to DICER1 mutations, and they underscore the power of advanced deep learning techniques in advancing the understanding of such disorders.
In addition, further research on model optimization and on integrating radiomic and genomic data through neural architecture search could significantly enhance the sensitivity, specificity, and efficiency of practical clinical decision support systems. Future work could focus on refining the model and exploring its applicability to other genetic variations and medical imaging modalities, aiming to improve patient outcomes and develop targeted treatment strategies. This research also intends to employ multimodal integration (MMI) to predict gene states from diverse data types, including genomics, radiology, and pathology; the MMI systems will then generate predictions of gene status and support further prognostic evaluation and correlation analysis.
Keywords
Vision Transformer, Deep Learning, DICER1 Sarcoma, Pleuropulmonary Blastoma (PPB)