Mutation Identification and Filtering in Colorectal Cancer Sequencing Data: Improving Accuracy and Reliability
Date
2024
Publisher
University of Leeds
Abstract
This study aimed to improve the accuracy and reliability of mutation identification in colorectal cancer (CRC) sequencing data. Because CRC is a heterogeneous disease and sequencing can introduce technical artifacts, the research focused on developing a comprehensive pipeline for detecting mutations in CRC.
To distinguish genuine mutations from artifacts, the study applied advanced filtering strategies to sequence data from CRC samples processed with different sequencing technologies. Mutation data were extracted from BAM and VCF files, filtered on sequencing depth and mutation frequency, and cross-referenced against the COSMIC and gnomAD databases.
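The depth and frequency filters described above can be illustrated with a minimal Python sketch. It assumes that read counts for each call have already been extracted from the BAM/VCF files (for example with a library such as pysam, not shown). The thresholds MIN_DEPTH = 20 and MIN_VAF = 0.05, the coordinates, and the names Variant, passes_filters, and filter_variants are all hypothetical: the abstract does not report the values or code used in the study.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the study's actual cut-offs are not stated.
MIN_DEPTH = 20    # minimum total reads covering the position
MIN_VAF = 0.05    # minimum variant allele frequency (5%)


@dataclass(frozen=True)
class Variant:
    """One called variant with read counts (illustrative fields only)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int       # total reads covering the position
    alt_reads: int   # reads supporting the alternate allele

    @property
    def vaf(self) -> float:
        """Variant allele frequency: fraction of reads carrying the alt allele."""
        return self.alt_reads / self.depth if self.depth else 0.0


def passes_filters(v: Variant, min_depth: int = MIN_DEPTH, min_vaf: float = MIN_VAF) -> bool:
    """Keep a variant only if it is well covered and not at a very low frequency."""
    return v.depth >= min_depth and v.vaf >= min_vaf


def filter_variants(variants):
    """Return the subset of calls that pass both filters, preserving order."""
    return [v for v in variants if passes_filters(v)]


if __name__ == "__main__":
    # Made-up calls for illustration only.
    calls = [
        Variant("chr1", 1000, "C", "T", depth=150, alt_reads=60),  # well supported
        Variant("chr2", 2000, "G", "A", depth=8, alt_reads=4),     # too shallow
        Variant("chr3", 3000, "A", "G", depth=300, alt_reads=3),   # VAF 1%, likely noise
    ]
    for v in filter_variants(calls):
        print(v.chrom, v.pos, f"VAF={v.vaf:.2f}")
```

Raising MIN_VAF removes more low-frequency calls that are likely to be sequencing noise, at the cost of missing genuine subclonal mutations, a trade-off that matters in a heterogeneous tumour such as CRC.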
Fisher's Exact Test was used to assess the significance of mutations, with the Benjamini-Hochberg correction for multiple testing. After initial filtering, the study identified 4,089 significant mutations, 683 of which were present in both the COSMIC and gnomAD databases, suggesting they are genuine cancer-related variants.
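The significance-testing step can be sketched in Python using only the standard library. This is an illustration, not the study's code. It assumes each variant is tested with a 2x2 table of alternate and reference read counts in two groups (the abstract does not say how the tables were built), computes a two-sided Fisher's exact test p-value from the hypergeometric distribution, and then applies the Benjamini-Hochberg step-up procedure to control the false discovery rate. The function names, the example tables, and the 0.05 cut-off are hypothetical.

```python
from math import comb
from typing import List, Sequence


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    row and column totals that are no more likely than the observed table.
    Integer weights are compared, so no floating-point tolerance is needed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def weight(x: int) -> int:
        # Number of arrangements with x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    as_extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return as_extreme / total


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical tables: (alt, ref) reads in group 1, then (alt, ref) in group 2.
    tables = [(8, 2, 1, 5), (5, 5, 4, 6), (12, 0, 2, 10)]
    raw = [fisher_exact_two_sided(*t) for t in tables]
    for t, p, q in zip(tables, raw, benjamini_hochberg(raw)):
        flag = "significant" if q < 0.05 else "not significant"
        print(t, f"p={p:.4g}", f"q={q:.4g}", flag)
```

In practice, scipy.stats.fisher_exact and scipy.stats.false_discovery_control (SciPy 1.11 and later) provide the same two calculations; the hand-written versions here only make the arithmetic explicit.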
The most significant genes were TP53, PDGFRA, DMD, and PIK3CA, with TP53 carrying the most significant mutations. The most frequent mutation consequences were missense variants, stop-gained variants, and synonymous variants, with missense variants predominating. On comparison with gnomAD and COSMIC data, most mutations were classified as probable single nucleotide polymorphisms (SNPs), and a substantial number as probable cancer mutations. For a smaller subset, several assays were flagged as potential issues because their results were inconsistent across experimental conditions.
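One way to turn the database comparison into "probable SNP" and "probable cancer mutation" labels is sketched below. The rule is an assumption, since the abstract does not define its criteria: a variant seen in gnomAD at a population allele frequency of at least 1% (the hypothetical COMMON_AF) is labelled a probable SNP, a remaining variant listed in COSMIC is labelled a probable cancer mutation, and anything else is left unclassified. The lookup tables are toy stand-ins for parsed gnomAD and COSMIC files, and the name classify is illustrative.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# (chromosome, position, reference allele, alternate allele)
VariantKey = Tuple[str, int, str, str]

# Hypothetical threshold: at or above this gnomAD allele frequency a variant
# is treated as a common germline polymorphism rather than a somatic mutation.
COMMON_AF = 0.01


def classify(
    key: VariantKey,
    gnomad_af: Dict[VariantKey, float],
    cosmic_keys: Set[VariantKey],
    common_af: float = COMMON_AF,
) -> str:
    """Label one variant using an assumed gnomAD-then-COSMIC rule."""
    af = gnomad_af.get(key)
    if af is not None and af >= common_af:
        return "probable SNP"
    if key in cosmic_keys:
        return "probable cancer mutation"
    return "unclassified"


if __name__ == "__main__":
    # Toy lookup tables standing in for parsed gnomAD and COSMIC records.
    gnomad = {("chr1", 100, "A", "G"): 0.23, ("chr2", 200, "C", "T"): 0.0001}
    cosmic = {("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")}
    calls = [
        ("chr1", 100, "A", "G"),  # common in gnomAD
        ("chr2", 200, "C", "T"),  # rare in gnomAD, listed in COSMIC
        ("chr3", 300, "G", "A"),  # COSMIC only
        ("chr4", 400, "T", "C"),  # in neither database
    ]
    print(Counter(classify(k, gnomad, cosmic) for k in calls))
```

Checking gnomAD first means a COSMIC entry that is also common in the population is treated as germline; reversing the order would label more variants as cancer mutations, trading specificity for sensitivity.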
This study highlights the importance of ongoing validation to mitigate technical artifacts, and it identifies key CRC-related mutations and their consequences. These findings can contribute to precision oncology and support more personalized treatment strategies for CRC patients.
Keywords
Colorectal cancer (CRC), Mutation detection, Sequencing data, Filtering strategies, Mutation frequency, Significant mutations, Cancer mutations, Precision oncology, Personalized treatment