Mutation Identification and Filtering in Colorectal Cancer Sequencing Data: Improving Accuracy and Reliability

dc.contributor.advisor: Wood, Henry
dc.contributor.advisor: Gusnanto, Arief
dc.contributor.author: Alharbi, Rahaf
dc.date.accessioned: 2024-12-24T07:58:51Z
dc.date.issued: 2024
dc.description.abstract: This study aimed to improve the accuracy and reliability of mutation identification in colorectal cancer (CRC) sequencing data. Because disease heterogeneity and technical sequencing artifacts complicate mutation detection in CRC, the research focused on developing a comprehensive detection pipeline. To distinguish genuine mutations from artifacts, advanced filtering strategies were applied to sequence data from CRC samples processed with different sequencing technologies. Mutation data were extracted from BAM and VCF files and filtered on sequencing depth, mutation frequency, and cross-referencing against the COSMIC and gnomAD databases. The significance of mutations was assessed with Fisher's Exact Test followed by Benjamini-Hochberg correction. The study identified 4,089 significant mutations after initial filtering, of which 683 were shared between the COSMIC and gnomAD databases, suggesting that they are genuine cancer-related variants. The most significant genes were TP53, PDGFRA, DMD, and PIK3CA, with the most significant mutations occurring in TP53. The most frequent mutation consequences were missense variants, stop-gained mutations, and synonymous variants, with missense variants predominating. When compared with gnomAD and COSMIC data, most mutations were categorized as probable single nucleotide polymorphisms (SNPs), and a substantial number as probable cancer mutations. For a smaller subset, several assays were flagged as potentially problematic because of inconsistencies across experimental conditions. The study highlights the importance of ongoing validation to mitigate technical artifacts, and it identifies key CRC-related mutations and their consequences. These findings support precision oncology and may facilitate more personalized treatment strategies for CRC patients.
dc.format.extent: 15
dc.identifier.uri: https://hdl.handle.net/20.500.14154/74432
dc.language.iso: en
dc.publisher: University of Leeds
dc.subject: Colorectal cancer (CRC)
dc.subject: Mutation detection
dc.subject: Sequencing data
dc.subject: Filtering strategies
dc.subject: Mutation frequency
dc.subject: Significant mutations
dc.subject: Cancer mutations
dc.subject: Precision oncology
dc.subject: Personalized treatment
dc.title: Mutation Identification and Filtering in Colorectal Cancer Sequencing Data: Improving Accuracy and Reliability
dc.type: Thesis
sdl.degree.department: School of Molecular Medicine, Faculty of Biological Science
sdl.degree.discipline: Precision Medicine: Genomics & Analytics
sdl.degree.grantor: University of Leeds
sdl.degree.name: Master of Science
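
The abstract names three analysis steps: filtering on sequencing depth and mutation frequency, Fisher's Exact Test, and Benjamini-Hochberg correction. The sketch below illustrates how such steps could fit together; it is not the thesis pipeline. The thresholds (`min_depth`, `min_vaf`, `alpha`), the record fields (`alt`, `ref`, `ctrl_alt`, `ctrl_ref`), and the choice to compare sample read counts against a comparison set are all assumptions made for illustration, since the abstract does not state them.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_and_test(candidates, min_depth=20, min_vaf=0.05, alpha=0.05):
    """Apply hypothetical depth/frequency filters, then test and correct.

    Each candidate is a dict with an id, alt/ref read counts in the sample,
    and alt/ref read counts in a comparison set (an assumed design).
    """
    kept = []
    for c in candidates:
        depth = c["alt"] + c["ref"]
        if depth == 0 or depth < min_depth or c["alt"] / depth < min_vaf:
            continue  # too shallow or too rare to call reliably
        kept.append(c)
    pvals = [
        fisher_exact_two_sided(c["alt"], c["ref"], c["ctrl_alt"], c["ctrl_ref"])
        for c in kept
    ]
    qvals = benjamini_hochberg(pvals)
    return [(c["id"], q) for c, q in zip(kept, qvals) if q <= alpha]


if __name__ == "__main__":
    # Made-up read counts, purely to exercise the functions.
    demo = [
        {"id": "m1", "alt": 30, "ref": 70, "ctrl_alt": 0, "ctrl_ref": 100},
        {"id": "m2", "alt": 2, "ref": 3, "ctrl_alt": 0, "ctrl_ref": 100},
        {"id": "m3", "alt": 10, "ref": 90, "ctrl_alt": 9, "ctrl_ref": 91},
    ]
    for mutation_id, q in filter_and_test(demo):
        print(mutation_id, q)
```

The correction is applied only to candidates that pass the filters, so the number of tests `m` reflects the filtered set; a real analysis would also need the database cross-referencing step, which is omitted here.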

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 1 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Copyright owned by the Saudi Digital Library (SDL) © 2025