A PYTHON TOOL FOR INTERROGATING POTENTIAL CONFOUNDING OF GWAS RESULTS OWING TO POPULATION STRATIFICATION
Date
2025
Authors
Bin Sebayel, N.
Publisher
Saudi Digital Library
Abstract
This project develops a Python-based tool to detect residual population stratification in genome-wide association study (GWAS) summary statistics using publicly available reference data from the 1000 Genomes Project. The tool implements two complementary functions: a heatmap that visualises potential directional bias in effect sizes across bins of allele frequency difference between the GBR (British in England and Scotland) and TSI (Toscani in Italia) populations, and a regression framework that quantifies the variance in GWAS effect sizes explained by principal component loadings derived from LD-pruned reference genotypes. Applied to a demonstration GWAS of adult height, the pipeline reveals ancestry-related structure that remains detectable even after standard PCA adjustment. It provides a rapid, reproducible layer of post-GWAS quality control to support more robust and equitable genetic association analyses in health data science.
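The binning behind the heatmap can be illustrated with a minimal sketch. This is not the thesis code: the function name `mean_effect_by_freq_diff`, the number of bins, and the synthetic data are assumptions made for illustration, and the sketch summarises one axis only (mean effect size per frequency-difference bin), since the record does not describe the heatmap's full layout.

```python
import numpy as np

def mean_effect_by_freq_diff(beta, freq_gbr, freq_tsi, n_bins=5):
    """Mean GWAS effect size per bin of GBR-minus-TSI allele frequency difference.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    beta = np.asarray(beta, dtype=float)
    diff = np.asarray(freq_gbr, dtype=float) - np.asarray(freq_tsi, dtype=float)
    # Symmetric bin edges so the middle bin is centred on zero difference.
    limit = np.max(np.abs(diff))
    edges = np.linspace(-limit, limit, n_bins + 1)
    # Passing interior edges only makes np.digitize return indices 0..n_bins-1.
    idx = np.digitize(diff, edges[1:-1])
    means = np.full(n_bins, np.nan)  # empty bins stay NaN
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            means[b] = beta[in_bin].mean()
    return edges, means

# Synthetic example: effect sizes that drift with the frequency difference,
# which is the pattern residual stratification would be expected to produce.
rng = np.random.default_rng(0)
n_snps = 2000
freq_gbr = rng.uniform(0.05, 0.95, n_snps)
freq_tsi = np.clip(freq_gbr + rng.normal(0, 0.05, n_snps), 0.01, 0.99)
beta = 0.5 * (freq_gbr - freq_tsi) + rng.normal(0, 0.01, n_snps)

edges, means = mean_effect_by_freq_diff(beta, freq_gbr, freq_tsi)
for lo, hi, m in zip(edges[:-1], edges[1:], means):
    print(f"[{lo:+.3f}, {hi:+.3f}): mean beta = {m:+.4f}")
```

In the absence of stratification the per-bin means would be expected to scatter around zero; a monotonic trend across bins is the kind of directional bias the heatmap is meant to expose.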
Description
This Master’s thesis in Health Data Science presents a reproducible Python pipeline for post-GWAS quality control, illustrating how open reference panels and summary statistics can be combined to diagnose ancestry-driven confounding before downstream applications such as polygenic risk scoring and translational genomic research.
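The regression step named in the abstract can be sketched in the same spirit. The following is an illustrative outline under stated assumptions, not the thesis implementation: `pc_loadings` and `variance_explained` are hypothetical names, the genotypes are simulated rather than drawn from the 1000 Genomes Project, LD pruning is assumed to have been done beforehand, and ordinary least squares with an intercept is assumed as the regression model.

```python
import numpy as np

def pc_loadings(genotypes, n_pcs=2):
    """SNP loadings on the top principal components of a samples x SNPs matrix.

    Hypothetical helper; assumes the SNPs have already been LD-pruned.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)
    sd = g.std(axis=0)
    g = g / np.where(sd > 0, sd, 1.0)  # leave monomorphic SNPs at zero
    # Rows of vt are unit-length SNP loading vectors, ordered by variance explained.
    _, _, vt = np.linalg.svd(g, full_matrices=False)
    return vt[:n_pcs].T

def variance_explained(beta, loadings):
    """R^2 from an ordinary least squares fit of effect sizes on PC loadings."""
    beta = np.asarray(beta, dtype=float)
    design = np.column_stack([np.ones(len(beta)), loadings])
    coef, *_ = np.linalg.lstsq(design, beta, rcond=None)
    resid = beta - design @ coef
    ss_tot = np.sum((beta - beta.mean()) ** 2)
    return 1.0 - (resid @ resid) / ss_tot

# Synthetic reference panel: two populations with slightly different frequencies.
rng = np.random.default_rng(0)
n_snps = 500
p_a = rng.uniform(0.1, 0.9, n_snps)
p_b = np.clip(p_a + rng.normal(0, 0.1, n_snps), 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, p_a, (100, n_snps)),
    rng.binomial(2, p_b, (100, n_snps)),
])

loadings = pc_loadings(geno, n_pcs=2)
# Simulated effect sizes that partly track the first PC loading.
beta = 0.5 * loadings[:, 0] + rng.normal(0, 0.02, n_snps)
print(f"R^2 = {variance_explained(beta, loadings):.3f}")
```

An R² close to zero would suggest little ancestry-related structure in the effect sizes, while a clearly non-zero value would flag possible residual stratification worth investigating before downstream use.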
Keywords
Health data science, Genome-wide association studies (GWAS), Population stratification, Genetic epidemiology, Polygenic risk scores, Quality control, Python tools for genomics.
Citation
Bin Sebayel, N. (2025). A Python tool for interrogating potential confounding of GWAS results owing to population stratification [Master’s thesis, University of Exeter]. Saudi Digital Library.
