A PYTHON TOOL FOR INTERROGATING POTENTIAL CONFOUNDING OF GWAS RESULTS OWING TO POPULATION STRATIFICATION

Date

2025

Publisher

Saudi Digital Library

Abstract

This project develops a Python-based tool to detect residual population stratification in genome-wide association study (GWAS) summary statistics using publicly available reference data from the 1000 Genomes Project. The tool implements two complementary functions. The first is a heatmap that visualises potential directional bias in effect sizes across bins of allele frequency difference between the GBR (British in England and Scotland) and TSI (Toscani in Italia) populations. The second is a regression framework that quantifies the variance in GWAS effect sizes explained by principal component loadings derived from LD-pruned reference genotypes. Applied to a demonstration GWAS of adult height, the pipeline reveals ancestry-related structure that remains detectable even after standard PCA adjustment. It provides a rapid, reproducible layer of post-GWAS quality control to support more robust and equitable genetic association analyses in health data science.
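
The two diagnostics described in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the function names, the number of bins, the one-dimensional binning (the thesis heatmap may use a different layout), and the simulated inputs are all assumptions made for illustration. The first function averages effect sizes within bins of allele frequency difference, which is the quantity a heatmap row could display. The second computes the share of variance in effect sizes explained by an ordinary least squares regression on principal component loadings.

```python
import numpy as np


def mean_beta_by_freq_diff(beta, freq_a, freq_b, n_bins=10):
    """Mean effect size within equal-width bins of frequency difference (A minus B).

    Hypothetical helper: returns the bin edges and one mean per bin
    (NaN for empty bins).
    """
    beta = np.asarray(beta, dtype=float)
    diff = np.asarray(freq_a, dtype=float) - np.asarray(freq_b, dtype=float)
    edges = np.linspace(diff.min(), diff.max(), n_bins + 1)
    # Assign each variant to a bin; clip so the maximum lands in the last bin.
    idx = np.clip(np.digitize(diff, edges) - 1, 0, n_bins - 1)
    means = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            means[b] = beta[mask].mean()
    return edges, means


def variance_explained_by_loadings(beta, loadings):
    """R^2 from an ordinary least squares fit of effect sizes on PC loadings.

    Hypothetical helper: `loadings` has one row per variant and one column
    per principal component.
    """
    beta = np.asarray(beta, dtype=float)
    X = np.column_stack([np.ones(len(beta)), np.asarray(loadings, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, beta, rcond=None)
    resid = beta - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((beta - beta.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


# Simulated inputs only; real use would read GWAS summary statistics and
# reference-panel allele frequencies and loadings from files.
rng = np.random.default_rng(0)
n = 2000
freq_gbr = rng.uniform(0.05, 0.95, n)
freq_tsi = np.clip(freq_gbr + rng.normal(0, 0.05, n), 0.01, 0.99)
loadings = rng.normal(size=(n, 2))
# Effect sizes that deliberately leak structure through the first loading.
beta = 0.02 * loadings[:, 0] + rng.normal(0, 0.01, n)

edges, means = mean_beta_by_freq_diff(beta, freq_gbr, freq_tsi, n_bins=8)
r2 = variance_explained_by_loadings(beta, loadings)
```

In this simulation the effect sizes are constructed to correlate with the first loading, so `r2` is large by design. In a well-controlled GWAS the corresponding value would be expected to sit close to zero, and the per-bin means would show no consistent trend across frequency-difference bins.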

Description

This Master’s thesis in Health Data Science presents a reproducible Python pipeline for post-GWAS quality control, illustrating how open reference panels and summary statistics can be combined to diagnose ancestry-driven confounding before downstream applications such as polygenic risk scoring and translational genomic research.

Keywords

Health data science, Genome-wide association studies (GWAS), Population stratification, Genetic epidemiology, Polygenic risk scores, Quality control, Python tools for genomics.

Citation

Bin Sebayel, N. (2025). A Python tool for interrogating potential confounding of GWAS results owing to population stratification [Master’s thesis, University of Exeter]. Saudi Digital Library.

Copyright owned by the Saudi Digital Library (SDL) © 2026