Improving predictions of pathogenicity of CLCN variants using paralogue annotation
Date
2023-12-01
Publisher
Saudi Digital Library
Abstract
With the large expansion of genomic data, providing a genetic diagnosis has become a demanding task. Experimental approaches require funding, time and expertise, creating a bottleneck in variant classification and genetic diagnosis. This is particularly true in the genetic service setting, where family data are often unavailable. Current computational tools are not accurate enough to provide reliable predictions of pathogenicity, to account for the biophysical changes caused by variants, or to inform precision-therapy solutions. In this project, I attempt to develop a framework based on paralogue annotation for the ClC family to improve and support pathogenicity predictions. Paralogous genes arise from gene duplication; they are highly homologous but have developed specialized functions in different tissues. Paralogue annotation exploits this relationship by annotating variants at equivalent residues across paralogous genes, and is based on the assumption that highly conserved residues are often vital to protein function. The ClC family is subdivided into channels and transporters, and the validity of paralogue annotation across the family has not been established.
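The core idea of paralogue annotation can be illustrated with a minimal sketch, assuming the paralogues have already been placed in a multiple sequence alignment: a residue in one protein is mapped to its alignment column, and the equivalent residue in each paralogue is reported together with whether its reference amino acid is identical. The protein names (`PARA_A` and so on), the sequences and the function names are hypothetical toy examples, not real ClC sequences or the pipeline used in this work.

```python
# Illustrative sketch only: toy sequences and hypothetical names, NOT real
# ClC proteins and NOT the method implementation used in this thesis.

def residue_to_column(aligned_seq: str, residue_number: int) -> int:
    """Return the 0-based alignment column holding a 1-based residue."""
    count = 0
    for col, aa in enumerate(aligned_seq):
        if aa != "-":
            count += 1
            if count == residue_number:
                return col
    raise ValueError(f"residue {residue_number} not found in sequence")


def column_to_residue(aligned_seq: str, col: int):
    """Return the 1-based residue number at a column, or None for a gap."""
    if aligned_seq[col] == "-":
        return None
    return sum(1 for aa in aligned_seq[: col + 1] if aa != "-")


def paralogue_annotate(alignment: dict, query: str, residue_number: int):
    """List equivalent residues in the other paralogues and flag whether
    each one carries the same reference amino acid as the query."""
    col = residue_to_column(alignment[query], residue_number)
    ref_aa = alignment[query][col]
    hits = []
    for name, seq in alignment.items():
        if name == query:
            continue
        pos = column_to_residue(seq, col)
        if pos is not None:  # skip paralogues with a gap at this column
            hits.append((name, seq[col], pos, seq[col] == ref_aa))
    return hits


# Hypothetical toy alignment of three paralogues.
toy_alignment = {
    "PARA_A": "MKG-LSVFA",
    "PARA_B": "MKGALSIFA",
    "PARA_C": "MRG-LT-FA",
}

print(paralogue_annotate(toy_alignment, "PARA_A", 4))
# → [('PARA_B', 'L', 5, True), ('PARA_C', 'L', 4, True)]
```

Residue 4 of `PARA_A` corresponds to residue 5 of `PARA_B` because of the inserted alanine, which is why the mapping goes through alignment columns rather than raw residue numbers. In this sketch, a variant reported at a conserved equivalent position in one paralogue would be transferred as supporting evidence to the query residue.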
By assessing sequence and structural alignments, I established core regions across the ClC paralogues that are amenable to paralogue annotation, with 10% of residues fully conserved and 26% partially conserved, and with full alignment of secondary structure elements in the transmembrane and intracellular domains. Conservation was much higher within each of the channel and transporter subfamilies than across the family as a whole. Conserved residues carried a higher burden of pathogenic variants, as defined by their appearance in the Human Gene Mutation Database (HGMD); the HGMD variants were further manually annotated for pathogenicity. I also used the ratio of variants found in HGMD to variants found in the general population (gnomAD) to establish regions of high and low impact within channel structures. High-impact regions include the subunit interface of the ClC dimer and sequences close to the chloride ion permeation/transport pathway. Low-impact regions are peripheral, distant from the subunit interface. These data establish the groundwork for paralogue annotation of the ClC family.
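The conservation categories and the HGMD:gnomAD ratio can be sketched as follows. This is a minimal illustration under stated assumptions: a column is treated as "fully conserved" if every residue is identical and "partially conserved" if all residues fall in one amino-acid property group, columns containing gaps are set aside, and a pseudocount of 1 keeps the ratio finite. The property grouping, pseudocount, region labels, sequences and counts are all hypothetical and are not the definitions or data used in this work; the ratio here is also not normalised by region length.

```python
from collections import Counter

# ASSUMPTION: hypothetical amino-acid property groups defining "partially
# conserved"; the criterion actually used in the thesis may differ.
GROUPS = ["AVLIM", "FWY", "STNQ", "KRH", "DE", "G", "P", "C"]


def group_of(aa: str):
    """Return the index of the property group containing aa, or None."""
    for i, group in enumerate(GROUPS):
        if aa in group:
            return i
    return None


def classify_column(column: str) -> str:
    """Classify one alignment column as full, partial, none or gapped."""
    if "-" in column:
        return "gapped"  # columns with any gap are set aside
    if len(set(column)) == 1:
        return "full"  # identical residue in every paralogue
    groups = {group_of(aa) for aa in column}
    if len(groups) == 1 and None not in groups:
        return "partial"  # different residues from the same property group
    return "none"


def conservation_summary(alignment: list[str]) -> dict:
    """Fraction of alignment columns falling in each conservation class."""
    n_cols = len(alignment[0])
    columns = ["".join(seq[i] for seq in alignment) for i in range(n_cols)]
    counts = Counter(classify_column(col) for col in columns)
    return {k: counts[k] / n_cols for k in ("full", "partial", "none", "gapped")}


def burden_ratio(hgmd: dict, gnomad: dict, pseudocount: float = 1.0) -> dict:
    """HGMD:gnomAD variant-count ratio per region; the pseudocount (an
    assumption) avoids division by zero for regions absent from gnomAD."""
    regions = set(hgmd) | set(gnomad)
    return {
        r: (hgmd.get(r, 0) + pseudocount) / (gnomad.get(r, 0) + pseudocount)
        for r in regions
    }


# Toy inputs: made-up sequences, region labels and counts, NOT thesis data.
print(conservation_summary(["MKGLSVFA", "MKGLSIFA", "MRGLT-FA"]))
# → {'full': 0.625, 'partial': 0.25, 'none': 0.0, 'gapped': 0.125}
print(burden_ratio({"interface": 9, "periphery": 1},
                   {"interface": 2, "periphery": 9}))
```

With these toy counts the "interface" region scores (9 + 1) / (2 + 1) ≈ 3.3 and the "periphery" region (1 + 1) / (9 + 1) = 0.2, so a high ratio marks a region where disease-associated variants outnumber population variants.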
I also functionally analysed variants of uncertain significance identified in patients at UCL or listed in ClinVar, as well as variants affecting residues with paralogue annotations, including the I238F, L497R and V605A variants identified through paralogue annotation of ClinVar variants.
Keywords
Neuroscience, Genetics, Electrophysiology, Bioinformatics, Chloride channels