Multicomponent Complexes, Structural Proteomes, Drug Discovery For Cancer Gene Census And SARS CoV-2
Publisher: Saudi Digital Library
Abstract
Next-generation sequencing has made sequence data for humans, bacteria, and viruses abundant, whereas structural data, which provide a deeper understanding of function, remain comparatively scarce. Because experimental structural biology is expensive and time-consuming, protein modelling software such as MODELLER can narrow the large gap between sequence and structural annotation in less time. Protein structures whose quality assessment indicates reliable or high accuracy can therefore be obtained computationally.
This is a step towards an era of personalised medicine, since most drug optimisation, for example of selectivity and potency, is structure-based. Biologically, proteins can occur as protomers or assemble into higher-order states as homo- or hetero-oligomers. Furthermore, protomers can interact with other assemblies to form multicomponent complex systems. One of the challenges in computational protein modelling is to predict the biological state of a given protein and then to model that state as a higher-order assembly that mimics the actual biological assembly. There are two main drivers for computational structural modelling: first, to reduce the colossal sequence-structure gap, and second, to understand how mutations in human cancer and new viral variants affect protein structure. The Catalogue of Somatic Mutations in Cancer (COSMIC) curates vast amounts of human mutation data. The Cancer Gene Census contains 723 genes that have been experimentally validated as drivers of cancer progression and proliferation. Unfortunately, only 87 of these genes have gene products with experimentally solved structures covering more than 90% of the sequence; the protein structures of the remaining genes are unsolved or only partially solved. A comprehensive, state-of-the-art computational modelling effort has therefore been carried out to build the products of these genes as higher-order assemblies, i.e. homodimers and heterodimers, including ligands, DNA, RNA, and the intrinsically disordered regions connecting domains. In addition, predictions of the impacts of mutations reported in COSMIC, made with statistical and machine-learning algorithms such as SDM and mCSM, can be used to hypothesise new driver mutations with structural impact. All these data are presented in a user-friendly interface (https://cancer-3d.com/) where users can retrieve data and build hypotheses.
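The 90% figure above refers to structural coverage, which can be read as the fraction of a gene product's residues resolved in at least one experimental structure. The sketch below shows one way to compute it, assuming that definition: it merges the residue ranges resolved by several structures and divides the covered length by the sequence length. The function names, the inclusive 1-based ranges, and the example numbers are hypothetical illustrations, not taken from the thesis pipeline.

```python
from typing import Iterable, List, Tuple


def merge_intervals(intervals: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or adjacent inclusive residue ranges (1-based)."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def structural_coverage(seq_length: int, resolved: Iterable[Tuple[int, int]]) -> float:
    """Fraction of residues 1..seq_length resolved in at least one structure."""
    if seq_length <= 0:
        raise ValueError("sequence length must be positive")
    covered = 0
    for start, end in merge_intervals(resolved):
        # Clip each range to the canonical sequence before counting residues.
        lo, hi = max(start, 1), min(end, seq_length)
        if hi >= lo:
            covered += hi - lo + 1
    return covered / seq_length


# Hypothetical example: a 400-residue protein with three experimental
# structures resolving residues 1-150, 120-300 and 311-390.
cov = structural_coverage(400, [(1, 150), (120, 300), (311, 390)])
print(f"coverage = {cov:.1%}, above 90%: {cov > 0.90}")
```

For the hypothetical ranges, the two overlapping structures merge into residues 1 to 300, giving 380 covered residues out of 400, so the script prints `coverage = 95.0%, above 90%: True`. Merging before counting avoids double-counting residues seen in more than one structure.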
Applying the same modelling approaches to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of an acute infectious disease, can improve our understanding of the viral proteome and help to select new drug targets. The SARS-CoV-2 genome sequence was released in early 2020, and full protomer and oligomeric structures were built for proteins that lacked experimental structures. Pocket detection, mutational analysis, and protein-ligand and protein-protein docking were performed computationally to gain further insight into putative SARS-CoV-2 drug targets. All this information is presented in a new user-friendly interface (https://sars3d.com/) that can be accessed freely to build hypotheses and download the data.
Experimental validation of mutation impacts and of drug discovery targets is essential to assess the computational approaches applied to the Cancer Gene Census and SARS-CoV-2 and described in this thesis. Therefore, frequently reported mutants of the GTPase NRAS, such as Q61K/L, G12D, and G13D, were studied experimentally to understand how these mutations affect protein conformation and function. In addition, hypotheses are presented concerning newly identified allosteric pockets that could be used to disrupt constitutively active NRAS. The SARS-CoV-2 non-structural protein 13 (nsp13) was selected as a drug discovery target for developing a new putative lead compound using fragment-based approaches.
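Mutation lists such as the NRAS examples above mix single substitutions (G12D) with shorthand covering several substitutions at one position (Q61K/L). Structure-based predictors such as SDM and mCSM evaluate one point mutation at a time, so it can help to expand such shorthand into one record per substitution first. The following is a minimal sketch of that step, assuming one-letter residue codes and a "/" separator; the regular expression, the `Substitution` record, and `parse_mutation` are hypothetical helpers, not part of SDM, mCSM, or COSMIC.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical pattern: one-letter wild-type residue, sequence position, then one
# or more one-letter mutant residues separated by "/" (e.g. "G12D", "Q61K/L").
AA = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"^([{AA}])(\d+)([{AA}](?:/[{AA}])*)$")


@dataclass(frozen=True)
class Substitution:
    """A single amino-acid substitution (hypothetical record type)."""
    wild_type: str
    position: int
    mutant: str

    def label(self) -> str:
        return f"{self.wild_type}{self.position}{self.mutant}"


def parse_mutation(text: str) -> List[Substitution]:
    """Expand a mutation string into one Substitution per mutant residue."""
    match = MUTATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised mutation: {text!r}")
    wild_type, position, mutants = match.groups()
    return [Substitution(wild_type, int(position), m) for m in mutants.split("/")]


# Expand the NRAS mutants named in the text.
for text in ["Q61K/L", "G12D", "G13D"]:
    print(text, "->", [s.label() for s in parse_mutation(text)])
```

Run as-is, it prints `Q61K/L -> ['Q61K', 'Q61L']` and leaves G12D and G13D as single substitutions. The pattern deliberately accepts only the 20 standard one-letter codes, so stop codons or three-letter HGVS notation (e.g. `p.Gly12Asp`) are rejected with a `ValueError` rather than silently misparsed.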
