Predicting One-Dimensional Protein Structures by Leveraging Pre-Trained Language Models (PLMs) and Deep Learning

Date

2025

Publisher

Saudi Digital Library

Abstract

Proteins are essential biomolecules, and their function is intrinsically linked to their structure. Understanding this relationship is crucial for advancements in molecular biology, medicine, and biotechnology. Despite the rapid growth in sequencing data, structural data remains sparse due to the challenges and costs associated with experimental methods. As a result, computational protein structure prediction has become essential in bridging the gap between sequence data and structural understanding. This thesis focuses on advancing one-dimensional (1D) structural annotations, specifically secondary structure (SS) and relative solvent accessibility (RSA), by leveraging state-of-the-art deep learning methodologies. Two novel prediction tools are introduced: Porter6 for SS prediction and PaleAle6.0 for RSA prediction. Both models utilize pre-trained protein language models (PLMs) and a convolutional bidirectional recurrent neural network (CBRNN) architecture, enabling high-accuracy predictions without relying on multiple sequence alignments. PaleAle6.0 further supports real-valued, binary, and multi-class RSA outputs, offering enhanced flexibility and performance. To promote accessibility and usability, these tools are made available to the research community through DeepPredict, a web-based platform designed for efficient and scalable structural predictions. DeepPredict enables users to perform accurate SS and RSA predictions with minimal computational requirements. This thesis presents a comprehensive evaluation of PLM-based embeddings, highlights the importance of careful dataset design to avoid bias and overfitting, and promotes realistic evaluation metrics that consider evolutionary relationships between proteins.
With its powerful prediction capabilities and user-friendly design, DeepPredict supports a wide range of applications in drug discovery, synthetic biology, and the understanding of disease mechanisms, laying a strong foundation for future advancements in computational biology.

Keywords

Deep learning; 1D protein prediction; Protein databases; Secondary structure; Intrinsic disorder; Solvent accessibility; AlphaFold; Protein language models

Copyright owned by the Saudi Digital Library (SDL) © 2025