Scalable Protein Stability Prediction via Generative Models

Original Title: Generalizable and scalable protein stability prediction with rewired protein generative models

Journal: Nature Communications

DOI: 10.1038/s41467-025-67609-4

Overview

Protein stability, typically quantified as the change in Gibbs free energy of folding caused by a mutation (ΔΔG), is a fundamental property that shapes protein function and engineering potential. Accurately predicting how mutations affect stability remains difficult because high-quality experimental data are scarce and three-dimensional molecular interactions are intricate. This research introduces SPURS, a deep learning framework that addresses these limitations by integrating two distinct types of protein generative models: it combines the evolutionary patterns captured by the protein language model ESM2 with the geometric constraints learned by the inverse folding model ProteinMPNN. Trained on a filtered Megascale dataset containing over 270,000 measurements across nearly 300 proteins, the framework achieves a median Spearman correlation of 0.83, outperforming existing methods. The system is also efficient: it predicts all possible single-point mutations for a protein in a single computational pass.
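The headline metric, a median Spearman correlation, can be illustrated with a small sketch. It assumes the correlation is computed separately for each protein and the median is then taken across proteins; the function names and toy numbers below are illustrative and are not taken from the paper.

```python
from statistics import median

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def median_spearman(per_protein):
    """per_protein maps a protein name to (measured values, predicted values)."""
    return median(spearman(m, p) for m, p in per_protein.values())

# Toy data (hypothetical): three proteins with measured and predicted ddG values.
toy = {
    "protein_a": ([1, 2, 3], [1, 2, 3]),
    "protein_b": ([1, 2, 3], [3, 2, 1]),
    "protein_c": ([1, 2, 3, 4], [1, 3, 2, 4]),
}
print(median_spearman(toy))
```

Taking the median over proteins, rather than pooling all measurements, keeps a few very large proteins from dominating the score.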

Novelty

The primary innovation of this work is a rewiring strategy that connects sequence-based and structure-based models through a lightweight adapter module. Unlike previous methods that simply concatenate model outputs or fine-tune entire large architectures, SPURS updates only 9.9 million parameters, approximately 1.5% of the total parameter count. This parameter-efficient approach preserves the pre-trained evolutionary knowledge while specializing the model for stability prediction. Furthermore, the architecture introduces an all-mutants-per-pass inference paradigm: the number of forward passes no longer grows linearly with the number of candidate mutations but stays constant. The framework also extends beyond single-point substitutions by incorporating a dedicated epistasis decoder. This component specifically models non-additive effects between multiple amino acid substitutions, allowing the system to predict the stability of higher-order mutants that many existing additive models fail to capture.
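The all-mutants-per-pass idea and the role of an epistasis term can be sketched as follows. This is a minimal illustration, assuming a single forward pass yields a matrix with one row per residue and one column per amino acid, so that any single-point ΔΔG becomes a table lookup. The matrix below is filled by a toy formula rather than a model, and all function names are hypothetical rather than part of the SPURS code.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def parse_mutation(mutation):
    """Split a string such as 'K2A' into (wild type, 0-based position, mutant)."""
    return mutation[0], int(mutation[1:-1]) - 1, mutation[-1]

def ddg_from_matrix(wt_seq, ddg_matrix, mutation):
    """Look up one single-point ddG in a precomputed (length x 20) matrix."""
    wt, pos, mt = parse_mutation(mutation)
    if wt_seq[pos] != wt:
        raise ValueError(f"{mutation}: wild-type residue is {wt_seq[pos]}, not {wt}")
    return ddg_matrix[pos][AA_INDEX[mt]]

def additive_ddg(wt_seq, ddg_matrix, mutations):
    """Additive baseline: sum of single-point ddG values, ignoring interactions."""
    return sum(ddg_from_matrix(wt_seq, ddg_matrix, m) for m in mutations)

def epistasis_residual(measured_ddg, wt_seq, ddg_matrix, mutations):
    """Non-additive part that a purely additive model cannot explain."""
    return measured_ddg - additive_ddg(wt_seq, ddg_matrix, mutations)

# Toy stand-in for the output of one forward pass (values are made up):
# zero for the wild-type residue, otherwise a simple position/amino-acid formula.
wt_seq = "MKV"
ddg_matrix = [
    [0.0 if aa == wt_seq[i] else 0.1 * (i + 1) + 0.01 * AA_INDEX[aa]
     for aa in AMINO_ACIDS]
    for i in range(len(wt_seq))
]

single = ddg_from_matrix(wt_seq, ddg_matrix, "K2A")
double = additive_ddg(wt_seq, ddg_matrix, ["M1C", "K2A"])
# A hypothetical measured value of 0.5 for the double mutant leaves this residual:
residual = epistasis_residual(0.5, wt_seq, ddg_matrix, ["M1C", "K2A"])
print(single, double, residual)
```

Once the matrix exists, scoring every one of the 19 × L substitutions costs only lookups, which is why the number of model evaluations does not depend on how many mutations are queried. The residual is the quantity an epistasis-aware component would need to predict for multi-mutants.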

Potential Clinical / Research Applications

This framework has significant implications for clinical research and protein engineering. In clinical settings, it enables the systematic analysis of the human proteome to distinguish between benign and pathogenic missense variants. The researchers demonstrated that 68% of pathogenic variants are significantly destabilizing, compared to only 19% of benign ones, providing a quantitative tool for disease variant interpretation. In the field of protein engineering, the high efficiency of the model supports large-scale screening for stabilizing mutations, a task previously hindered by computational bottlenecks. Additionally, by serving as an informative prior, SPURS enhances fitness prediction in low-N scenarios where experimental data is extremely limited. This capability is particularly useful for optimizing therapeutic proteins or industrial enzymes where only a few dozen variants can be tested in a laboratory setting. The model can also assist in identifying functional hotspots, such as binding interfaces and active sites, by detecting residues where mutations cause functional loss out of proportion to stability changes.
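The hotspot idea in the last sentence can be illustrated with a rough sketch: fit a simple trend between predicted stability change and measured fitness across residues, then flag residues whose fitness falls far below that trend. The linear fit, the cutoff, the sign convention (more negative means more destabilizing), and the toy numbers are all assumptions made for illustration; the paper's actual procedure may differ.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def hotspot_candidates(residues, cutoff):
    """Flag positions whose fitness is more than `cutoff` below the fitted trend.

    residues maps position -> (mean predicted stability change, mean measured fitness).
    """
    positions = sorted(residues)
    x = [residues[p][0] for p in positions]
    y = [residues[p][1] for p in positions]
    slope, intercept = fit_line(x, y)
    flagged = []
    for p, xi, yi in zip(positions, x, y):
        residual = yi - (slope * xi + intercept)
        if residual < -cutoff:
            flagged.append(p)
    return flagged

# Toy per-residue averages (hypothetical). Position 5 loses most of its fitness
# even though its predicted stability change is mild.
toy_residues = {
    1: (0.0, 1.0),
    2: (-1.0, 0.8),
    3: (-2.0, 0.6),
    4: (-3.0, 0.4),
    5: (-1.0, 0.1),
    6: (-2.0, 0.6),
}
print(hotspot_candidates(toy_residues, cutoff=0.3))
```

Residues flagged this way are candidates for binding interfaces or active sites, since their loss of function is not explained by destabilization alone.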
