Scalable Protein Stability Prediction via Generative Models

Original Title: Generalizable and scalable protein stability prediction with rewired protein generative models

Journal: Nature Communications

Overview

Protein stability is commonly quantified by the Gibbs free energy of folding (ΔG), and the effect of a mutation by the resulting change in that quantity (ΔΔG). Stability is a fundamental property that shapes protein function and engineering potential. Accurately predicting how mutations alter stability remains difficult because high-quality experimental data are scarce and three-dimensional molecular interactions are intricate. This research introduces SPURS, a deep learning framework designed to address these limitations by integrating two distinct types of protein generative models: it combines the evolutionary patterns captured by the protein language model ESM2 with the geometric constraints learned by the inverse folding model ProteinMPNN. Trained on a filtered Megascale dataset containing over 270,000 measurements across nearly 300 proteins, the framework achieves a median Spearman correlation of 0.83, outperforming existing methods. It is also designed for efficiency: predictions for all possible single-point mutations of a protein are produced in a single forward pass.

Novelty

The primary innovation of this work is a rewiring strategy that connects sequence-based and structure-based models through a lightweight adapter module. Unlike previous methods that simply concatenate model outputs or fine-tune entire large architectures, SPURS updates only 9.9 million parameters, approximately 1.5% of the total parameter count. This parameter-efficient approach preserves the pre-trained evolutionary knowledge while specializing the model for stability prediction. The architecture also introduces an all-mutants-per-pass inference paradigm: instead of one forward pass per mutant, a single pass scores every possible substitution, so the number of passes stays constant rather than growing linearly with the number of mutations. Finally, the framework extends beyond single-point substitutions with a dedicated epistasis decoder, which models non-additive effects between multiple amino acid substitutions and allows the system to accurately predict the stability of higher-order mutants that many existing additive models fail to capture.
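The all-mutants-per-pass idea and the additive-versus-epistatic distinction can be illustrated with a minimal sketch. It assumes the model emits an L × 20 matrix of ΔΔG predictions (one row per residue, one column per amino acid in an assumed alphabet order) and represents epistasis as a hypothetical pairwise correction. The helper names and the pairwise form are illustrative only; SPURS's epistasis decoder is a learned component whose exact form is not described here.

```python
from itertools import combinations

# Canonical amino-acid alphabet; this column order is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def expand_single_mutants(ddg_matrix, wt_seq):
    """Turn an L x 20 matrix of predicted ddG values (the output of ONE
    forward pass) into a dict keyed by mutation strings like 'A1G' (1-based)."""
    if len(ddg_matrix) != len(wt_seq):
        raise ValueError("matrix must have one row per residue")
    preds = {}
    for pos, (wt_aa, row) in enumerate(zip(wt_seq, ddg_matrix), start=1):
        for aa, value in zip(AMINO_ACIDS, row):
            if aa != wt_aa:  # skip the wild-type (self) substitution
                preds[f"{wt_aa}{pos}{aa}"] = value
    return preds


def multi_mutant_ddg(mutations, single_ddg, pair_correction=None):
    """Additive baseline (sum of single-mutant ddG) plus an optional
    non-additive term summed over every pair of mutations.

    pair_correction(m1, m2) is a hypothetical stand-in for a learned
    epistasis decoder; it is not the SPURS decoder itself."""
    total = sum(single_ddg[m] for m in mutations)
    if pair_correction is not None:
        total += sum(pair_correction(a, b) for a, b in combinations(mutations, 2))
    return total


# Toy example: a 2-residue protein "AC" with made-up predictions.
matrix = [[float(j) for j in range(20)], [-float(j) for j in range(20)]]
singles = expand_single_mutants(matrix, "AC")
print(len(singles))                                                  # 38
print(multi_mutant_ddg(["A1G", "C2D"], singles))                     # 3.0
print(multi_mutant_ddg(["A1G", "C2D"], singles, lambda a, b: 0.5))   # 3.5
```

The point of the matrix layout is that all L × 19 single mutants fall out of one model call; the cost of scoring them is then a cheap table lookup. A purely additive model stops at the first term of `multi_mutant_ddg`, which is exactly where non-additive interactions are lost.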

Potential Clinical / Research Applications

This framework has significant implications for clinical research and protein engineering. In clinical settings, it enables systematic analysis of variants across the human proteome to distinguish benign from pathogenic missense variants. The researchers found that 68% of pathogenic variants are significantly destabilizing, compared with only 19% of benign ones, providing a quantitative tool for disease variant interpretation. In protein engineering, the model's efficiency supports large-scale screening for stabilizing mutations, a task previously hindered by computational bottlenecks. Additionally, when used as an informative prior, SPURS improves fitness prediction in low-N settings, where experimental data are extremely limited. This capability is particularly useful for optimizing therapeutic proteins or industrial enzymes when only a few dozen variants can be tested in the laboratory. The model can also help identify functional hotspots, such as binding interfaces and active sites, by detecting residues where mutations cause a loss of function out of proportion to their effect on stability.
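Two downstream analyses from this paragraph can be sketched in a few lines: the fraction of variants called destabilizing, and flagging positions where functional loss outruns destabilization. The cutoffs, the sign convention (larger ΔΔG means less stable), the function names, and all input values below are illustrative assumptions, not values or procedures from the study.

```python
def fraction_destabilizing(ddg_values, cutoff=1.0):
    """Fraction of variants predicted to be destabilizing.

    Assumes larger ddG means less stable and uses a hypothetical
    1.0 kcal/mol cutoff; neither choice is specified in the source."""
    values = list(ddg_values)
    if not values:
        return 0.0
    return sum(v > cutoff for v in values) / len(values)


def flag_function_hotspots(fitness_loss, ddg, loss_cutoff=0.5, ddg_cutoff=1.0):
    """Positions whose mutations lose much function but barely destabilize.

    fitness_loss and ddg map residue position -> a per-position summary
    score (e.g. a mean over substitutions). Both cutoffs are hypothetical.
    Positions missing from either map are ignored."""
    return sorted(
        pos
        for pos, loss in fitness_loss.items()
        if pos in ddg and loss >= loss_cutoff and ddg[pos] < ddg_cutoff
    )


# Toy inputs, invented for illustration only.
pathogenic = [2.3, 1.8, 0.4, 3.1]
benign = [0.2, -0.5, 1.2, 0.1]
print(fraction_destabilizing(pathogenic))  # 0.75
print(fraction_destabilizing(benign))      # 0.25

loss = {10: 0.9, 11: 0.8, 12: 0.1}
stab = {10: 0.2, 11: 2.5, 12: 0.1}
print(flag_function_hotspots(loss, stab))  # [10]
```

In the toy hotspot example, position 11 loses function but is also strongly destabilized, so its loss is plausibly explained by folding; position 10 loses function while staying stable, which is the pattern expected at binding or catalytic sites.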
