Harnessing protein language model for structure-based discovery of highly efficient and robust PET hydrolases

Title

AI-Driven Discovery of Efficient PET Hydrolases

One-Sentence Summary

This study introduces a computational pipeline using a protein language model and structure-based search to discover a novel, highly efficient, and thermostable PET hydrolase from nature.

Overview

Polyethylene terephthalate (PET) plastic waste poses a significant environmental problem. While some enzymes, known as PET hydrolases (PETases), can break down PET, their performance is often limited. This research introduces VenusMine, a computational pipeline designed to discover new and more effective PETases. The process began by using the known structure of an existing enzyme, IsPETase, as a template to search for structurally similar proteins from vast biological databases. A protein language model (PLM) was then used to screen these candidates, predicting their solubility and thermal stability. This screening narrowed the field to 34 promising proteins for laboratory testing. Biochemical validation confirmed that 14 of these candidates could degrade PET. One enzyme, named KbPETase, isolated from the bacterium Kibdelosporangium banguiense, was particularly effective. It showed a melting temperature 32°C higher than IsPETase and exhibited a 97-fold increase in degradation activity at 50°C compared to IsPETase at 30°C. Structural analysis confirmed that KbPETase has a stable and efficient architecture.
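The screening funnel described above (structure search first, then PLM-based property filtering, then a shortlist for the lab) can be sketched as a two-stage filter. This is a minimal, hypothetical illustration and not the VenusMine implementation. The `Candidate` fields, the TM-score-style structural similarity measure, the thresholds, the ranking by predicted melting temperature, and the candidate names and scores are all assumptions made for the example. Only the 14-of-34 validation figures come from the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    tm_score: float         # hypothetical structural similarity to the template (0..1)
    p_soluble: float        # hypothetical PLM-predicted probability of soluble expression
    pred_tm_celsius: float  # hypothetical PLM-predicted melting temperature


def screen(candidates, min_tm_score=0.5, min_soluble=0.5, min_pred_tm=60.0, top_k=34):
    """Structure filter, then PLM property filter, then keep the top_k most stable.

    All thresholds are illustrative assumptions, not values from the study.
    """
    # Stage 1: keep proteins structurally similar to the template enzyme.
    structural_hits = [c for c in candidates if c.tm_score >= min_tm_score]
    # Stage 2: keep proteins predicted to be both soluble and thermostable.
    passing = [
        c for c in structural_hits
        if c.p_soluble >= min_soluble and c.pred_tm_celsius >= min_pred_tm
    ]
    # Rank survivors by predicted stability and shortlist them for lab testing.
    passing.sort(key=lambda c: c.pred_tm_celsius, reverse=True)
    return passing[:top_k]


def hit_rate(n_active, n_tested):
    """Fraction of experimentally tested candidates that showed activity."""
    return n_active / n_tested


# Invented candidate pool for illustration only.
pool = [
    Candidate("cand_A", 0.82, 0.91, 78.0),
    Candidate("cand_B", 0.41, 0.95, 85.0),  # fails the structural filter
    Candidate("cand_C", 0.67, 0.30, 70.0),  # fails the solubility filter
    Candidate("cand_D", 0.74, 0.80, 55.0),  # fails the stability filter
    Candidate("cand_E", 0.58, 0.72, 66.0),
]
shortlist = screen(pool, top_k=2)
print([c.name for c in shortlist])  # ['cand_A', 'cand_E']
print(f"{hit_rate(14, 34):.1%}")    # 41.2% (14 active out of 34 tested, as reported)
```

Applying the cheap structural filter first means the property predictions only run on a smaller pool, which mirrors the order described in the study. The exact criteria and ranking the real pipeline uses are not given in this summary.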

Novelty

The approach of this study differs from conventional enzyme discovery methods, which typically rely on sequence similarity. The VenusMine pipeline instead integrates a structure-based search with the predictive power of a protein language model. This combination allows for the identification of functionally related enzymes even when their amino acid sequences are not very similar; for instance, KbPETase shares only 30-50% sequence identity with previously known PETases. A second distinguishing feature is the use of a PLM to pre-screen candidates for properties such as thermostability and solubility. This computational pre-screening makes discovery more efficient: fewer candidates require expensive and time-consuming experimental validation, and a larger share of those tested turn out to be active.
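To make the "30-50% sequence identity" figure concrete, here is a minimal sketch of how percent identity can be computed from a pairwise alignment. Published tools differ in the denominator they use (aligned columns, alignment length, or the shorter sequence). Counting only gap-free aligned columns is one common convention and an assumption here. The toy sequences are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumes both inputs are rows of the same pairwise alignment. Other
    definitions divide by alignment length or by the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("ACDEFGHIKL", "ACDQFGWIKM"))           # 70.0
print(f'{percent_identity("ACD-EFGHIK", "ACDWEYGHLK"):.1f}')  # 77.8
```

A sequence-only search ranks hits by scores closely tied to this kind of residue-level agreement. A structure-based search compares 3D folds instead, so it can still surface enzymes such as KbPETase whose identity to known PETases is modest.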

My Perspective

In my view, this work illustrates a shift in how we can explore the vast, untapped library of proteins that nature has produced. The use of a protein language model goes beyond simple pattern matching; it leverages an AI that has learned some of the underlying “grammatical rules” of protein biology, connecting amino acid sequences to their structural and functional outcomes. This allows us to perform a kind of computational “bioprospecting” with greater precision. It is akin to searching for a specific type of tool in an immense warehouse, not by its color or brand (sequence), but by its functional shape (structure) and durability (predicted stability). This methodology enables the identification of enzymes that evolution has already optimized, which may serve as superior starting points for further engineering compared to well-studied but less robust enzymes.

Potential Clinical / Research Applications

The VenusMine pipeline itself is a versatile tool that can be adapted to search for other enzymes with industrial or therapeutic value. For research, it could be applied to discover enzymes for degrading other types of plastics, producing biofuels, or synthesizing pharmaceuticals. The discovered enzyme, KbPETase, has direct applications in industrial PET recycling. Its high thermal stability and catalytic efficiency make it a strong candidate for developing large-scale, energy-efficient plastic degradation processes. Furthermore, KbPETase provides a new molecular scaffold for protein engineers to use in directed evolution experiments, potentially leading to even more potent variants for bioremediation. While not directly clinical, the principles of using structure-based AI models to find robust proteins could be applied to the discovery or stabilization of therapeutic enzymes used in treating metabolic disorders.

Similar Posts

  • Great debate: artificial intelligence will replace much of what cardiologists do

    Title AI in Cardiology: A Tool, Not a Replacement One-Sentence Summary This paper debates the extent to which artificial intelligence will substitute for cardiologists, presenting arguments that AI will enhance many tasks but cannot replace the essential human elements of clinical judgment, accountability, and the physician-patient relationship. Overview The paper presents a balanced debate on the future role of artificial intelligence (AI) in cardiology. The “pro” argument suggests that AI’s capabilities in medical education, diagnostic imaging, and personalized care are advancing rapidly and could surpass human performance in these domains. It highlights AI’s potential to automate tasks, synthesize vast amounts of data, and improve efficiency. Conversely, the “contra” argument emphasizes…

  • Regulating ICU AI: From Narrow Tools to Generalist Systems

    Original Title: The regulation of artificial intelligence in intensive care units: from narrow tools to generalist systems Journal: NPJ digital medicine DOI: 10.1038/s41746-026-02535-3 Overview Intensive care units represent highly data-intensive environments in healthcare, requiring continuous monitoring and rapid decision-making. While artificial intelligence has been explored for decades, its formal regulation as a medical device began in 1995. By May 2025, the number of approved artificial intelligence-enabled medical devices reached 1,016 in the United States. Many of these tools are designed for narrow, single-task applications such as interpreting radiological images or predicting sepsis. The emergence of generative artificial intelligence and large language models marks a shift toward generalist systems capable of…

  • Robust CRC Diagnosis via Causal and Uncertainty-Aware AI

    Original Title: Uncertainty-aware and causal test-time adaptive foundation model for robust colorectal cancer pathology diagnosis Journal: NPJ digital medicine DOI: 10.1038/s41746-025-02149-1 Overview Colorectal cancer remains a major global health challenge, requiring precise histopathological analysis for effective treatment. While computational pathology has advanced with the use of large-scale foundation models, these systems frequently encounter obstacles when deployed in real-world clinical settings. Key issues include domain shifts caused by variations in staining protocols and scanner hardware, as well as the tendency for models to provide overconfident yet incorrect predictions. This paper introduces UAD-FM, an uncertainty-aware and causally adaptive foundation model designed to address these limitations. The framework integrates a variational Bayesian approach…

  • Interpretable Deep Learning for Gastric Cancer T Staging

    Original Title: Interpretable deep learning for multicenter gastric cancer T staging from CT images Journal: NPJ digital medicine DOI: 10.1038/s41746-025-02002-5 Overview Gastric cancer remains a significant global health challenge, requiring precise preoperative T staging to determine the appropriate therapeutic strategy, such as neoadjuvant chemotherapy or direct surgical intervention. Standard contrast-enhanced computed tomography is the primary tool for this evaluation, yet its accuracy often ranges between 65% and 75% due to subjective interpretation and the difficulty of identifying subtle serosal invasion. This study introduces GTRNet, an automated deep-learning framework designed to classify gastric cancer into four T stages from routine portal venous phase images. Developed using a retrospective multicenter dataset of…

  • AI-Powered Speech Analysis for Alzheimer’s Detection

    Original Title: Biomarkers Journal: Alzheimer's & dementia : the journal of the Alzheimer's Association DOI: 10.1002/alz70856_107467 Overview The study investigates the utility of spontaneous speech as a non-invasive biomarker for Alzheimer's disease by developing an automated analysis pipeline. Utilizing the ADReSS 2020 Challenge dataset, which comprises audio recordings from 108 training participants and 48 testing participants performing the Cookie Theft picture description task, the researchers explored the transition from raw audio to diagnostic classification. The methodology involved transcribing audio using commercial tools like OpenAI Whisper and AssemblyAI, followed by the generation of semantic vector embeddings using large language models. These embeddings were then used to train machine learning classifiers, including…

  • A metasurface combined with a neural network enables simultaneous detection of frequency, polarization, and intensity for broadband terahertz light.

    Original Title: Deep learning-enabled ultra-broadband terahertz high-dimensional photodetector Journal: Nature communications DOI: 10.1038/s41467-025-63364-8 A Deep Learning-Powered THz Photodetector Overview Light carries information in multiple forms, including its intensity, frequency (color), and polarization. Conventional photodetectors typically measure only a subset of these properties, limiting our ability to fully characterize a light field. This paper introduces a compact photodetector that overcomes this limitation in the terahertz (THz) frequency range. It combines a specially engineered metasurface with a deep learning algorithm to simultaneously and continuously measure the intensity, full polarization state, and frequency of incident light across a broad spectrum from 0.3 to 1.1 THz. Novelty The device’s innovation lies in its method…
