Harnessing protein language model for structure-based discovery of highly efficient and robust PET hydrolases

Title

AI-Driven Discovery of Efficient PET Hydrolases

One-Sentence Summary

This study introduces a computational pipeline using a protein language model and structure-based search to discover a novel, highly efficient, and thermostable PET hydrolase from nature.

Overview

Polyethylene terephthalate (PET) plastic waste poses a significant environmental problem. Some enzymes, known as PET hydrolases (PETases), can break down PET, but their performance is often limited. This research introduces VenusMine, a computational pipeline designed to discover new and more effective PETases. The pipeline first uses the known structure of an existing enzyme, IsPETase, as a template to search vast biological databases for structurally similar proteins. A protein language model (PLM) then screens these candidates by predicting their solubility and thermal stability. This screening narrowed the field to 34 promising proteins for laboratory testing, and biochemical validation confirmed that 14 of them could degrade PET. One enzyme, named KbPETase, from the bacterium Kibdelosporangium banguiense, was particularly effective: its melting temperature is 32°C higher than that of IsPETase, and at 50°C it showed a 97-fold increase in degradation activity compared to IsPETase at 30°C. Structural analysis further supported that KbPETase has a stable and efficient architecture.
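The screening funnel described above can be pictured as a filter-and-rank step applied to structure-search hits. The following is a minimal sketch under stated assumptions, not VenusMine's implementation: the `Candidate` record, the score names, the thresholds, and the ranking rule are all hypothetical, and the scores are assumed to have been computed upstream by a structure-search tool and a PLM-based predictor. The default `top_n=34` only mirrors the number of proteins tested in the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record for one hit returned by a structure-based search.
    name: str
    structural_similarity: float  # hypothetical similarity to the IsPETase template, in [0, 1]
    predicted_solubility: float   # hypothetical PLM-derived solubility probability, in [0, 1]
    predicted_tm: float           # hypothetical PLM-predicted melting temperature, in °C


def screen(
    candidates: list[Candidate],
    min_similarity: float = 0.5,   # illustrative threshold, not from the study
    min_solubility: float = 0.5,   # illustrative threshold, not from the study
    min_tm: float = 50.0,          # illustrative threshold, not from the study
    top_n: int = 34,
) -> list[Candidate]:
    """Keep hits that pass all three cut-offs, rank them by predicted Tm
    (ties broken by structural similarity), and return at most top_n."""
    passing = [
        c for c in candidates
        if c.structural_similarity >= min_similarity
        and c.predicted_solubility >= min_solubility
        and c.predicted_tm >= min_tm
    ]
    # Most thermostable first; structural similarity breaks ties.
    passing.sort(key=lambda c: (c.predicted_tm, c.structural_similarity), reverse=True)
    return passing[:top_n]


if __name__ == "__main__":
    # Invented example hits, for illustration only.
    hits = [
        Candidate("hit_A", 0.82, 0.91, 71.0),
        Candidate("hit_B", 0.45, 0.95, 80.0),  # fails the structural-similarity cut
        Candidate("hit_C", 0.77, 0.30, 65.0),  # predicted poorly soluble
        Candidate("hit_D", 0.69, 0.88, 58.5),
        Candidate("hit_E", 0.90, 0.72, 44.0),  # predicted too thermolabile
    ]
    print([c.name for c in screen(hits)])  # prints ['hit_A', 'hit_D']
```

Ranking by predicted melting temperature reflects the study's emphasis on thermostability; a real pipeline might weight the scores differently or merge them into a single composite score.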

Novelty

This study departs from conventional enzyme discovery methods, which typically rely on sequence similarity. The VenusMine pipeline instead combines a structure-based search with the predictive power of a protein language model. This combination can identify functionally related enzymes even when their amino acid sequences are only distantly related; for instance, KbPETase shares only 30-50% sequence identity with previously known PETases. A second key feature is the use of the PLM to pre-screen candidates for thermostability and solubility. This computational pre-screening makes discovery more efficient by reducing the number of candidates that require expensive, time-consuming experimental validation; in this study, 14 of the 34 tested candidates (about 41%) showed PET-degrading activity.
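The 30-50% figure refers to sequence identity, which is computed from a pairwise alignment. A minimal sketch of one common convention follows, assuming the two sequences have already been aligned (gaps written as `-`). The function name and the toy fragments are invented for illustration and are not real PETase sequences; alignment tools differ in the denominator they use (ungapped columns, full alignment length, or the shorter sequence), so reported identities can vary slightly.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (invented): 4 identical residues out of 10 ungapped columns.
a = "MKSTAG-LLAW"
b = "MRSVSGQLIVF"
print(f"{percent_identity(a, b):.1f}%")  # prints 40.0%
```

At identities this low, sequence-based searches can miss true homologs, which is the gap a structure-based search is meant to close.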

My Perspective

In my view, this work illustrates a shift in how we can explore the vast, untapped library of proteins that nature has produced. The use of a protein language model goes beyond simple pattern matching; it leverages an AI that has learned some of the underlying “grammatical rules” of protein biology, connecting amino acid sequences to their structural and functional outcomes. This allows us to perform a kind of computational “bioprospecting” with greater precision. It is akin to searching for a specific type of tool in an immense warehouse, not by its color or brand (sequence), but by its functional shape (structure) and durability (predicted stability). This methodology enables the identification of enzymes that evolution has already optimized, which may serve as superior starting points for further engineering compared to well-studied but less robust enzymes.

Potential Clinical / Research Applications

The VenusMine pipeline itself is a versatile tool that can be adapted to search for other enzymes with industrial or therapeutic value. For research, it could be applied to discover enzymes for degrading other types of plastics, producing biofuels, or synthesizing pharmaceuticals. The discovered enzyme, KbPETase, has direct applications in industrial PET recycling. Its high thermal stability and catalytic efficiency make it a strong candidate for developing large-scale, energy-efficient plastic degradation processes. Furthermore, KbPETase provides a new molecular scaffold for protein engineers to use in directed evolution experiments, potentially leading to even more potent variants for bioremediation. While not directly clinical, the principles of using structure-based AI models to find robust proteins could be applied to the discovery or stabilization of therapeutic enzymes used in treating metabolic disorders.

