Harnessing protein language model for structure-based discovery of highly efficient and robust PET hydrolases

Title

AI-Driven Discovery of Efficient PET Hydrolases

One-Sentence Summary

This study introduces a computational pipeline using a protein language model and structure-based search to discover a novel, highly efficient, and thermostable PET hydrolase from nature.

Overview

Polyethylene terephthalate (PET) plastic waste poses a significant environmental problem. Some enzymes, known as PET hydrolases (PETases), can break down PET, but their efficiency and robustness are often limited. This research introduces VenusMine, a computational pipeline designed to discover new and more effective PETases. The pipeline first uses the known structure of an existing enzyme, IsPETase, as a template to search large biological databases for structurally similar proteins. A protein language model (PLM) then screens these candidates by predicting their solubility and thermal stability. This screening narrowed the field to 34 proteins for laboratory testing, and biochemical validation confirmed that 14 of them could degrade PET. One enzyme, named KbPETase, from the bacterium Kibdelosporangium banguiense, was particularly effective. Its melting temperature was 32°C higher than that of IsPETase, and its degradation activity at 50°C was 97-fold higher than that of IsPETase at 30°C. Structural analysis indicated that KbPETase has a stable and catalytically efficient architecture.
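The two-stage screening described above can be sketched in a few lines of Python. This is a minimal illustration only, not the VenusMine implementation: the Candidate fields, the score scales, and every threshold are hypothetical placeholders, and the summary does not state the actual cutoffs or scoring functions used in the study. The only numbers taken from the text are the 34 candidates tested and the 14 confirmed active.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A database protein with hypothetical, precomputed scores."""
    name: str
    structural_similarity: float  # assumed 0-1 similarity to the IsPETase template
    predicted_solubility: float   # assumed 0-1 score from a protein language model
    predicted_tm_celsius: float   # assumed PLM-predicted melting temperature


def screen(candidates, min_similarity=0.5, min_solubility=0.5, min_tm=50.0):
    """Apply the structure filter, then the PLM property filter.

    All three thresholds are illustrative assumptions.
    """
    # Stage 1: keep proteins whose fold resembles the template.
    structural_hits = [
        c for c in candidates if c.structural_similarity >= min_similarity
    ]
    # Stage 2: keep those predicted to be both soluble and thermostable.
    shortlisted = [
        c for c in structural_hits
        if c.predicted_solubility >= min_solubility
        and c.predicted_tm_celsius >= min_tm
    ]
    # Rank the shortlist so the most stable predictions are tested first.
    return sorted(shortlisted, key=lambda c: c.predicted_tm_celsius, reverse=True)


def hit_rate(active, tested):
    """Fraction of experimentally tested candidates that showed activity."""
    return active / tested


if __name__ == "__main__":
    toy_pool = [
        Candidate("protein_a", 0.9, 0.8, 70.0),
        Candidate("protein_b", 0.3, 0.9, 80.0),  # fails the structure filter
        Candidate("protein_c", 0.7, 0.2, 75.0),  # fails the solubility filter
        Candidate("protein_d", 0.6, 0.6, 55.0),
    ]
    print([c.name for c in screen(toy_pool)])  # prints ['protein_a', 'protein_d']
    # 14 of the 34 tested candidates degraded PET, per the summary above.
    print(f"{hit_rate(14, 34):.1%}")  # prints 41.2%
```

Ordering the filters this way reflects the description in the text: the structural search defines the candidate pool, and the property predictions decide which members of that pool are worth sending to the bench.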

Novelty

This study's approach differs from conventional enzyme discovery methods, which typically rely on sequence similarity. The VenusMine pipeline instead combines a structure-based search with the predictive power of a protein language model. This combination can identify functionally related enzymes even when their amino acid sequences are not very similar; for instance, KbPETase shares only 30-50% sequence identity with previously known PETases. A second distinguishing feature is the use of a PLM to pre-screen candidates for thermostability and solubility. This computational pre-screening reduces the number of candidates that require expensive and time-consuming experimental validation and raises the success rate of finding active enzymes.
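To make the sequence identity figure concrete, the following sketch computes percent identity for two sequences that have already been aligned. It is an illustration under stated assumptions: the function name, the toy sequences, and the choice of denominator (columns where neither sequence has a gap) are not from the study, and other identity definitions divide by the full alignment length or by the shorter sequence instead.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes the inputs are two rows of one pairwise alignment, so they
    must have equal length. Returns 0.0 if no column is comparable.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if residue_a == residue_b:
            matches += 1

    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration (not real PETase sequences).
    seq_a = "MKT-AYLL"
    seq_b = "MRTQAF-L"
    print(f"{percent_identity(seq_a, seq_b):.1f}")  # prints 66.7
```

A sequence-based search would typically miss or rank poorly a homolog at 30-50% identity by this kind of measure, which is the gap the structure-based search is meant to close.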

My Perspective

In my view, this work illustrates a shift in how we can explore the vast, untapped library of proteins that nature has produced. The use of a protein language model goes beyond simple pattern matching; it leverages an AI that has learned some of the underlying “grammatical rules” of protein biology, connecting amino acid sequences to their structural and functional outcomes. This allows us to perform a kind of computational “bioprospecting” with greater precision. It is akin to searching for a specific type of tool in an immense warehouse, not by its color or brand (sequence), but by its functional shape (structure) and durability (predicted stability). This methodology enables the identification of enzymes that evolution has already optimized, which may serve as superior starting points for further engineering compared to well-studied but less robust enzymes.

Potential Clinical / Research Applications

The VenusMine pipeline itself is a versatile tool that can be adapted to search for other enzymes with industrial or therapeutic value. For research, it could be applied to discover enzymes for degrading other types of plastics, producing biofuels, or synthesizing pharmaceuticals. The discovered enzyme, KbPETase, has direct applications in industrial PET recycling. Its high thermal stability and catalytic efficiency make it a strong candidate for developing large-scale, energy-efficient plastic degradation processes. Furthermore, KbPETase provides a new molecular scaffold for protein engineers to use in directed evolution experiments, potentially leading to even more potent variants for bioremediation. While not directly clinical, the principles of using structure-based AI models to find robust proteins could be applied to the discovery or stabilization of therapeutic enzymes used in treating metabolic disorders.