Retrieval Augmented Zero-Shot Enzyme Generation for Specified Substrate

  • Jiahe Du
  • Kaixiong Zhou
  • Xinyu Hong
  • Zhaozhuo Xu
  • Jinbo Xu
  • Xiao Huang

Research output: Contribution to journal › Conference article › Peer-review

Abstract

Generating novel enzymes for target molecules in zero-shot scenarios is a fundamental challenge in biomaterial synthesis and chemical production. When no enzymes are known for a target molecule, training generative models is difficult because there is no direct supervision. To address this, we propose a retrieval-augmented generation method that uses existing enzyme-substrate data to guide enzyme design. Our method retrieves enzymes whose substrates are structurally similar to the target molecule, on the premise that structurally similar substrates tend to be acted on by enzymes with similar catalytic activity. Since none of the retrieved enzymes directly catalyzes the target molecule, we use a conditioned discrete diffusion model to generate new enzymes based on the retrieved examples. An enzyme-substrate relationship classifier guides the generation process, steering it toward protein sequence distributions suited to the target substrate. We evaluate our model on enzyme design tasks with diverse real-world substrates and show that it outperforms existing protein generation methods in catalytic capability, foldability, and docking accuracy. Additionally, we define the zero-shot substrate-specified enzyme generation task and introduce a dataset with evaluation benchmarks.
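The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes that substrates are represented as binary molecular fingerprints (given here as sets of on-bit indices) and ranked by Tanimoto similarity to the target; both choices are assumptions for illustration, and the paper's actual representation and similarity measure may differ. The function names, toy substrates, and short sequences below are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention (assumption): two empty fingerprints score 0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def retrieve_enzymes(
    target_fp: FrozenSet[int],
    substrate_fps: Dict[str, FrozenSet[int]],
    enzymes_by_substrate: Dict[str, List[str]],
    k: int = 2,
) -> List[Tuple[str, float, List[str]]]:
    """Return the k substrates most similar to the target, with their known enzymes.

    In the zero-shot setting the target has no known enzymes, so it is not a key
    in substrate_fps; only similar substrates (and their enzymes) are returned.
    """
    scored = [(name, tanimoto(target_fp, fp)) for name, fp in substrate_fps.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # most similar first
    return [(name, score, enzymes_by_substrate.get(name, [])) for name, score in scored[:k]]


# Hypothetical toy database: fingerprints and enzyme sequences are made up.
substrate_fps = {
    "sub_A": frozenset({1, 2, 3, 4}),
    "sub_B": frozenset({3, 4, 5}),
    "sub_C": frozenset({7, 8}),
}
enzymes_by_substrate = {
    "sub_A": ["MKTAYIAK"],
    "sub_B": ["MSLNNEGH", "MGSSHHHH"],
    "sub_C": ["MAEEQKLI"],
}
target_fp = frozenset({2, 3, 4, 5})

hits = retrieve_enzymes(target_fp, substrate_fps, enzymes_by_substrate, k=2)
for name, score, seqs in hits:
    print(name, round(score, 2), seqs)
```

Here `sub_B` (similarity 0.75) ranks above `sub_A` (0.6), and `sub_C` shares no bits with the target. In the approach the abstract describes, retrieved enzymes like these would serve as conditioning for the generative model rather than as answers, since none of them is known to act on the target.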

Original language: English
Pages (from-to): 14719-14734
Number of pages: 16
Journal: Proceedings of Machine Learning Research
Volume: 267
State: Published - 2025
Event: 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: 13 Jul 2025 to 19 Jul 2025
