MECE

a method for enhancing the catalytic efficiency of glycoside hydrolase based on deep neural networks and molecular evolution

A lack of effective prediction tools has limited the development of high-efficiency glycoside hydrolases (GHs), which are in high demand for numerous industrial applications. This proof-of-concept study demonstrates the use of a deep neural network and molecular evolution (MECE) platform for predicting catalysis-enhancing mutations in GHs. The MECE platform integrates a deep learning model (DeepGH), trained with 119 GH family protein sequences from the CAZy database. MECE also includes a quantitative mutation design component that uses Gradient-weighted Class Activation Mapping (Grad-CAM) with homologous protein sequences to identify key features for mutation in the target GH. This component is available from this page.

Requirements:

python 2 or python 3
tensorflow == 1.15.0
keras == 2.3.1
numpy == 1.19.0
opencv-python == 3.3.1
matplotlib == 2.3.5
scikit-learn == 0.19.2

USE MECE

You can use MECE online, or download the code and run MECE locally.

Online version:

PirD MECE

Local use:

Run MECE.py from the console with the command shown after the model list. The ten-fold models are saved on Zenodo:

MECE Model 2022
MECE Model 2023-735
MECE Model 2023-2000

python MECE.py -data_url <fasta file dir> -data_url <output folder dir>

Visualization:

After MECE.py finishes, or after you download the zip file from PirD MECE, you will have a CSV file and a plot of the weights in the same directory.
You can use plot_logo.r to plot the motif figure, or use <Chimera - define attribute> to plot the 2D structure colored by weight.
Example result files for the motif plot and the 2D structure are saved in example; the functions that generate these files are also in MECE.py.
To plot the 2D structure, you must install UCSF Chimera or UCSF ChimeraX.
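To illustrate the <Chimera - define attribute> route, here is a minimal sketch that turns per-residue weights into the text of a Chimera attribute assignment file. The CSV column names (`position`, `weight`) and the attribute name `meceWeight` are assumptions made for this example; the CSV written by MECE.py may use a different layout, so adjust the column names to match your file.

```python
import csv
import io


def read_weights(csv_text, pos_col="position", weight_col="weight"):
    """Read (position, weight) pairs from CSV text with a header row.

    The column names are hypothetical; change them to match your CSV.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(int(row[pos_col]), float(row[weight_col])) for row in reader]


def weights_to_defattr(rows, attribute="meceWeight"):
    """Build the text of a Chimera attribute assignment file for residues."""
    lines = [
        "attribute: {}".format(attribute),
        "match mode: 1-to-1",
        "recipient: residues",
    ]
    for resnum, weight in rows:
        # Each data line starts with a tab: <tab>atom-spec<tab>value
        lines.append("\t:{}\t{:.4f}".format(int(resnum), float(weight)))
    return "\n".join(lines) + "\n"


# Tiny in-memory demo; replace demo_csv with the contents of your own file.
demo_csv = "position,weight\n1,0.12\n2,0.87\n3,0.05\n"
defattr_text = weights_to_defattr(read_weights(demo_csv))
print(defattr_text)
```

Save the printed text to a file and load it through the Define Attribute tool; the residues can then be colored by the new attribute.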

EXAMPLE:

Train your own DeepGH

Get sequences
The CAZy database lists 174 glycoside hydrolase families, including the unclassified sequences (GH0) and 10 families that contain no reference sequences. For the remaining 164 families, you can obtain the corresponding GenBank numbers through the CAZy website and download the corresponding amino acid sequences from the National Center for Biotechnology Information (NCBI) database, using the Batch Entrez portal and the Biopython toolkit.
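The download step can be sketched with Biopython's `Entrez.efetch`. This is an illustration under stated assumptions, not the script used to build the MECE dataset: the helper names `batched` and `fetch_fasta` and the batch size of 200 are chosen for this example only, and you must supply your own accession list and e-mail address.

```python
def batched(ids, size=200):
    """Yield successive chunks of accession IDs (size is an assumed default)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_fasta(accessions, email, batch_size=200):
    """Download protein sequences from NCBI as one FASTA-formatted string."""
    # Imported here so that batched() works even without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to identify themselves
    chunks = []
    for batch in batched(accessions, batch_size):
        handle = Entrez.efetch(db="protein", id=",".join(batch),
                               rettype="fasta", retmode="text")
        chunks.append(handle.read())
        handle.close()
    return "".join(chunks)


# Hypothetical usage (needs network access and Biopython):
# accessions = [line.strip() for line in open("genbank_ids.txt") if line.strip()]
# fasta_text = fetch_fasta(accessions, email="you@example.org")
```

Fetching in batches keeps each request small, which is friendlier to the NCBI servers than one very large query.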
Download our dataset
Alternatively, our FASTA-format dataset is available on our website. You can use process_dataset.py and process_dataset_1.py to convert it to the train/val/test dataset format.
1. process_dataset.py selects the GH families that have more than 10 sequences.
2. process_dataset_1.py generates the 10-fold dataset, splits it into Train/Val/Test sets, and converts the 20 residues to the numbers 1-20.
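The two preprocessing steps above can be sketched as follows. This is not the code in process_dataset.py or process_dataset_1.py: the residue order used for the 1-20 mapping, the use of 0 for unknown letters, and the 8:1:1 train/val/test ratio per fold are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Assumed residue order; the mapping used in the repository may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INT = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def filter_families(records, min_count=10):
    """Keep only families with more than min_count sequences.

    records: iterable of (family_name, sequence) pairs.
    """
    by_family = defaultdict(list)
    for family, seq in records:
        by_family[family].append(seq)
    return {fam: seqs for fam, seqs in by_family.items() if len(seqs) > min_count}


def encode(seq):
    """Map residues to 1-20; unknown letters become 0 (an assumption)."""
    return [AA_TO_INT.get(aa, 0) for aa in seq.upper()]


def ten_fold_splits(items, k=10, seed=0):
    """Yield k (train, val, test) splits: fold i is test, fold i+1 is val."""
    items = list(items)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        held_out = (i, (i + 1) % k)
        train = [x for j, fold in enumerate(folds) if j not in held_out for x in fold]
        yield train, folds[held_out[1]], folds[held_out[0]]


# Toy data: GH1 has 11 sequences and is kept, GH2 has 3 and is dropped.
records = [("GH1", "ACD")] * 11 + [("GH2", "WY")] * 3
kept = filter_families(records)
print(sorted(kept), encode("ACDY"))
```

Fixing the random seed makes the folds reproducible between runs, which helps when comparing models trained on different folds.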
Train your own model
The training code is keras_RNN_train_gpu.py in train_models.
Note: If you want to replicate our work, try the dataset on our Zenodo page:

Sum weights after Water

First, use the Water tool to perform a pairwise local alignment of each homologous sequence against the wild type.
Then, based on the alignment results, obtain the functionally relevant evolutionary feature matrix (Me) of the wild type by summing all sequence feature matrices, using the wild type as the reference.
Next, in each row of Me, compare the mutant site Pj,max, which has the highest importance score, with the wild-type site Pj,wt, and calculate the fold ratio (Fj) by dividing Pj,max by Pj,wt.
Finally, select the sites with Fj ≥ 20-fold as single-point mutants. To design a multipoint mutant, combine the sites with Fj ≥ 20-fold, chosen according to the size of Fj.
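The summing and selection steps can be sketched as below. This is one reading of the procedure, not the scripts in process_water: it assumes each homolog comes with one importance weight per residue, that a weight is added to Me at the aligned wild-type position under the homolog's residue, and that positions where the wild-type score is zero are skipped to avoid dividing by zero. All function names are hypothetical.

```python
from collections import defaultdict


def add_to_me(me, wt_aln, hom_aln, hom_weights, wt_start=1, hom_start=1):
    """Add one homolog's weights to Me, indexed by wild-type position.

    wt_aln, hom_aln: aligned strings of equal length, '-' marks a gap.
    hom_weights: one weight per residue of the full homolog sequence.
    wt_start, hom_start: 1-based start of the local alignment in each sequence.
    """
    wt_pos, hom_pos = wt_start - 1, hom_start - 1
    for wt_res, hom_res in zip(wt_aln, hom_aln):
        if wt_res != "-":
            wt_pos += 1
        if hom_res != "-":
            hom_pos += 1
        if wt_res != "-" and hom_res != "-":
            me[wt_pos][hom_res] += hom_weights[hom_pos - 1]


def fold_ratios(me, wt_seq):
    """Return {position: (best_residue, Fj)} where Fj = Pj,max / Pj,wt."""
    ratios = {}
    for pos, scores in me.items():
        wt_res = wt_seq[pos - 1]
        p_wt = scores.get(wt_res, 0.0)
        best_res, p_max = max(scores.items(), key=lambda kv: kv[1])
        # Skip rows where the wild type is already the best or has no score.
        if p_wt > 0 and best_res != wt_res:
            ratios[pos] = (best_res, p_max / p_wt)
    return ratios


def select_mutations(ratios, threshold=20.0):
    """List (position, residue, Fj) with Fj >= threshold, largest Fj first."""
    hits = [(pos, res, f) for pos, (res, f) in ratios.items() if f >= threshold]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Toy example: wild type "ACD" and two made-up homologs with made-up weights.
wt_seq = "ACD"
me = defaultdict(lambda: defaultdict(float))
add_to_me(me, "ACD", "AWD", [0.5, 8.0, 0.5])
add_to_me(me, "ACD", "ACD", [0.5, 0.25, 0.5])
mutations = select_mutations(fold_ratios(me, wt_seq))
print(mutations)
```

In the toy data, position 2 scores 8.0 for W and 0.25 for the wild-type C, so Fj is 32 and the site passes the 20-fold threshold. Sorting by Fj gives the order in which sites could be combined into a multipoint mutant.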
For the process, scripts, and demo files, please visit process_water.

References

MECE: a method for enhancing the catalytic efficiency of glycoside hydrolase based on deep neural networks and molecular evolution
