GLN Matrix Prediction

This is the code for data preprocessing, model training and evaluation described in the paper "Deep Learning-Assisted Discovery of Protein Entangling Motifs".

Installation

To run our code, you will need:

Python 3.8 or 3.9
NVIDIA CUDA 11.3 or above
PyTorch 1.12 or above

The commands below create a new conda environment called "gln_matrix":

git clone https://github.com/daniel-dpq/gln_matrix.git
cd gln_matrix
conda env create --name=gln_matrix -f environment.yml
conda activate gln_matrix

You will also need to download the Uniclust30 database (~86 GB) for MSA searching.

Alternatively, you can follow the instructions on the OpenFold page to set up your environment.

Parameters

Parameter files that you may use for your own trials are deposited in params/. params_model_1_multimer_v2.npz contains the original AlphaFold2 parameters we used in our paper. fine-tuned.pt and finalpoint.pt are the final checkpoint and the checkpoint with the minimum validation loss during our fine-tuning. The data reported in our paper are based on fine-tuned.pt. In predict.sh, pass --param_path to specify the path to the .pt parameters and --alphafold_param_path to specify the path to the original AlphaFold2 parameters.

Data

The data used for training, validation, and testing are deposited in data/. The Recent PDB test set and the Monomer test set are also provided.

Input file

To predict the GLN matrix of a homodimer, prepare a FASTA file containing two identical sequences with different sequence IDs. Examples are given in the example/ directory.
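As an illustration of the input format described above, the following is a minimal sketch that writes a two-record FASTA file for a homodimer. The function name write_homodimer_fasta and the "_A"/"_B" ID suffixes are hypothetical choices made for this sketch, not part of the repository; check the files in example/ for the ID style actually used there.

```python
from pathlib import Path


def write_homodimer_fasta(sequence: str, path: str, name: str = "chain") -> str:
    """Write a two-record FASTA file for a homodimer and return its text.

    Hypothetical helper: the "<name>_A" / "<name>_B" ID scheme is an
    assumption made for illustration only.
    """
    # Drop whitespace and line breaks, and normalise to upper case.
    sequence = "".join(sequence.split()).upper()
    if not sequence:
        raise ValueError("sequence is empty")

    # Both records carry the same sequence; only the sequence IDs differ.
    records = [
        f">{name}_A\n{sequence}\n",
        f">{name}_B\n{sequence}\n",
    ]
    text = "".join(records)
    Path(path).write_text(text)
    return text
```

Calling write_homodimer_fasta(my_sequence, "my_dimer.fasta") produces a file with two records that share one sequence, which is the layout the paragraph above asks for.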

Inference

We provide a bash script, predict.sh, and some example sequences in example/ for model inference. Please change uniclust30_database_path before running ./predict.sh.

Output

predict.py writes the multiple sequence alignments, prediction timings, and generated features to the output directory. The predicted GLN matrices are given in both .txt and .png formats. Example outputs are given in out/.
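To inspect a predicted matrix programmatically, a sketch along the following lines may help. It assumes the .txt output holds one matrix row per line with whitespace-separated numbers; that layout is an assumption, so compare it against the files in out/ before relying on it. The names load_gln_matrix and strongest_entry are hypothetical helpers, not functions from this repository.

```python
def load_gln_matrix(path):
    """Read a numeric matrix from a text file into a list of row lists.

    Assumes whitespace-separated values, one matrix row per line
    (an assumption about the .txt layout, not a documented format).
    """
    rows = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and comment-style header lines, if any.
            if not line or line.startswith("#"):
                continue
            rows.append([float(value) for value in line.split()])
    if len({len(row) for row in rows}) > 1:
        raise ValueError("rows have different lengths")
    return rows


def strongest_entry(matrix):
    """Return (row, col, value) for the entry with the largest absolute value."""
    best = None
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if best is None or abs(value) > abs(best[2]):
                best = (i, j, value)
    return best
```

The row and column indices returned by strongest_entry are zero-based positions in the loaded matrix; how they map onto residue numbers depends on how the matrix was written.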
