Skip to content

Latest commit

 

History

History
90 lines (69 loc) · 6.16 KB

README.md

File metadata and controls

90 lines (69 loc) · 6.16 KB

Perceiver CPI (Version 1.0)

A Pytorch Implementation of paper:

PerceiverCPI: A nested cross-attention network for compound-protein interaction prediction

Ngoc-Quang Nguyen , Gwanghoon Jang , Hajung Kim and Jaewoo Kang

Our reposistory uses https://github.com/chemprop/chemprop as a backbone for compound information extraction. We highly recommend researchers read the paper D-MPNN to better understand how it was used.

Motivation: Compound-protein interaction (CPI) plays an essential role in drug discovery and is performed via expensive molecular docking simulations. Many artificial intelligence-based approaches have been proposed in this regard. Recently, two types of models have accomplished promising results in exploiting molecular information: graph convolutional neural networks that construct a learned molecular representation from a graph structure (atoms and bonds), and neural networks that can be applied to compute on descriptors or fingerprints of molecules. However, the superiority of one method over the other is yet to be determined. Modern studies have endeavored to aggregate information that is extracted from compounds and proteins to form the CPI task. Nonetheless, these approaches have used a simple concatenation to combine them, which cannot fully capture the interaction between such information.

Results: We propose the Perceiver CPI network, which adopts a cross-attention mechanism to improve the learning ability of the representation of drug and target interactions and exploits the rich information obtained from extended-connectivity fingerprints to improve the performance. We evaluated Perceiver CPI on three main datasets, Davis, KIBA, and Metz, to compare the performance of our proposed model with that of state-of-the-art methods. The proposed method achieved satisfactory performance and exhibited significant improvements over previous approaches in all experiments

0.Overview of Perceiver CPI

image

Set up the environment:

In our experiment we use, Python 3.9 with PyTorch 1.7.1 + CUDA 10.1.

git clone https://github.com/dmis-lab/PerceiverCPI.git
conda env create -f environment.yml

1.Dataset and supplementary experiments

benchmark_data_vis

image

The data should be in the format csv: 'smiles','sequences','label'!

2.To train the model:

python train.py --data_path "datasetpath" --separate_val_path "validationpath" --separate_test_path "testpath" --metric mse --dataset_type regression --save_dir "checkpointpath" --target_columns label

Usage Example:

python train.py --data_path ./toy_dataset/novel_pair_0_train.csv --separate_val_path ./toy_dataset/novel_pair_0_val.csv --separate_test_path ./toy_dataset/novel_pair_0_test.csv --metric mse --dataset_type regression --save_dir regression_150_newprot_pre --target_columns label --epochs 150 --ensemble_size 3 --num_folds 1 --batch_size 50 --aggregation mean --dropout 0.1 --save_preds

3.To take the inferrence:

python predict.py --test_path "testdatapath" --checkpoint_dir "checkpointpath" --preds_path "predictionpath.csv"

Usage Example:

python predict.py --test_path ./toy_dataset/novel_pair_0_test.csv --checkpoint_dir regression_150_newprot_pre --preds_path newnew_fold0.csv

4.To train YOUR model:

Your data should be in the format csv, and the column names are: 'smiles','sequences','label'.

You can freely tune the hyperparameter for your best performance (but highly recommend using the Bayesian optimization package).

5.Citations

If you find the models useful in your research, please consider citing the paper:

@article{10.1093/bioinformatics/btac731,
    author = {Nguyen, Ngoc-Quang and Jang, Gwanghoon and Kim, Hajung and Kang, Jaewoo},
    title = "{Perceiver CPI: A nested cross-attention network for compound-protein interaction prediction}",
    journal = {Bioinformatics},
    year = {2022},
    month = {11},
    abstract = "{Compound-protein interaction (CPI) plays an essential role in drug discovery and is performed via expensive molecular docking simulations. Many artificial intelligence-based approaches have been proposed in this regard. Recently, two types of models have accomplished promising results in exploiting molecular information: graph convolutional neural networks that construct a learned molecular representation from a graph structure (atoms and bonds), and neural networks that can be applied to compute on descriptors or fingerprints of molecules. However, the superiority of one method over the other is yet to be determined. Modern studies have endeavored to aggregate information that is extracted from compounds and proteins to form the CPI task. Nonetheless, these approaches have used a simple concatenation to combine them, which cannot fully capture the interaction between such information.We propose the Perceiver CPI network, which adopts a cross-attention mechanism to improve the learning ability of the representation of drug and target interactions and exploits the rich information obtained from extended-connectivity fingerprints to improve the performance. We evaluated Perceiver CPI on three main datasets, Davis, KIBA, and Metz, to compare the performance of our proposed model with that of state-of-the-art methods. The proposed method achieved satisfactory performance and exhibited significant improvements over previous approaches in all experiments.Perceiver CPI is available at https://github.com/dmis-lab/PerceiverCPISupplementary data are available at Bioinformatics online.}",
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btac731},
    url = {https://doi.org/10.1093/bioinformatics/btac731},
    note = {btac731},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btac731/47214739/btac731.pdf},
}