EP-EDL is an accurate ensemble deep learning predictor that uses only protein sequence information to predict human essential proteins.
- torch == 1.4.0
- torchvision == 0.7.0
- numpy == 1.16.4
- scikit-learn == 0.23.2
- imblearn == 0.7.0
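Before running the demo, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of the repository: the `PINNED` dictionary and the `check_versions` helper are hypothetical names, and it assumes Python 3.8+ (for `importlib.metadata`) and that the names in the list are the installed distribution names (for example, `imblearn` may be installed as `imbalanced-learn` on your system).

```python
from importlib import metadata

# Versions copied from the list above. Using the keys as distribution names
# is an assumption; adjust them if your environment uses different names.
PINNED = {
    "torch": "1.4.0",
    "torchvision": "0.7.0",
    "numpy": "1.16.4",
    "scikit-learn": "0.23.2",
    "imblearn": "0.7.0",
}


def check_versions(pinned):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu101" when comparing.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every listed package was found at the listed version.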
In this GitHub project, we give a demo to show how EP-EDL works.
In the data folder, we provide:
- human_protein_sequence_remove_redundancy.xlsx contains the raw protein sequences and their labels. You can also use them for other sequence-based essential protein prediction tasks.
- train_test_dset.pkl stores the training and test sets, which were generated with the random seed 19980530. The fixed seed ensures that you can reproduce the results reported in the paper.
- In addition, the processed protein sequence features, i.e., pssm.pkl, can be downloaded at https://pan.baidu.com/s/1kda3AD8EHW7y0Xu2bc4jQA (access code: i166).
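The internal layout of train_test_dset.pkl and pssm.pkl is not described here, so the sketch below only loads a pickle file and prints a short summary of what it contains. The helpers `load_pickle` and `summarize` are hypothetical and not part of the repository, and the path in the commented usage lines is an assumption about where the file sits.

```python
import pickle


def load_pickle(path):
    """Load a pickled object. Only unpickle files from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def summarize(obj, depth=0, max_depth=2):
    """Return indented lines describing container types and sizes."""
    pad = "  " * depth
    shape = getattr(obj, "shape", None)
    if shape is not None:
        detail = f" shape={tuple(shape)}"  # e.g. NumPy arrays or tensors
    elif hasattr(obj, "__len__"):
        detail = f" len={len(obj)}"
    else:
        detail = ""
    lines = [f"{pad}{type(obj).__name__}{detail}"]
    if depth >= max_depth:
        return lines
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = summarize(value, depth + 1, max_depth)
            # Put the key in front of the child's first line.
            child[0] = f"{pad}  {key!r}: {child[0].strip()}"
            lines.extend(child)
    elif isinstance(obj, (list, tuple)) and obj:
        # Describe only the first element as a representative.
        lines.extend(summarize(obj[0], depth + 1, max_depth))
    return lines


# Example usage (path is an assumption):
# dset = load_pickle("data/train_test_dset.pkl")
# print("\n".join(summarize(dset)))
```

If the file holds custom classes or tensors, the defining packages must be importable when unpickling.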
In our demo, we provide a Python file (main.py) to train and evaluate the ensemble predictor. You can train the model by running the command below:
python main.py
In the saved_models folder, we provide 17 trained base models. The models were trained on a GPU.
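To illustrate how predictions from several base models can be combined, here is a minimal sketch of soft voting, i.e., averaging the per-model probabilities and thresholding the mean. This is an assumption for illustration only: the combination rule actually used by main.py may differ, and `soft_vote` and its `threshold` parameter are hypothetical names.

```python
def soft_vote(prob_per_model, threshold=0.5):
    """Average per-model probabilities and threshold the mean.

    prob_per_model: list of equal-length lists, one per base model, each
    holding the predicted probability that a protein is essential.
    Returns (mean_probabilities, predicted_labels).
    """
    if not prob_per_model:
        raise ValueError("need predictions from at least one model")
    n_samples = len(prob_per_model[0])
    if any(len(probs) != n_samples for probs in prob_per_model):
        raise ValueError("all models must score the same samples")
    n_models = len(prob_per_model)
    # zip(*...) groups the scores of all models for each sample.
    mean_probs = [sum(scores) / n_models for scores in zip(*prob_per_model)]
    labels = [int(p >= threshold) for p in mean_probs]
    return mean_probs, labels


# Three hypothetical base models scoring two proteins.
mean_probs, labels = soft_vote([[0.9, 0.2], [0.7, 0.4], [0.8, 0.3]])
print(labels)
```

With 17 base models, the outer list would simply have 17 entries, one per model.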
For further details, please see the paper and the code.
If you have any questions, please do not hesitate to contact us at:
Yiming Li [email protected]
Min Li [email protected]