Solvent is a library that provides protein folding algorithms. It supports single-sequence-based protein folding models, including ESMFold, OmegaFold, and IgFold. Researchers can train and evaluate each model under the same conditions and design new model variants by combining modules.
See data preparation for how to set up the training and evaluation datasets.
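The commands below follow a Detectron2-style launch convention: a YAML config is selected with --config-file, launch flags such as --num-gpus come next, and any trailing KEY VALUE pairs override config options from the command line. A minimal sketch reusing the ESMFold config shipped with the repository (the override keys are the same ones used in the sections that follow; the output directory name is arbitrary):

# config overrides are appended as KEY VALUE pairs after the launch flags
python train_net.py \
--config-file configs/esm35_evo1_initial_pdbonly.yaml \
--num-gpus 1 \
SOLVER.SEQ_PER_BATCH 2 OUTPUT_DIR output/quickstart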
Training ESMFold on a single GPU
# initial training
python train_net.py \
--config-file configs/esm35_evo1_initial_pdbonly.yaml \
--num-gpus 1 SOLVER.SEQ_PER_BATCH 2 \
OUTPUT_DIR output/esm35_evo1/initial_pdbonly
# fine-tuning from the initially trained model
python train_net.py \
--config-file configs/esm35_evo1_finetune_pdbonly.yaml \
--num-gpus 1 SOLVER.SEQ_PER_BATCH 2 \
OUTPUT_DIR output/esm35_evo1/finetune_pdbonly \
MODEL.WEIGHTS output/esm35_evo1/initial_pdbonly/model_final.pth
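The same two-stage recipe (initial training, then fine-tuning) should apply to the other supported models; only the config file changes. A sketch assuming a hypothetical OmegaFold config name (check the configs/ directory for the config files actually shipped):

# hypothetical config name; see configs/ for the available model variants
python train_net.py \
--config-file configs/omegafold_initial_pdbonly.yaml \
--num-gpus 1 SOLVER.SEQ_PER_BATCH 2 \
OUTPUT_DIR output/omegafold/initial_pdbonly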
Training models using DDP
# e.g. batch of 16 sequences with 2 machines (8 GPUs per machine)
# (machine 0)
python train_net.py \
--config-file configs/esm35_evo1_initial_pdbonly.yaml \
--num-gpus 8 --num-machines 2 --machine-rank 0 --dist-url <URL> \
SOLVER.SEQ_PER_BATCH 16 \
OUTPUT_DIR output/esm35_evo1/initial_pdbonly
# (machine 1)
python train_net.py \
--config-file configs/esm35_evo1_initial_pdbonly.yaml \
--num-gpus 8 --num-machines 2 --machine-rank 1 --dist-url <URL> \
SOLVER.SEQ_PER_BATCH 16 \
OUTPUT_DIR output/esm35_evo1/initial_pdbonly
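A single machine with several GPUs needs no multi-machine flags. A sketch for one node with 8 GPUs, assuming SOLVER.SEQ_PER_BATCH is the total batch size, as the multi-machine example above suggests:

# single machine, 8 GPUs (total batch of 8, i.e. 1 sequence per GPU)
python train_net.py \
--config-file configs/esm35_evo1_initial_pdbonly.yaml \
--num-gpus 8 SOLVER.SEQ_PER_BATCH 8 \
OUTPUT_DIR output/esm35_evo1/initial_pdbonly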
Evaluation of a trained model
python train_net.py \
--eval-only \
--config-file output/esm35_evo1/finetune_pdbonly/config.yaml \
--num-gpus 1 \
MODEL.WEIGHTS output/esm35_evo1/finetune_pdbonly/model_final.pth
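The same evaluation command works for any checkpoint produced by train_net.py, provided the config and weights match. For example, to evaluate the initially trained (pre-fine-tuning) model, assuming its output directory also contains a dumped config.yaml:

python train_net.py \
--eval-only \
--config-file output/esm35_evo1/initial_pdbonly/config.yaml \
--num-gpus 1 \
MODEL.WEIGHTS output/esm35_evo1/initial_pdbonly/model_final.pth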
Inference from FASTA files
python demo/demo.py \
--config-file output/esm35_evo1/finetune_pdbonly/config.yaml \
--input datasets/cameo/fasta_dir/* \
--output output/esm35_evo1/finetune_pdbonly/results \
--opt \
SOLVER.SEQ_PER_BATCH 1 \
MODEL.WEIGHTS output/esm35_evo1/finetune_pdbonly/model_final.pth
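Since the glob above expands to multiple file arguments, --input presumably accepts one or more FASTA paths, so a single target can also be folded on its own. A sketch with a hypothetical file name:

# single FASTA file (file name is hypothetical; use any .fasta from your input directory)
python demo/demo.py \
--config-file output/esm35_evo1/finetune_pdbonly/config.yaml \
--input datasets/cameo/fasta_dir/target.fasta \
--output output/esm35_evo1/finetune_pdbonly/results \
--opt \
SOLVER.SEQ_PER_BATCH 1 \
MODEL.WEIGHTS output/esm35_evo1/finetune_pdbonly/model_final.pth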
This repository depends heavily on the projects listed below.
To make Solvent work as a framework, we follow the pipeline of Detectron2. Individual methods are implemented using the code of AlphaFold2, OpenFold, IgFold, and OmegaFold.
We acknowledge the contributions of the Language Model Engineering Team at Kakao Brain, who optimized Solvent. These optimizations make Solvent efficient in training speed and memory usage, allowing researchers to scale to larger models more easily. Their support has been essential to the outcomes presented in this work.
A description of Solvent can be found in the technical report below.
@misc{lee2023solvent,
  title={Solvent: A Framework for Protein Folding},
  author={Jaemyung Lee and Kyeongtak Han and Jaehoon Kim and Hasun Yu and Youhan Lee},
  year={2023},
  eprint={2307.04603},
  archivePrefix={arXiv},
  primaryClass={q-bio.BM}
}