Monomerizer

Monomerizer (also called SMILES2Seq or SMILES2FASTA) is a pipeline that converts peptides and peptidomimetics, represented as SMILES strings (a line notation for chemical structures), into sequences of amino acids and terminal modifications.

For more information, see our paper (coming soon 🙏).

To use the output data to fine-tune our foundation language model for peptidomimetics, see GPepT.

Usage

To run a Monomerizer demo, use the following command:

python3 run_pipeline.py --input_file demo/example_smiles.txt

By default, results are saved to the output/<datetime> directory. The raw directory contains the raw results, and the standard directory contains the sequences after standardization to the standard dictionary accepted by GPepT.
To process your own data, replace demo/example_smiles.txt with the path to your input file of SMILES strings. The input file must follow the format of the example files in the demo directory.
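
To pick up the results of the most recent run from a script, something like the following may help. It is a minimal sketch, assuming each run creates one subdirectory under output/ and that raw and standard sit inside that run directory. The helper name latest_run_dir is hypothetical and not part of Monomerizer.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(output_root: str = "output") -> Optional[Path]:
    """Return the most recently modified run directory under output_root, or None.

    Assumption: each pipeline run creates one subdirectory (output/<datetime>).
    """
    root = Path(output_root)
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if p.is_dir()]
    if not runs:
        return None
    # Compare modification times instead of parsing the directory name,
    # because the exact <datetime> format is not specified here.
    return max(runs, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    run = latest_run_dir()
    if run is None:
        print("No runs found under output/")
    else:
        # Assumed layout: raw/ and standard/ inside the run directory.
        print("raw results:", run / "raw")
        print("standardized sequences:", run / "standard")
```

Selecting by modification time avoids depending on how the <datetime> part of the directory name is formatted.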

Optional arguments

--output_dir <path>: Directory where results are saved. Default is output/<datetime>.
--min_amino_acids <int>: Minimum number of amino acids required for processing. Default is 3.
--batch_size <int>: Number of SMILES to process in each batch. Default is 100.
--max_workers <int>: Maximum number of parallel workers. Default is the number of available CPU cores.
-draw: Draws the output file.
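
When calling the pipeline from another script, the optional arguments above can be assembled programmatically. The following is a minimal sketch, not part of Monomerizer: build_command is a hypothetical helper, it assumes the script is launched from the repository root with python3, and it uses only the flags listed above. The -draw flag is left out because only its name is documented here.

```python
from typing import List, Optional


def build_command(
    input_file: str,
    output_dir: Optional[str] = None,
    min_amino_acids: Optional[int] = None,
    batch_size: Optional[int] = None,
    max_workers: Optional[int] = None,
) -> List[str]:
    """Build the argument list for run_pipeline.py, adding only options that were set."""
    cmd = ["python3", "run_pipeline.py", "--input_file", input_file]
    options = {
        "--output_dir": output_dir,
        "--min_amino_acids": min_amino_acids,
        "--batch_size": batch_size,
        "--max_workers": max_workers,
    }
    for flag, value in options.items():
        # Unset options are skipped so the pipeline's own defaults apply.
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_command("demo/example_smiles.txt", output_dir="results", batch_size=50)
    print(" ".join(cmd))
    # To actually run it from the repository root:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.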
