diff --git a/README.md b/README.md index 871b5b0b..1d41fc6a 100644 --- a/README.md +++ b/README.md @@ -1,286 +1,67 @@ -# Meeko: preparation of small molecules for AutoDock +# Meeko: interface for AutoDock [![API stability](https://img.shields.io/badge/stable%20API-no-orange)](https://shields.io/) [![PyPI version fury.io](https://img.shields.io/badge/version-0.6.0-green.svg)](https://pypi.python.org/pypi/meeko/) [![Documentation Status](https://readthedocs.org/projects/meeko/badge/?version=readthedocs)](https://meeko.readthedocs.io/en/readthedocs/?badge=readthedocs) -Meeko reads an RDKit molecule object and writes a PDBQT string (or file) -for [AutoDock-Vina](https://github.com/ccsb-scripps/AutoDock-Vina) -and [AutoDock-GPU](https://github.com/ccsb-scripps/AutoDock-GPU). -It converts the docking output to RDKit molecules and -SD files, without loss of bond orders. +Meeko prepares the input for AutoDock and processes its output. +It is developed alongside AutoDock-GPU and AutoDock-Vina. +Meeko parameterizes both small organic molecules (ligands) and proteins +and nucleic acids (receptors). Meeko is developed by the [Forli lab](https://forlilab.org/) at the [Center for Computational Structural Biology (CCSB)](https://ccsb.scripps.edu) at [Scripps Research](https://www.scripps.edu/). -## Usage notes +## Documentation -Meeko does not calculate 3D coordinates or assign protonation states. -Input molecules must have explicit hydrogens. +The docs are hosted on [meeko.readthedocs.io](meeko.readthedocs.io) -Sampling of macrocycle conformers ([paper](https://doi.org/10.1017/qrd.2022.18)) -is enabled by default. -SDF format strongly preferred over Mol2. -See -[this discussion](https://github.com/rdkit/rdkit/discussions/3647), and -[this one](https://sourceforge.net/p/rdkit/mailman/message/37668451/), -[also this](rdkit/rdkit#4061), -[and this](https://sourceforge.net/p/rdkit/mailman/message/37374678/). -and RDKit issues -[1755](https://github.com/rdkit/rdkit/issues/1755) and -[917](https://github.com/rdkit/rdkit/issues/917). So, what could go wrong? -For example, reading Mol2 files from ZINC -[led to incorrect net charge of some molecules.](https://github.com/forlilab/Meeko/issues/63) +## Reporting bugs +Please check if a similar bug has been reported and, if not, [open an issue](https://github.com/forlilab/Meeko/issues). -## v0.6.0-alpha.3 -This release aims to distribute an enhanced `mk_prepare_receptor.py`. -Some features are still being developed, hence the `-alpha` suffix in the version. -Reactive docking is not working in v0.6.0-alpha, but should be restored soon. -Documentation is also work in progress. +## Installation +Visit the docs for a more complete description. One option is conda or mamba: -## API changes in v0.5 - -Class `MoleculePreparation` no longer has method `write_pdbqt_string()`. -Instead, `MoleculePreparation.prepare()` returns a list of `MoleculeSetup` objects -that must be passed, individually, to `PDBQTWriterLegacy.write_string()`. -```python -from meeko import MoleculePreparation -from meeko import PDBQTWriterLegacy - -preparator = MoleculePreparation() -mol_setups = preparator.prepare(rdkit_molecule_3D_with_Hs) -for setup in mol_setups: - pdbqt_string, is_ok, error_msg = PDBQTWriterLegacy.write_string(setup) - if is_ok: - print(pdbqt_string, end="") -``` - -Argument `keep_nonpolar_hydrogens` is replaced by `merge_these_atom_types`, both in the Python -interface and for script `mk_prepare_ligand.py`. -The default is `merge_these_atom_types=("H",)`, which -merges hydrogens typed `"H"`, keeping the current default behavior. -To keep all hydrogens, set `merge_these_atom_types` to an empty -list when initializing `MoleculePreparation`, or pass no atom types -to `--merge_these_atom_types` from the command line: -```sh -mk_prepare_ligand.py -i molecule.sdf --merge_these_atom_types -``` - -## Dependencies - -* Python (>=3.5) -* Numpy -* Scipy -* RDKit (>=2023.09) -* ProDy (optionally, for covalent docking) - -Conda or Miniconda can install the dependencies: ```bash -conda install -c conda-forge numpy scipy rdkit -pip install prody # optional. pip recommended at http://prody.csb.pitt.edu/downloads/ -``` - - -## Python tutorial - - -#### 2. RDKit molecule from docking results - -```python -from meeko import PDBQTMolecule -from meeko import RDKitMolCreate - -fn = "autodock-gpu_results.dlg" -pdbqt_mol = PDBQTMolecule.from_file(fn, is_dlg=True, skip_typing=True) -rdkitmol_list = RDKitMolCreate.from_pdbqt_mol(pdbqt_mol) -``` -The length of `rdkitmol_list` is one if there are no sidechains and only one -ligand was docked. -If multiple ligands and/or sidechains are docked simultaneously, each will be -an individual RDKit molecule in `rdkitmol_list`. -Sidechains are truncated at the C-alpha. -Note that docking multiple -ligands simultaneously is only available in Vina, and it differs from docking -multiple ligands one after the other. Each failed creation of an RDKit molecule -for a ligand or sidechain results in a `None` in `rdkitmol_list`. - -For Vina's output PDBQT files, omit `is_dlg=True`. -```python -pdbqt_mol = PDBQTMolecule.from_file("vina_results.pdbqt", skip_typing=True) -rdkitmol_list = RDKitMolCreate.from_pdbqt_mol(pdbqt_mol) -``` - -When using Vina from Python, the output string can be passed directly. -See [the docs](https://autodock-vina.readthedocs.io/en/latest/docking_python.html) -for context on the `v` object. -```python -vina_output_string = v.poses() -pdbqt_mol = PDBQTMolecule(vina_output_string, is_dlg=True, skip_typing=True) -rdkitmol_list = RDKitMolCreate.from_pdbqt_mol(pdbqt_mol) +micromamba install meeko ``` -#### 3. Initializing MoleculePreparation from a dictionary (or JSON) - -This is useful for saving and loading configuration files with json. -```python -import json -from meeko import MoleculePreparation - -mk_config = {"hydrate": True} # any arguments of MoleculePreparation.__init__ -print(json.dumps(mk_config), file=open('mk_config.json', 'w')) -with open('mk_config.json') as f: - mk_config = json.load(f) -preparator = MoleculePreparation.from_config(mk_config) -``` -The command line tool `mk_prepare_ligand.py` can read the json files using -option `-c` or `--config`. - - -## Possibly useful configurations of MoleculePreparation - -Here we create an instance of MoleculePreparation that attaches pseudo -waters to the ligand ([see paper on hydrated docking](https://pubs.acs.org/doi/abs/10.1021/jm2005145)), -keeps macrocycles rigid (generally a bad idea), -and keeps conjugated bonds and amide bonds rigid. -By default, most amides are kept rigid, except for tertiary amides with -different substituents on the nitrogen. +or from PyPI: -```python -preparator = MoleculePreparation( - hydrate=True, - rigid_macrocycles=True, - rigidify_bonds_smarts = ["C=CC=C", "[CX3](=O)[NX3]"], - rigidify_bonds_indices = [(1, 2), (0, 2)], -) -``` - -The same can be done with the command line script. Note that indices of the -atoms in the SMARTS are 0-based for the Python API but -1-based for the command line script: -```console -mk_prepare_ligand.py\ - --hydrate\ - --rigid_macrocycles\ - --rigidify_bonds_smarts "C=CC=C"\ - --rigidify_bonds_indices 2 3\ - --rigidify_bonds_smarts "[CX3](=O)[NX3]"\ - --rigidify_bonds_indices 1 3 +```bash +pip install meeko ``` -## Docking covalent ligands as flexible sidechains +## Usage -The input ligand must be the product of the reaction and contain the -atoms of the flexible sidechain up to (and including) the C-alpha. -For example, if the ligand is an acrylamide (smiles: `C=CC(=O)N`) reacting -with a cysteine (sidechain smiles: `CCS`), then the input ligand for -Meeko must correspond to smiles `CCSCCC(=O)N`. +Meeko exposes a Python API to enable scripting. Here we share very minimal examples +using the command line scripts just to give context. +Please visit the [meeko.readthedocs.io](meeko.readthedocs.io) for more information. -Meeko will align the ligand atoms that match the C-alpha and C-beta of -the protein sidechain. Options `--tether_smarts` and `--tether_smarts_indices` -define these atoms. For a cysteine, `--tether_smarts "SCC"` and -`--tether_smarts_indices 3 2` would work, although it is safer to define -a more spefic SMARTS to avoid matching the ligand more than once. The first -index (3 in this example) defines the C-alpha, and the second index defines -the C-beta. - -For the example in this repository, which is based on PDB entry 7aeh, -the following options prepare the covalently bound ligand for tethered docking: -```console -cd example/covalent_docking - -mk_prepare_ligand.py\ - -i ligand_including_cys_sidechain.sdf\ - --receptor protein.pdb\ - --rec_residue ":CYS:6"\ - --tether_smarts "NC(=O)C(O)(C)SCC"\ - --tether_smarts_indices 9 8\ - -o prepared.pdbqt +Parameterizing a ligand and writing a PDBQT file: +```bash +mk_prepare_ligand.py -i molecule.sdf -o molecule.pdbqt ``` -Prody is required for preparing ligands as flexible sidechains. -Often, `pip install prody` suffices to install Prody in the currently -active virtual environment (e.g., conda). For more detailed installation -instructions visit http://prody.csb.pitt.edu/downloads/. +Parameterizing a receptor with a flexible sidechain and writing a PDBQT file +as well as a JSON file that stores the entire receptor datastructure. In this +example, the `-o` option sets the output base name, `-j` triggers writing the +.json file, `-p` triggers writting the .pdbqt file, and `-f` makes residue +42 in chain A flexible. -## Reactive Docking - -### 1. Prepare protein with waterkit -Follow `wk_prepare_receptor.py` instructions and run with `--pdb`. -The goal of this step is to perform essential fixes to the protein -(such as missing atoms), to add hydrogens, and to follow the Amber -naming scheme for atoms and residues, e.g., `HIE` or `HID` -instead of `HIS`. - -### 2. Prepare protein pdbqt -Here, `wk.pdb` was written by waterkit. The example below will center a gridbox of specified size on the given reactive residue. - -```console - $ mk_prepare_receptor.py\ - --pdb wk.pdb\ - -o receptor.pdbqt\ - --flexres " :ARG:348"\ - --reactive_flexres " :SER:308" - --reactive_flexres " :SER:308"\ - --box_center_on_reactive_res\ - --box_size 40 40 40 # x y z (angstroms) -``` -A manual box center can be specified with `--box_center`. -Reactive parameters can also be modified: -```sh - --r_eq_12 R_EQ_12 r_eq for reactive atoms (1-2 interaction) - --eps_12 EPS_12 epsilon for reactive atoms (1-2 interaction) - --r_eq_13_scaling R_EQ_13_SCALING - r_eq scaling for 1-3 interaction across reactive atoms - --r_eq_14_scaling R_EQ_14_SCALING - r_eq scaling for 1-4 interaction across reactive atoms +```bash +mk_prepare_receptor.py -i nucleic_acid.cif -o my_receptor -j -p -f A:42 ``` -Receptor preparation can't handle heteroatoms for the moment. -Also nucleic acids, ions, and post-translational modifications (e.g. -phosphorilation) are not supported. Only the 20 standard amino acids -can be parsed, and it is required to have Amber atom names and -hydrogens. No atoms can be missing. - -### 3. Run autogrid +Finally, converting docking results to SDF for the ligand, and PDB for the +receptor with updated sidechain positions: -Make affinity maps for the `_rigid.pdbqt` part of the receptor. -Make affinity maps for the `_rigid.pdbqt` part of the receptor. `mk_prepare_receptor.py` will prepare the GPF for you. - -### 4. Write ligand PDBQT -mk_prepare_ligand.py -i sufex1.sdf --reactive_smarts "S(=O)(=O)F" --reactive_smarts_idx 1 -o sufex1.pdbqt\ - -### 5. Configure AD-GPU for reactive docking - -For reactive docking there are two options that need to be passed to AutoDock-GPU: -For reactive docking there are an additional option that needs to be passed to AutoDock-GPU: - ```console - --import_dpf - ``` - -The `--derivtype` option, if needed, was written by `mk_prepare_receptor.py` to a file suffixed with `.derivtype`. - -The filename to be passed to `--import_dpf` was written by `mk_prepare_receptor.py` -and it is suffixed with `reactive_config`. -```sh -ADGPU -I *.reactive_config -L sufex1.pdbqt -N sufex1_docked_ -F *_flex.pdbqt -C 1 +```bash +mk_export.py vina_results.pdbqt -j my_receptor.json -s lig_docked.sdf -p rec_docked.pdb ``` - -## Polymer docs temporary placeholder - -Current init is a builder method designed to parse PDB files and topology -objects. It is not designed to monomerize a polymer that already exists -as a single RDKit molecule, or to build a single polymer molecule from the -individual residues (monomers). To do that, one would have to react residues -with each other. Padding could be adapted. - - -How bonds between residues are handled and blunt residues, and deleting -bonds "interactively". - -Document residue\_chem\_params. diff --git a/docs/source/cli_export_result.rst b/docs/source/cli_export_result.rst deleted file mode 100644 index 641fbd21..00000000 --- a/docs/source/cli_export_result.rst +++ /dev/null @@ -1,43 +0,0 @@ -mk_export.py -============ - -Convert docking results to SDF ------------------------------- - -AutoDock-GPU and Vina write docking results in the PDBQT format. The DLG output -from AutoDock-GPU contains docked poses in PDBQT blocks, plus additional information. -Meeko generates RDKit molecules from PDBQT using the SMILES -string in the REMARK lines. The REMARK lines also have the mapping of atom indices -between SMILES and PDBQT. SD files with docked coordinates are written -from RDKit molecules. - -.. code-block:: bash - - mk_export.py molecule.pdbqt -o molecule.sdf - mk_export.py vina_results.pdbqt -o vina_results.sdf - mk_export.py autodock-gpu_results.dlg -o autodock-gpu_results.sdf - -Why this matters ----------------- - -Making RDKit molecules from SMILES is safer than guessing bond orders -from the coordinates, specially because the PDBQT lacks hydrogens bonded -to carbon. As an example, consider the following conversion, in which -OpenBabel adds an extra double bond, not because it has a bad algorithm, -but because this is a nearly impossible task. - -.. code-block:: bash - - obabel -:"C1C=CCO1" -o pdbqt --gen3d | obabel -i pdbqt -o smi - [C]1=[C][C]=[C]O1 - - -Caveats -------- - -If docking does not use explicit Hs, which it often does not, the -exported positions of hydrogens are calculated from RDKit. This can -be annoying if a careful forcefield minimization is employed before -docking, as probably rigorous Hs positions will be replaced by the -RDKit geometry rules, which are empirical and much simpler than most -force fields. diff --git a/docs/source/cli_lig_prep.rst b/docs/source/cli_lig_prep.rst deleted file mode 100644 index 91fe3224..00000000 --- a/docs/source/cli_lig_prep.rst +++ /dev/null @@ -1,16 +0,0 @@ -mk_prepare_ligand.py -==================== - -Command line tool to prepare small organic molecules. - -Write PDBQT files ------------------ - -AutoDock-GPU and Vina read molecules in the PDBQT format. These can be prepared -by Meeko from SD files, or from Mol2 files, but SDF is strongly preferred. - -.. code-block:: bash - - mk_prepare_ligand.py -i molecule.sdf -o molecule.pdbqt - mk_prepare_ligand.py -i multi_mol.sdf --multimol_outdir folder_for_pdbqt_files - diff --git a/docs/source/cli_rec_prep.rst b/docs/source/cli_rec_prep.rst index 2fadd173..84944487 100644 --- a/docs/source/cli_rec_prep.rst +++ b/docs/source/cli_rec_prep.rst @@ -1,54 +1,26 @@ -The input structure is matched against templates to -guarantee chemical correctness and identify problems with the input structures. -This allows the user to identify and fix problems, resulting in a molecular -model that is correct with respect to heavy atoms, protonation state, -connectivity, bond orders, and formal charges. +mk_prepare_receptor.py +====================== -The matching algorithm uses the connectivity and elements, but not bond orders -or atom names. Hydrogens are optional. This makes it compatible with input -files from various sources. - -Templates are matched on a per residue basis. Each residue is represented -as an instance of a PolymerResidue object, which contains: - - an RDKit molecule that represents the actual state - - a padded RDKit molecule containing a few atoms from the adjacent residues - - parameters such as partial charges - -The positions are set by the input, and the connectivity and formal charges -are defined by the templates. Heavy atoms must match exactly. If heavy atoms -are missing or in excess, the templates will fail to match. - -Missing hydrogens are added by RDKit, but are not subjected to minimization -with a force field. Thus, their bond lengths are not super accurate. - -Different states of the same residue are stored as different templates, -for example different protonation states of HIS, N-term, LYN/LYS, etc. -Residue name is primary key unless user overrides. - -Currently not supported: capped residues from charmm-gui. - -mk_prepare_receptor -=================== +A command-line script for receptor preparation from a PDB/CIF file, setting up various properties (flexible/reactive residues, box specifications, etc.), and writing useful input files (PDBQT, GPF, box configuration file for Vina) for docking. Basic usage ----------- .. code-block:: bash - mk_prepare_receptor -i examples/system.pdb --write_pdbqt prepared.pdbqt - + mk_prepare_receptor.py -i examples/system.pdb --write_pdbqt prepared.pdbqt +This is equivalent to: +.. code-block:: bash -Protonation states ------------------- + mk_prepare_receptor.py -i examples/system.pdb -o prepared -p +Read more about the syntax in :ref:`Write flags` and :ref:`Options`. -Adding templates ----------------- Write flags ------------ +~~~~~~~~~~~ The option flags starting with ``--write`` in ``mk_prepare_receptor`` can be used both with an argument to specify the outpuf filename: @@ -67,38 +39,42 @@ It is also possible to combine the two types of usage: .. code-block:: bash - --output_basename myenzyme - --write_pdbqt - --write_json - --write_vina_box box_for_myenzyme.txt - -in which case the specified filenames have priority over the default basename. - -.. _templates: - -Templates ---------- - -The templates contain SMILES strings that are used to create the RDKit -molecules that constitute every residue in the processed model. In this way, -the chemistry of the processed model is fully defined by the templates, -and the only thing that is preserved from the input are the atom positions -and the connectivity between residues. - -The SMILES strings contain all atoms that exist in the final model, -and none that do not exist. This also applies to hydrogens, -meaning that the SMILES are expected to have real hydrogens. Note that -real hydrogens are different from explicit hydrogens. Real hydrogens will be -represented as an actual atom in an RDKit molecule, while explicit hydrogens -are a just property of heavy atoms. In the SMILES, real hydrogens are defined -with square brackets "[H]" and explicit hydrogens without, e.g. "[nH]" to set -the number of explicit hydrogens on an aromatic nitrogen to one. - -Residues that are part of a polymer, which is often all of them, will have -bonds to adjacent residues. The heavy atoms involved in the bonds will miss -a real hydrogen and have an implicit (or explicit) one instead. As an -example, consider modeling an alkyl chain as a polymer, in which the monomer -is a single carbon atom. Our template SMILES would be "[H]C[H]". The RDKit -molecule will have three atoms and the carbon will have two implicit hydrogens. -The implicit hydrogens correspond to bonds to adjacent residues in the -processed polymer. + --output_basename myenzyme --write_pdbqt --write_json --write_vina_box box_for_myenzyme.txt + +in which case the specified filenames have priority over the default basename. + +Residue selection and assignment language +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Meeko uses the **chain ID** and **residue number** to identify a residue. The arguments involving selection of residues: + +.. code-block:: bash + + -d, --delete_residues + + -f, --flexres + + -r, --reactive_flexres + +use the compact selection language that specify residues efficiently. The chain ID and the residue number(s) are separated by a colon (``:``) delimiter. Each residue number is combined with the most recent chain ID that precedes it, resulting in an expanded list of chain-residue pairs. + +For an input like ``A:5,7,BB:12C``, this selection language represents: ``residues (number) 5 and 7 in Chain A`` and ``residue (number) 12C in Chain BB``. + +The arguments involving assignment of residues to properties: + +.. code-block:: bash + + -n, --set_template