Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepare systems with multiple ligands #326

Open
cespos opened this issue Aug 5, 2024 · 9 comments
Open

Prepare systems with multiple ligands #326

cespos opened this issue Aug 5, 2024 · 9 comments

Comments

@cespos
Copy link

cespos commented Aug 5, 2024

Is your feature request related to a problem? Please describe.

I was trying to prepare a system with a protein and 2 ligands (given as SDF files).
However, both ligands were named "MOL" and both ligands have an index "1" in the output coordinate and topology files.

Describe the solution you'd like
Would it be possible to rename the residue-name of the single ligands and update the residue index?

Describe alternatives you've considered
Not sure if there are alternatives.

Additional context

This is some toy code similar to the one I used:

import BioSimSpace

lig1_coord = BioSimSpace.IO.readMolecules("lig1.sdf")[0]
lig1 = BioSimSpace.Parameters.parameterise(lig1_coord, "gaff2", work_dir=work_dir).getMolecule()

lig2_coord = BioSimSpace.IO.readMolecules("lig2.sdf")[0]
lig2 = BioSimSpace.Parameters.parameterise(lig2_coord, "gaff2", work_dir=work_dir).getMolecule()

ligands = lig1 + lig2 
BioSimSpace.IO.saveMolecules("ligands_multi", ligands, "grotop") 
BioSimSpace.IO.saveMolecules("ligands_multi", ligands, "gro87")

This is how the ligands.gro looks like:
image

@lohedges
Copy link
Contributor

lohedges commented Aug 5, 2024

Hi there, thanks for the question.

This is how the Gro87 atom records are generated:

    auto write_line = [&](int iatm)
    {
        // the atom number is iatm+1
        int atmnum = iatm + 1;

        // however, it cannot be larger than 99999, so it should be capped at this value
        if (atmnum > 99999)
            atmnum = 99999;

        int resnum = resnums.constData()[iatm];

        // similarly, the residue number cannot be greater than 99999
        if (resnum > 99999)
            resnum = 99999;

        const auto resnam = resnams.constData()[iatm];
        const auto atmnam = atmnams.constData()[iatm];
        Vector coord = 0.1 * coords.constData()[iatm]; // convert to nanometers

        if (has_velocities)
        {
            Vector vel = vels.constData()[iatm];

            lines_data[iatm] = QString("%1%2%3%4%5%6%7%8%9%10")
                                   .arg(resnum, 5)
                                   .arg(resnam.left(5), -5)
                                   .arg(atmnam.left(5), 5)
                                   .arg(atmnum, 5)
                                   .arg(coord.x(), precision + 5, 'f', precision)
                                   .arg(coord.y(), precision + 5, 'f', precision)
                                   .arg(coord.z(), precision + 5, 'f', precision)
                                   .arg(vel.x(), precision + 5, 'f', precision + 1)
                                   .arg(vel.y(), precision + 5, 'f', precision + 1)
                                   .arg(vel.z(), precision + 5, 'f', precision + 1);
        }
        else
        {
            lines_data[iatm] = QString("%1%2%3%4%5%6%7")
                                   .arg(resnum, 5)
                                   .arg(resnam.left(5), -5)
                                   .arg(atmnam.left(5), 5)
                                   .arg(atmnum, 5)
                                   .arg(coord.x(), precision + 5, 'f', precision)
                                   .arg(coord.y(), precision + 5, 'f', precision)
                                   .arg(coord.z(), precision + 5, 'f', precision);
        }
    };

I think this should really use the residue index rather than number, since number isn't guaranteed to be unique in multi-molecule systems. I'll check this.

However, it's trivial to rename and renumber the residue using Sire's editing capabilities. For example, you could do something like:

from sire.legacy import Mol as _SireMol

# Create an editable version of lig1.
edit_mol = lig1._sire_object.edit()

# Rename the residue.
edit_mol = edit_mol.residue(_SireMol.ResIdx(0)).rename(_SireMol.ResName("CAT")).molecule()

# Renumber the residue.
edit_mol = edit_mol.residue(_SireMol.ResIdx(0)).renumber(_SireMol.ResNum(42)).molecule()

# Commit the changes and update the ligand.
lig1._sire_object = edit_mol.commit()

We typically take the approach the the information in the input files is the "source of truth", so don't automatically edit it behind a users back. For example, atom and residue names can be extremely important for correct processing with downstream tools.

When only the numbering is a problem, there is also an internal convenience function that can be used to renumber all molecular constituents (chains, residues, atoms) so they are in ascending numerical order, e.g.:

from sire.legacy.IO import renumberConstituents

renumbered_ligands = renumberConstituents(ligands.toSystem()._sire_object)

On an unrelated note, is there any update on #142? I still can't debug without example input files.

Cheers.

@lohedges
Copy link
Contributor

lohedges commented Aug 5, 2024

Oh, and just to say that an easier way to edit is using the new Sire API. (My brain is still hardcoded to use the old API.)

# Create a cursor to edit the residue.
cursor = lig1._sire_object.cursor().residues()[0]

# Update the name and residue number.
cursor.name = "CAT"
cursor.number = 42

# Commit the changes back to the ligand.
lig1._sire_object = cursor.molecule().commit()

@cespos
Copy link
Author

cespos commented Aug 5, 2024

This solution looks like what I was looking for, thanks! I will give it a try and let you know if it worked. It would be great to have that in the documentation. Maybe it's already there but I was not able to find it.

@cespos
Copy link
Author

cespos commented Aug 5, 2024

Sorry for not following up on issue #142. It has now been a long time and I'll have to look at this issue again. I'll try to send you an update by tomorrow

@lohedges
Copy link
Contributor

lohedges commented Aug 5, 2024

It would be great to have that in the documentation. Maybe it's already there but I was not able to find it.

Sorry this wasn't clear. The molecular editing functionality is actually part of the Sire documentation, which is available here. In particular, there is a detailed tutorial for editing using the new API in this section of the tutorial. For BioSImSpace, this functionality is only used internally so isn't exposed to the user directly.

All of the objects in BioSImSpace._SireWrappers are just thin wrappers around their equivalent object form Sire. You can get the underlying Sire object using the ._sire_object attribute, or _getSireObject() method, then do whatever you want with it, either updating the attribute afterwards, or creating a new BioSImSpace object from it. In many cases there are equivalent functions/methods on the BioSimSpace and Sire objects, e.g. molecule.extract, etc., but we've not chosen to expose everything in order to simplify the user facing API.

I'll make a not to better cross-reference to the Sire documentation since it makes sense for developers (or more advanced users) to make use of both, and editing molecules is a big part of that.

Cheers.

@cespos
Copy link
Author

cespos commented Aug 5, 2024

Thanks, Lester, for the links to the documentation. And the solution above worked perfectly, thanks again.

I have another related question.

If I extract crystallographic water molecules in this way:

def get_water_atom_indices(system) -> List[int]:
    water_atom_indices = []
    for res1 in system.getResidues():
        if res1.name() in ["HOH", "WAT", "SOL"]:
            atoms = res1.getAtoms()
            for atom in atoms:
                water_atom_indices.append(atom.index())
    return water_atom_indices
   
input_coord = BioSimSpace.IO.readMolecules(prot_coord_file)[0]
water_atoms_indices = get_water_atom_indices(input_coord)

water_coord = input_coord.extract(water_atoms_indices)
water = BioSimSpace.Parameters.parameterise(water_coord, prot_force_field, water_model=water_model).getMolecule()

How can I loop over the water molecules to rename the residue name as "WAT"?

@lohedges
Copy link
Contributor

lohedges commented Aug 5, 2024

There are a few ways. First you could just swap the water topology to AMBER format (where WAT is the default water residue name), then use saveMolecules allong with the match_water=False kwarg, e.g.:

# Convert the waters to a default AMBER template. Here system can be a system containing any number
# of molecules.
system._set_water_topology("AMBER")

# Now save to Gro87 format. We need to set match_water=False to avoid the topology being converted to
# GROMACS format, where SOL is the default.
BSS.IO.saveMolecules("test", system, match_water=False)

Note that this first checks that the template against the first water in the system, so this wouldn't rename everything if you have a mixture of naming, which might be the issue here, i.e. the crystal waters are named differently to the others?

Alternatively, to do this in a loop, something like the following should work:

# Create a cursor to edit the molecule.
cursor = water._sire_object.cursor()

# Loop the cursor over each residue and rename.
for c in cursor.residues():
    c.name = "WAT"

# Commit the changes and update the molecule.
water._sire_object = cursor.commit()

@cespos
Copy link
Author

cespos commented Aug 5, 2024

Thanks for the very prompt replies. The first solution should work but I will try both.
I suppose different users won't have different water names in their starting files but you never know!

@lohedges
Copy link
Contributor

lohedges commented Aug 5, 2024

No problem. We can force the naming internally, but checking every water one-by-one in Python is prohibitively slow. If you just want to do it without checking the existing template, then you can directly call:

from sire.legacy.IO import setAmberWater

system._sire_object = setAmberWater(system._sire_object, "tip3p")

We need to do things like this internally prior to running simulations with certain engines, since some of the logic relies on the naming for the waters. We only do this in a copy of the system that is used to generate the input to run a simulation, so coordinates returned from the simulation will be copied back into the original topology, i.e. with whatever the original naming was.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants