Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDB writer does not reorder atoms as required by PDB standard #3452

Open
btodac opened this issue Nov 4, 2021 · 12 comments
Open

PDB writer does not reorder atoms as required by PDB standard #3452

btodac opened this issue Nov 4, 2021 · 12 comments

Comments

@btodac
Copy link

btodac commented Nov 4, 2021

Expected behavior

The PDB format states:

ATOM records for proteins are listed from amino to carboxyl terminus.
Nucleic acid residues are listed from the 5' to the 3' terminus.

Actual behavior

The writer just outputs in the order given in the atom group (only tested using proteins, but the writer is the same)

Code to reproduce the behavior

I imported a GROMACS trajectory, which I assume records the atoms in a different order

Current version of MDAnalysis

2.0.0
Python 3.9.7
Ubuntu

@IAlibay
Copy link
Member

IAlibay commented Nov 4, 2021

If I understand the problem properly here, as much as I like to aim for as much standard compliance as possible - I'm tempted to treat this as wontfix.

There is no way for us to know what the sequence of a protein is at write time, nor that what you have actually counts as a standard protein (e.g. cyclic peptides). In my opinion, the best thing to do here is assuming that user supplied ordering is the right ordering?

One potential thing we could do is validate for non-sequential resid? But then we'd have to capture chain changes etc... I'm not sure, is that the source of the error here @btodac ?

@orbeckst
Copy link
Member

orbeckst commented Nov 4, 2021

I agree with @IAlibay , that’s out of scope.

We should document it, though.

@btodac
Copy link
Author

btodac commented Nov 4, 2021

It is not the protein sequence that is the issue but the order of the atom records within each residue.
e.g.
ATOM 1 N SER A 2 -2.266 49.648 -27.532 1.00 39.03 N
ATOM 2 CA SER A 2 -1.744 48.330 -27.051 1.00 37.53 C
ATOM 3 C SER A 2 -0.853 47.687 -28.109 1.00 39.48 C
ATOM 4 O SER A 2 -1.251 46.718 -28.741 1.00 40.65 O
ATOM 5 CB SER A 2 -2.908 47.393 -26.724 1.00 30.13 C
ATOM 6 OG SER A 2 -3.983 48.115 -26.135 1.00 25.58 O
is correct as the carboxyl group (CB and OG) forms the last 2 entries.
However, Gromacs (I think) juggles the entries within the residue leading MDA to output

ATOM 1 N SER X 1 150.950 201.230 48.160 1.00 0.00 seg_ N
ATOM 2 CA SER X 1 151.930 200.220 48.630 1.00 0.00 seg_ C
ATOM 3 CB SER X 1 151.310 199.110 49.380 1.00 0.00 seg_ C
ATOM 4 OG SER X 1 150.500 199.630 50.350 1.00 0.00 seg_ O
ATOM 5 C SER X 1 152.760 199.660 47.420 1.00 0.00 seg_ C
ATOM 6 O SER X 1 152.320 198.810 46.670 1.00 0.00 seg_ O

I am currently making a fork and will implement a method for ensuring the correct order as the atom group should contain all the information required to produce the PDB standard format.

@orbeckst
Copy link
Member

orbeckst commented Nov 4, 2021

We're happy to have a look at proposed code.

By default we will not write out PDB files with changed atom order as this will destroy most people's workflows.

We could consider a keyword arg that reorders on writing. That's something we can discuss on a PR.

@richardjgowers
Copy link
Member

richardjgowers commented Nov 4, 2021 via email

@orbeckst
Copy link
Member

orbeckst commented Nov 4, 2021

Just to add:

However, Gromacs (I think) juggles the entries within the residue leading MDA to output

Note that the atom ordering will always be determined by GROMACS (or CHARMM/NAMD, ...) depending on the topology (force field) building blocks.

I wanted to make clear that MDAnalysis's primary purpose is to act as a layer between simulation formats and numpy arrays. It's not our job to question the ordering of atoms in the topology.

We will adhere to existing format specs as well as possible. However, "the" PDB format in particular has been so badly abused by the simulation community that we cannot walk the high ground and insist on all aspects of the format specs. Instead we have to support what the majority of users use. MDAnalysis would not be the right tool to prepare a PDB for the Protein Databank (in any case, they don't want anyone to use the PDB format anymore anyway, and rightly so). We are a bit slow on supporting the recommended new format PDBx/mmcif but see #2367 for using converters.

All of this is just to give you an idea why we are not super-enthusiastic about adding code to MDAnalysis to change the order of atoms on output.

@btodac
Copy link
Author

btodac commented Nov 4, 2021

It might be better to write a function that does the juggling and returns an AtomGroup. Then the writer can just write that juggled AG. More transparent this way.

On Thu, Nov 4, 2021 at 21:41, Oliver Beckstein @.***> wrote: We're happy to have a look at proposed code. By default we will not write out PDB files with changed atom order as this will destroy most people's workflows. We could consider a keyword arg that reorders on writing. That's something we can discuss on a PR. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#3452 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGSGB7IAKSHCCUN2HXMDXLUKL4ZXANCNFSM5HMA72RA .

This was is my intention, but it could be provided as an option to the pdb writer.

The reason this has come to my attention is that I need to process gromacs outputs into inputs for another package (HADDock). I was unaware of this atom ordering until my HADDock job failed. It is rather finicky and quite specialised but this issue may occur elsewhere too. I can't imagine implementing the correct order would create problems with other software as either they don't care about the ordering or they expect the PDB standard.

@orbeckst orbeckst changed the title PDB writer does not follw standard PDB writer does not reorder atoms as required by PDB standard Jan 5, 2022
@orbeckst
Copy link
Member

orbeckst commented Jan 5, 2022

To get this issue to an actionable conclusion:

I can't imagine implementing the correct order would create problems with other software as either they don't care about the ordering or they expect the PDB standard.

Atom order is really important when the files are used as topologies for trajectories. Changing the order will break so many things. Therefore, we will not by default change the output order for PDB files.

I labeled the issue as "documentation" so we will add a note to the PDB writer docs.

If anyone is interested in providing code to robustly reorder atoms then we can consider it as a utility function and perhaps as an option for the PDBWriter. However, I'd consider this issue closed once the behavior is documented.

@hector-s-m
Copy link

@orbeckst Hi! I am having the same issue here. Was this ever solved? Is there a way to fix it using a Python script?

I found an indirect way of reordering atoms in PDB files to the canonical non-GROMACS order of N, CA, C, O..., which is to load the file in Pymol, and then selecting "Export Molecule"

This brings atoms back from N, CA, ..., C, O but it is rather tedious.

@RMeli
Copy link
Member

RMeli commented Jul 11, 2023

Hi @hector-s-m. Given the discussion above and that there is no open PR against this issue, I'd assume it has not been "fixed" (see the discussion above for why I used quotes).

However, since you mentioned that PyMol does the re-ordering you want, does PyMol's Python API work too? It will still be a bit annoying to load/save the file with PyMol, but having it within a script might make it a bit less tedious (it it works, I haven't tried...).

Something like

import MDAnalysis as mda
from pymol import cmd

cmd.load("unordered.pdb")
cmd.save("ordered.pdb") # Hopefully ordered???

u = mda.Universe("ordered.pdb")

@hector-s-m
Copy link

Hi @hector-s-m. Given the discussion above and that there is no open PR against this issue, I'd assume it has not been "fixed" (see the discussion above for why I used quotes).

However, since you mentioned that PyMol does the re-ordering you want, does PyMol's Python API work too? It will still be a bit annoying to load/save the file with PyMol, but having it within a script might make it a bit less tedious (it it works, I haven't tried...).

Something like

import MDAnalysis as mda
from pymol import cmd

cmd.load("unordered.pdb")
cmd.save("ordered.pdb") # Hopefully ordered???

u = mda.Universe("ordered.pdb")

Yeah... Thanks for looking into this. I had this idea earlier but for some reason, despite installing Pymol in my conda environment, I can't get it to work. :(

@orbeckst
Copy link
Member

@hector-s-m no immediate code changes are planned to address this issue and as you see from the discussion, we probably won't implement anything directly in the PDBWriter. We will document the behavior.

We will consider a utility function for reordering atoms in a Universe. That could be generally useful and can then be used with other formats, too. Something like

u = mda.Universe("gromacs.pdb")
p = u.select_atoms("protein")
p_reordered = reorder(p)   # reorder() needs to be written by someone!
not_protein = u.atoms - p
u_reordered = mda.Merge(p_reordered, not_protein)
u_reordered.atoms.write("reordered.gro")   # can write as a GRO or PDB or ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants