PDB writer does not reorder atoms as required by PDB standard #3452

btodac · 2021-11-04T18:01:17Z

Expected behavior

The PDB format states:

ATOM records for proteins are listed from amino to carboxyl terminus.
Nucleic acid residues are listed from the 5' to the 3' terminus.

Actual behavior

The writer just outputs in the order given in the atom group (only tested using proteins, but the writer is the same)

Code to reproduce the behavior

I imported a GROMACS trajectory, which I assume records the atoms in a different order

Current version of MDAnalysis

2.0.0
Python 3.9.7
Ubuntu

IAlibay · 2021-11-04T20:03:47Z

If I understand the problem properly here, as much as I like to aim for as much standard compliance as possible - I'm tempted to treat this as wontfix.

There is no way for us to know what the sequence of a protein is at write time, nor that what you have actually counts as a standard protein (e.g. cyclic peptides). In my opinion, the best thing to do here is assuming that user supplied ordering is the right ordering?

One potential thing we could do is validate for non-sequential resid? But then we'd have to capture chain changes etc... I'm not sure, is that the source of the error here @btodac ?

orbeckst · 2021-11-04T20:17:19Z

I agree with @IAlibay , that’s out of scope.

We should document it, though.

btodac · 2021-11-04T21:23:44Z

It is not the protein sequence that is the issue but the order of the atom records within each residue.
e.g.
ATOM 1 N SER A 2 -2.266 49.648 -27.532 1.00 39.03 N
ATOM 2 CA SER A 2 -1.744 48.330 -27.051 1.00 37.53 C
ATOM 3 C SER A 2 -0.853 47.687 -28.109 1.00 39.48 C
ATOM 4 O SER A 2 -1.251 46.718 -28.741 1.00 40.65 O
ATOM 5 CB SER A 2 -2.908 47.393 -26.724 1.00 30.13 C
ATOM 6 OG SER A 2 -3.983 48.115 -26.135 1.00 25.58 O
is correct as the carboxyl group (CB and OG) forms the last 2 entries.
However, Gromacs (I think) juggles the entries within the residue leading MDA to output

ATOM 1 N SER X 1 150.950 201.230 48.160 1.00 0.00 seg_ N
ATOM 2 CA SER X 1 151.930 200.220 48.630 1.00 0.00 seg_ C
ATOM 3 CB SER X 1 151.310 199.110 49.380 1.00 0.00 seg_ C
ATOM 4 OG SER X 1 150.500 199.630 50.350 1.00 0.00 seg_ O
ATOM 5 C SER X 1 152.760 199.660 47.420 1.00 0.00 seg_ C
ATOM 6 O SER X 1 152.320 198.810 46.670 1.00 0.00 seg_ O

I am currently making a fork and will implement a method for ensuring the correct order as the atom group should contain all the information required to produce the PDB standard format.

orbeckst · 2021-11-04T21:41:35Z

We're happy to have a look at proposed code.

By default we will not write out PDB files with changed atom order as this will destroy most people's workflows.

We could consider a keyword arg that reorders on writing. That's something we can discuss on a PR.

richardjgowers · 2021-11-04T21:50:49Z

It might be better to write a function that does the juggling and returns an AtomGroup. Then the writer can just write that juggled AG. More transparent this way.

…

On Thu, Nov 4, 2021 at 21:41, Oliver Beckstein ***@***.***> wrote: We're happy to have a look at proposed code. By default we will not write out PDB files with changed atom order as this will destroy most people's workflows. We could consider a keyword arg that reorders on writing. That's something we can discuss on a PR. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#3452 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACGSGB7IAKSHCCUN2HXMDXLUKL4ZXANCNFSM5HMA72RA> .

orbeckst · 2021-11-04T21:55:17Z

Just to add:

However, Gromacs (I think) juggles the entries within the residue leading MDA to output

Note that the atom ordering will always be determined by GROMACS (or CHARMM/NAMD, ...) depending on the topology (force field) building blocks.

I wanted to make clear that MDAnalysis's primary purpose is to act as a layer between simulation formats and numpy arrays. It's not our job to question the ordering of atoms in the topology.

We will adhere to existing format specs as well as possible. However, "the" PDB format in particular has been so badly abused by the simulation community that we cannot walk the high ground and insist on all aspects of the format specs. Instead we have to support what the majority of users use. MDAnalysis would not be the right tool to prepare a PDB for the Protein Databank (in any case, they don't want anyone to use the PDB format anymore anyway, and rightly so). We are a bit slow on supporting the recommended new format PDBx/mmcif but see #2367 for using converters.

All of this is just to give you an idea why we are not super-enthusiastic about adding code to MDAnalysis to change the order of atoms on output.

btodac · 2021-11-04T22:13:00Z

It might be better to write a function that does the juggling and returns an AtomGroup. Then the writer can just write that juggled AG. More transparent this way.
…
On Thu, Nov 4, 2021 at 21:41, Oliver Beckstein @.***> wrote: We're happy to have a look at proposed code. By default we will not write out PDB files with changed atom order as this will destroy most people's workflows. We could consider a keyword arg that reorders on writing. That's something we can discuss on a PR. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#3452 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGSGB7IAKSHCCUN2HXMDXLUKL4ZXANCNFSM5HMA72RA .

This was is my intention, but it could be provided as an option to the pdb writer.

The reason this has come to my attention is that I need to process gromacs outputs into inputs for another package (HADDock). I was unaware of this atom ordering until my HADDock job failed. It is rather finicky and quite specialised but this issue may occur elsewhere too. I can't imagine implementing the correct order would create problems with other software as either they don't care about the ordering or they expect the PDB standard.

orbeckst · 2022-01-05T00:35:02Z

To get this issue to an actionable conclusion:

I can't imagine implementing the correct order would create problems with other software as either they don't care about the ordering or they expect the PDB standard.

Atom order is really important when the files are used as topologies for trajectories. Changing the order will break so many things. Therefore, we will not by default change the output order for PDB files.

I labeled the issue as "documentation" so we will add a note to the PDB writer docs.

If anyone is interested in providing code to robustly reorder atoms then we can consider it as a utility function and perhaps as an option for the PDBWriter. However, I'd consider this issue closed once the behavior is documented.

hector-s-m · 2023-07-11T20:56:12Z

@orbeckst Hi! I am having the same issue here. Was this ever solved? Is there a way to fix it using a Python script?

I found an indirect way of reordering atoms in PDB files to the canonical non-GROMACS order of N, CA, C, O..., which is to load the file in Pymol, and then selecting "Export Molecule"

This brings atoms back from N, CA, ..., C, O but it is rather tedious.

RMeli · 2023-07-11T21:14:34Z

Hi @hector-s-m. Given the discussion above and that there is no open PR against this issue, I'd assume it has not been "fixed" (see the discussion above for why I used quotes).

However, since you mentioned that PyMol does the re-ordering you want, does PyMol's Python API work too? It will still be a bit annoying to load/save the file with PyMol, but having it within a script might make it a bit less tedious (it it works, I haven't tried...).

Something like

import MDAnalysis as mda
from pymol import cmd

cmd.load("unordered.pdb")
cmd.save("ordered.pdb") # Hopefully ordered???

u = mda.Universe("ordered.pdb")

hector-s-m · 2023-07-11T21:17:48Z

Hi @hector-s-m. Given the discussion above and that there is no open PR against this issue, I'd assume it has not been "fixed" (see the discussion above for why I used quotes).

However, since you mentioned that PyMol does the re-ordering you want, does PyMol's Python API work too? It will still be a bit annoying to load/save the file with PyMol, but having it within a script might make it a bit less tedious (it it works, I haven't tried...).

Something like
import MDAnalysis as mda
from pymol import cmd

cmd.load("unordered.pdb")
cmd.save("ordered.pdb") # Hopefully ordered???

u = mda.Universe("ordered.pdb")

Yeah... Thanks for looking into this. I had this idea earlier but for some reason, despite installing Pymol in my conda environment, I can't get it to work. :(

orbeckst · 2023-07-11T21:24:32Z

@hector-s-m no immediate code changes are planned to address this issue and as you see from the discussion, we probably won't implement anything directly in the PDBWriter. We will document the behavior.

We will consider a utility function for reordering atoms in a Universe. That could be generally useful and can then be used with other formats, too. Something like

u = mda.Universe("gromacs.pdb")
p = u.select_atoms("protein")
p_reordered = reorder(p)   # reorder() needs to be written by someone!
not_protein = u.atoms - p
u_reordered = mda.Merge(p_reordered, not_protein)
u_reordered.atoms.write("reordered.gro")   # can write as a GRO or PDB or ...

orbeckst added the Format-PDB label Nov 4, 2021

orbeckst added the Component-Docs label Jan 5, 2022

orbeckst changed the title ~~PDB writer does not follw standard~~ PDB writer does not reorder atoms as required by PDB standard Jan 5, 2022

orbeckst added the wontfix label Jul 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PDB writer does not reorder atoms as required by PDB standard #3452

PDB writer does not reorder atoms as required by PDB standard #3452

btodac commented Nov 4, 2021

IAlibay commented Nov 4, 2021

orbeckst commented Nov 4, 2021

btodac commented Nov 4, 2021

orbeckst commented Nov 4, 2021

richardjgowers commented Nov 4, 2021 via email

orbeckst commented Nov 4, 2021

btodac commented Nov 4, 2021

orbeckst commented Jan 5, 2022 •

edited

Loading

hector-s-m commented Jul 11, 2023

RMeli commented Jul 11, 2023

hector-s-m commented Jul 11, 2023

orbeckst commented Jul 11, 2023

PDB writer does not reorder atoms as required by PDB standard #3452

PDB writer does not reorder atoms as required by PDB standard #3452

Comments

btodac commented Nov 4, 2021

Expected behavior

Actual behavior

Code to reproduce the behavior

Current version of MDAnalysis

IAlibay commented Nov 4, 2021

orbeckst commented Nov 4, 2021

btodac commented Nov 4, 2021

orbeckst commented Nov 4, 2021

richardjgowers commented Nov 4, 2021 via email

orbeckst commented Nov 4, 2021

btodac commented Nov 4, 2021

orbeckst commented Jan 5, 2022 • edited Loading

hector-s-m commented Jul 11, 2023

RMeli commented Jul 11, 2023

hector-s-m commented Jul 11, 2023

orbeckst commented Jul 11, 2023

orbeckst commented Jan 5, 2022 •

edited

Loading