Feature/dssp #4304

marinegor · 2023-09-29T14:07:08Z

Fixes #1612

Changes made in this Pull Request:

introduces the MDAnalysis.analysis.dssp.DSSP class for secondary structure annotation, based on the code implemented in the pydssp package
adds tests from the pydssp package

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

Developers certificate of origin

I certify that this contribution is covered by the LGPLv2.1+ license as defined in our LICENSE and adheres to the Developer Certificate of Origin.

📚 Documentation preview 📚: https://mdanalysis--4304.org.readthedocs.build/en/4304/

pep8speaks · 2023-09-29T14:07:18Z

Hello @marinegor! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file package/MDAnalysis/analysis/dssp/dssp.py:

Line 2:80: E501 line too long (90 > 79 characters)
Line 14:80: E501 line too long (87 > 79 characters)
Line 16:80: E501 line too long (115 > 79 characters)
Line 27:80: E501 line too long (83 > 79 characters)
Line 46:80: E501 line too long (80 > 79 characters)
Line 62:80: E501 line too long (80 > 79 characters)
Line 71:80: E501 line too long (84 > 79 characters)
Line 78:80: E501 line too long (83 > 79 characters)
Line 80:80: E501 line too long (84 > 79 characters)
Line 82:80: E501 line too long (80 > 79 characters)
Line 115:1: W293 blank line contains whitespace
Line 116:59: W291 trailing whitespace
Line 117:80: E501 line too long (85 > 79 characters)
Line 118:80: E501 line too long (81 > 79 characters)
Line 118:82: W291 trailing whitespace
Line 119:80: E501 line too long (84 > 79 characters)
Line 120:80: E501 line too long (83 > 79 characters)
Line 123:1: W293 blank line contains whitespace
Line 125:80: E501 line too long (86 > 79 characters)
Line 125:87: W291 trailing whitespace
Line 127:80: E501 line too long (84 > 79 characters)
Line 127:85: W291 trailing whitespace
Line 128:80: E501 line too long (80 > 79 characters)
Line 128:81: W291 trailing whitespace
Line 132:1: W293 blank line contains whitespace
Line 135:1: W293 blank line contains whitespace
Line 136:80: E501 line too long (85 > 79 characters)
Line 137:80: E501 line too long (81 > 79 characters)
Line 199:75: W291 trailing whitespace
Line 228:80: E501 line too long (80 > 79 characters)
Line 232:78: W291 trailing whitespace
Line 233:80: W291 trailing whitespace
Line 234:80: E501 line too long (83 > 79 characters)
Line 266:72: W291 trailing whitespace
Line 273:80: E501 line too long (83 > 79 characters)
Line 299:80: E501 line too long (86 > 79 characters)
Line 305:80: E501 line too long (82 > 79 characters)
Line 312:80: E501 line too long (81 > 79 characters)
Line 312:82: W291 trailing whitespace
Line 313:80: E501 line too long (81 > 79 characters)
Line 322:80: E501 line too long (91 > 79 characters)
Line 323:80: E501 line too long (87 > 79 characters)
Line 332:80: E501 line too long (91 > 79 characters)
Line 361:80: E501 line too long (81 > 79 characters)
Line 392:80: E501 line too long (87 > 79 characters)

In the file package/MDAnalysis/analysis/dssp/pydssp_numpy.py:

Line 68:80: E501 line too long (85 > 79 characters)
Line 157:80: E501 line too long (84 > 79 characters)
Line 179:80: E501 line too long (83 > 79 characters)
Line 206:80: E501 line too long (80 > 79 characters)
Line 207:80: E501 line too long (81 > 79 characters)

In the file testsuite/MDAnalysisTests/analysis/test_dssp.py:

Line 10:80: E501 line too long (81 > 79 characters)
Line 11:80: E501 line too long (85 > 79 characters)
Line 12:80: E501 line too long (82 > 79 characters)
Line 30:80: E501 line too long (80 > 79 characters)
Line 42:80: E501 line too long (80 > 79 characters)
Line 52:80: E501 line too long (80 > 79 characters)
Line 55:80: E501 line too long (82 > 79 characters)

Comment last updated at 2024-06-13 05:15:32 UTC

github-actions · 2023-09-29T14:09:33Z

Linter Bot Results:

Hi @marinegor! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code Location	Outcome
main package	⚠️ Possible failure
testsuite	⚠️ Possible failure

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/9494112403/job/26163914658

Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

marinegor · 2023-09-29T23:55:10Z

There's a problem I don't know the workaround for yet: prolines.

In the original pydssp, the following happens:

atoms get parsed from pdb regardless of residue type
hydrogens get built automatically
based on that, secondary structure is assigned

However, in my implementation all prolines (and the N-terminal residue) are assigned '-' (=unstructured loop).

This is likely less correct than the original implementation, but I'm not sure how to fix that.
My suggestion would be to:

if guess_hydrogens=False, use positions of existing hydrogens and fill in fake proline hydrogens with pydssp
if guess_hydrogens=True, default to pydssp behaviour with fully automatic hydrogens
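
As a rough illustration of what "guessing" a backbone amide hydrogen can look like, here is a minimal sketch of one common geometric heuristic: place H at a fixed distance from N, along the direction that bisects the C(i-1)->N(i) and CA(i)->N(i) unit vectors. The function name, the 1.01 Å bond length, and the toy coordinates are assumptions made for this example; it is not necessarily the exact formula used by pydssp or by this PR.

```python
import numpy as np

N_H_BOND_LENGTH = 1.01  # Angstrom; assumed typical backbone N-H distance


def _unit(vectors: np.ndarray) -> np.ndarray:
    """Normalize each row vector to unit length."""
    return vectors / np.linalg.norm(vectors, axis=-1, keepdims=True)


def guess_amide_hydrogens(n: np.ndarray, ca: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Estimate H positions for residues 1..L-1 from backbone N, CA and C.

    Each input has shape (L, 3). The first residue has no preceding C,
    so nothing is estimated for it and the result has shape (L - 1, 3).
    """
    c_prev_to_n = _unit(n[1:] - c[:-1])  # points from C(i-1) towards N(i)
    ca_to_n = _unit(n[1:] - ca[1:])      # points from CA(i) towards N(i)
    direction = _unit(c_prev_to_n + ca_to_n)
    return n[1:] + N_H_BOND_LENGTH * direction


# Toy two-residue backbone, chosen so the expected answer is easy to check.
n = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
ca = np.array([[1.0, 1.0, 0.0], [4.0, -1.0, 0.0]])
c = np.array([[2.0, -1.0, 0.0], [5.0, 0.0, 0.0]])

h_positions = guess_amide_hydrogens(n, ca, c)
```

With these toy coordinates the single estimated hydrogen sits 1.01 Å from the second nitrogen, pointing away from both neighbouring heavy atoms.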

marinegor · 2023-10-02T16:19:50Z

The proline problem is now dealt with, exactly as I described above: if guess_hydrogens=False, it takes hydrogens from the atoms that have them and guesses positions for the rest (prolines and the N-terminus). If guess_hydrogens=True, it guesses all the positions.
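
The "use what exists, guess the rest" behaviour described here can be sketched as a simple merge. This is only an illustration under assumed data structures (a per-residue list holding either a position or None, plus an array of guessed positions); `merge_hydrogens` is a hypothetical helper, not a function from the PR.

```python
import numpy as np


def merge_hydrogens(existing, guessed):
    """Use an existing hydrogen position where present, else the guess.

    existing: one entry per residue, either a length-3 position or None
              (e.g. for a proline or the N-terminal residue).
    guessed:  array of shape (n_residues, 3) with estimated positions.
    """
    merged = np.array(guessed, dtype=float)  # copy, so the input is untouched
    for i, position in enumerate(existing):
        if position is not None:
            merged[i] = position
    return merged


guessed = np.array([[0.0, 0.0, 1.0], [9.0, 9.0, 9.0], [2.0, 0.0, 1.0]])
existing = [None, [1.0, 2.0, 3.0], None]  # only the middle residue has an H

merged = merge_hydrogens(existing, guessed)
```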

package/MDAnalysis/analysis/dssp.py

orbeckst

This already looks very promising.

For an initial quick review please see inline comments.

Additional requests

Fix the tests: AttributeError: module 'MDAnalysisTests.datafiles' has no attribute 'DSSP'
We also need a test with a trajectory.
Compress the PDB files with bz2 or gz.
Update CHANGELOG.
Add a .. versionadded:: 2.7.0 to the docs.
Fix PEP8 complaints.

Initially distinguishing H,E,- is good but eventually it would be good to distinguish other secondary structures, too, in particular 3₁₀ and π helices.
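
For a trajectory, the per-frame H/E/- strings can be condensed into one summary string. The sketch below takes a per-residue majority vote. It assumes each frame is already available as a string of one-letter codes, and `consensus_structure` is a hypothetical helper name used only for this example.

```python
from collections import Counter


def consensus_structure(frames: list[str]) -> str:
    """Return the most frequent code per residue across all frames."""
    if len({len(frame) for frame in frames}) != 1:
        raise ValueError("all frames must cover the same number of residues")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*frames)
    )


# Three frames of a six-residue peptide (made-up assignments).
frames = ["-HHHE-", "-HHEE-", "--HEE-"]
consensus = consensus_structure(frames)
```

Ties are broken by whichever code appears first in the frames, so a real analysis may prefer to report per-residue frequencies instead of a single string.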

package/MDAnalysis/analysis/dssp.py

orbeckst · 2023-10-10T02:24:18Z

package/MDAnalysis/analysis/dssp.py

+    h3 = h3 * ~np.roll(helix4, -1, 1) * ~helix4  # helix4 is higher prioritized
+    h5 = h5 * ~np.roll(helix4, -1, 1) * ~helix4  # helix4 is higher prioritized


Can we get 3_10 and π helix, too?

marinegor · Oct 12, 2023

I'm not certain, but seems like yes (used this paper's Fig 1):

    from MDAnalysis.analysis.dssp import DSSP
    import MDAnalysis as mda

    u = mda.Universe('./1FUR.pdb')
    r = DSSP(u).run()
    r.results.resids[151:158]
    # array([155, 156, 157, 158, 159, 160, 161])
    ''.join(r.results.dssp[0])[151:158]
    # 'HHHH-HH'

marinegor · Jan 31, 2024

Yes, it can detect 3-10 and pi helixes indeed -- I generated idealized structures with mmtbx from here and ran DSSP on them:

    dssp/o_beta_seq.pdb      --------------------
    dssp/o_helix310_seq.pdb  -HHHHHHHHHHHHHHHHHH-
    dssp/o_helix_seq.pdb     -HHHHHHHHHHHHHHHHHH-
    o_helix_pi_seq.pdb       -HHHHHHHHHHHHHHHHHH-
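
To make the two diff lines at the top of this thread easier to follow, here is a small sketch of what the `np.roll` masking does on toy boolean arrays. The array contents and the `next_is_helix4` name are made up for the illustration; only the masking expression mirrors the quoted code.

```python
import numpy as np

# One frame, eight residues; True marks a residue flagged for that helix type.
helix4 = np.array([[False, False, True, True, True, True, False, False]])
h3 = np.array([[True, True, True, False, False, False, True, True]])

# np.roll(helix4, -1, 1) shifts the mask one residue to the left along axis 1,
# so position i holds helix4[i + 1]; the last position wraps around to helix4[0].
next_is_helix4 = np.roll(helix4, -1, 1)

# Keep an h3 flag only where neither this residue nor the next one is helix4.
# For boolean arrays, * acts as logical AND and ~ as logical NOT.
h3_filtered = h3 * ~next_is_helix4 * ~helix4
```

In this toy case the h3 flags at positions 1 and 2 are dropped because a helix4 segment starts at position 2, while the flags at positions 0, 6 and 7 survive.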

package/MDAnalysis/analysis/dssp.py

testsuite/MDAnalysisTests/analysis/test_dssp.py

test_dssp.py

manual test a top level, should not be in source

testsuite/MDAnalysisTests/analysis/test_dssp.py

orbeckst · 2024-05-03T22:19:07Z

@IAlibay would you mind taking a quick final glance at the DSSP PR?

I think your comments were addressed but would prefer you confirm instead of me dismissing a totally valid review.

Would be great if we could finally get that one in.

marinegor · 2024-05-20T13:29:27Z

@IAlibay any comments on this one?

IAlibay

Apologies for the long review delay.

I just have the one docstring confusion, once addressed then it's good to go.

Tagging @orbeckst here - please do dismiss my review if this gets addressed and I don't get a chance to re-review.

package/MDAnalysis/analysis/dssp/dssp.py

marinegor · 2024-05-29T07:19:37Z

Hi @IAlibay -- I couldn't follow your link, could you explain in more detail? I slightly updated the class documentation to match the description of the run method, but I'm not sure whether that resolves your confusion.

orbeckst

@marinegor and @IAlibay I think I understand the point of your discussion: as far as I can tell, I agree with @IAlibay that the current ValueError check does not strictly verify that there is exactly one HN. If we could change the code then that would be safer, please see comments.

I think I also found a potential issue with how _heavy_atoms and _hydrogens are constructed. Please check.
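
A stricter "exactly one hydrogen per residue" check could look roughly like the sketch below. It works on a plain dict of per-residue hydrogen counts instead of an AtomGroup, and both function names and the `exempt` parameter are hypothetical; the actual check in the PR may be structured differently.

```python
def hydrogen_count_problems(hydrogen_counts, exempt=frozenset()):
    """Return {resid: count} for residues that break the one-hydrogen rule.

    hydrogen_counts: dict mapping residue id -> number of atoms in that
                     residue matching the chosen hydrogen name.
    exempt:          residue ids allowed to have no hydrogen at all
                     (e.g. prolines or the N-terminal residue).
    """
    return {
        resid: count
        for resid, count in hydrogen_counts.items()
        if count != 1 and not (count == 0 and resid in exempt)
    }


def require_one_hydrogen(hydrogen_counts, exempt=frozenset()):
    """Raise ValueError if any residue has too many or too few hydrogens."""
    problems = hydrogen_count_problems(hydrogen_counts, exempt)
    if problems:
        raise ValueError(f"expected exactly one hydrogen per residue, got {problems}")


ok = hydrogen_count_problems({1: 0, 2: 1, 3: 1}, exempt={1})
bad = hydrogen_count_problems({1: 1, 2: 2, 3: 0})
```

Checking the count per residue, and not only the total, catches the case where one residue has two matching atoms and another has none.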

package/MDAnalysis/analysis/dssp/dssp.py

orbeckst · 2024-05-31T01:42:21Z

package/MDAnalysis/analysis/dssp/dssp.py

+        self._heavy_atoms: dict[str, "AtomGroup"] = {
+            t: ag.atoms[
+                np.isin(
+                    ag.names, t.split()
+                )  # need split() since `np.isin` takes an iterable as second argument
+                # and "N".split() -> ["N"]
+            ]
+            for t in heavyatom_names
+        }
+        self._hydrogens: list["AtomGroup"] = [
+            res.atoms.select_atoms(f"name {hydrogen_name}") for res in ag.residues
+        ]
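
The `t.split()` call in this hunk matters most when a single entry lists several alternative atom names separated by spaces. The sketch below shows the difference on a toy name array; the entry "O O1 OT1" and the array contents are assumptions for the illustration.

```python
import numpy as np

names = np.array(["N", "CA", "C", "O", "N", "CA", "C", "OT1"])

# Hypothetical entry listing alternative names for the carbonyl oxygen.
oxygen_entry = "O O1 OT1"

# Splitting turns the entry into ["O", "O1", "OT1"], so any alternative matches.
mask_oxygen = np.isin(names, oxygen_entry.split())

# A single name still works after split(): "N".split() -> ["N"].
mask_nitrogen = np.isin(names, "N".split())

# Without split(), the whole string is treated as one name and nothing matches.
mask_unsplit = np.isin(names, oxygen_entry)
```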


From the following code I see that there's the assumption that the order of atoms in _heavy_atoms["CA"] is the same as in _hydrogens.

However, that may not be true: _hydrogens is created from a select_atoms() which always sorts the atoms by index in increasing order whereas ag's in _heavy_atoms are created by look up and slicing. If the original input ag is NOT ORDERED then _hydrogens will be ORDERED whereas _heavy_atoms remain in the CUSTOM ORDER of the original ag. At least that's what it looks like to me.

It may be safer to just sort the original ag first.
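
The ordering concern can be reproduced without MDAnalysis. In the sketch below, plain `(index, resid, name)` tuples stand in for atoms; a mask-style lookup keeps the caller's order, while the second lookup sorts by index, as the comment above describes for select_atoms(). The data layout is an assumption made for this example.

```python
# Each atom is (index, resid, name); the group is deliberately not index-sorted.
atoms = [
    (4, 2, "N"), (5, 2, "H"), (6, 2, "CA"),
    (0, 1, "N"), (1, 1, "H"), (2, 1, "CA"),
]

# Mask-style lookup: keeps the custom order of the input group.
ca_atoms = [atom for atom in atoms if atom[2] == "CA"]

# Selection-style lookup: result is sorted by atom index.
hydrogens = sorted((atom for atom in atoms if atom[2] == "H"), key=lambda a: a[0])

ca_resids = [atom[1] for atom in ca_atoms]
h_resids = [atom[1] for atom in hydrogens]

# Sorting the group once up front makes both lookups agree.
sorted_atoms = sorted(atoms, key=lambda a: a[0])
ca_resids_sorted = [atom[1] for atom in sorted_atoms if atom[2] == "CA"]
```

Here `ca_resids` and `h_resids` come out in opposite residue order, so pairing them element by element would match each CA with the wrong hydrogen; after sorting, the residue order is the same for both.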

marinegor · Jun 12, 2024

"It may be safer to just sort the original ag first."

but there's line 274 that says:

    ag: AtomGroup = atoms.select_atoms("protein")

and then all operations are on this ag. I even think that you suggested this, but I'm not entirely sure :)

orbeckst · 2024-06-13T02:55:51Z

You're right: ag is guaranteed to be ordered.

Maybe add a comment somewhere that the code is guaranteed to work because ag is ordered. Just in case someone later wants to make it more general (eg so that one doesn't have to rely on MDAnalysis's definition of "protein").

orbeckst · 2024-06-13T02:57:20Z

See my comment on Dr. Richard Hipp's obsession with code comments https://discord.com/channels/807348386012987462/807712498249236520/1237819983833337917

orbeckst · 2024-06-11T04:36:22Z

@marinegor do you have any question regarding my last review? Please ping me if you need feedback or want to discuss anything.

marinegor · 2024-06-12T19:54:06Z

@orbeckst sorry for the silence -- I don't have any questions, and I have also added a few lines that improve the guess-hydrogens check, following your suggestions.

orbeckst

I am happy! Great work, @marinegor !

orbeckst

(just adding doc fixes) still ✅

package/MDAnalysis/analysis/dssp/dssp.py

orbeckst · 2024-06-13T03:28:57Z

@IAlibay I resolved most of the comments that you had because @marinegor had addressed them. There are only three or so for you to quickly look at. Would be great to get it shipped once you are satisfied. Cheers!

orbeckst · 2024-06-18T08:04:26Z

Congratulations @marinegor , a big effort merged and one of the oldest outstanding issues closed! Well done. Thank you for all your work!

marinegor · 2024-06-18T13:38:03Z

Thanks for your support and guidance, that was fun!

marinegor added 5 commits September 29, 2023 13:14

Add DSSP test files

5065e2d

Add DSSP test files

1dfdb29

First implementation of DSSP class

f73fcda

Remove pydssp dependency

fe66c30

Add documentation

13bc86a

github-actions bot added the Component-Analysis label Sep 29, 2023

marinegor added 2 commits September 29, 2023 17:53

Fix hydrogen selection issue

cf79a89

Remove unnecessary print

7230845

Fix problem with prolines and add example with average secondary structure in trajectory

929d011

marinegor added 4 commits October 2, 2023 18:24

Fix pep8 formatting issues

7c966d5

Fix pep8 formatting issues

a5a4891

Replace ndarray in results with list, and remove unnecessary code

c2a333c

Add documentation for translate

bd5ba75

orbeckst reviewed Oct 10, 2023

package/MDAnalysis/analysis/dssp.py

orbeckst requested changes Oct 10, 2023

orbeckst self-assigned this Oct 10, 2023

orbeckst linked an issue Oct 10, 2023 that may be closed by this pull request

Support for secondary structure identification #1612

Closed

orbeckst added the hackathon part of a MDAnalysis coding event label Oct 10, 2023

Added a reference to Kabsch 983 DSSP paper

0d6f734

IAlibay added the enhancement label Nov 5, 2023

marinegor added 6 commits November 24, 2023 12:41

Update documentation and formatting

8840d9b

Make numpy docstrings

c6fca72

Compress dssp pdb files

20e672f

Update documentation and add trajectory tests

47c4123

Fix pep8 in docstrings and add pydssp license info

c4cf7c9

Add versionadded

de2133d

marinegor and others added 3 commits April 19, 2024 16:09

Update hydrogen_name description with a Note

99b2164

Add print line in dssp documentation

58c24c1

Merge branch 'develop' into feature/dssp

610b39f

orbeckst reviewed May 3, 2024

test_dssp.py

Delete test_dssp.py

f50f5c3

manual test a top level, should not be in source

orbeckst reviewed May 3, 2024

testsuite/MDAnalysisTests/analysis/test_dssp.py

Update testsuite/MDAnalysisTests/analysis/test_dssp.py

848060e

IAlibay requested changes May 28, 2024

package/MDAnalysis/analysis/dssp/dssp.py

Update class documentation with "hydrogen_atom" argument explanation

84dd012

orbeckst requested changes May 31, 2024

Egor Marin and others added 2 commits June 12, 2024 21:46

Add check for exactly 1 hydrogen on the atom

ebcab2f

Apply suggestions from code review

f4daa24

Co-authored-by: Oliver Beckstein <[email protected]>

Egor Marin added 2 commits June 12, 2024 22:26

Generalize hydrogen checks

2fef313

Merge remote-tracking branch 'origin/feature/dssp' into feature/dssp

daf6529

orbeckst approved these changes Jun 13, 2024

package/MDAnalysis/analysis/dssp/dssp.py

minor dssp reST fixes & updates

884630d

orbeckst added 3 commits June 12, 2024 23:34

fix dssp doc

8d503a2

fixed docs for DSSP.results

5438877

- corrected docs for DSSP.results.dssp_ndarray and results.dssp
- added docs for DSSP.results.resids
- more reST fixes and updates

Update dssp.py

d3d2cf2

IAlibay approved these changes Jun 17, 2024

orbeckst merged commit d2d9d27 into MDAnalysis:develop Jun 18, 2024
22 of 23 checks passed
