Improve performance of `Interchange.from_smirnoff` on polymers #1122

mattwthompson · 2024-12-09T22:59:14Z

Description

Provide a brief description of the PR's purpose here.

Checklist

Add tests
Lint
Update docstrings

codecov · 2024-12-10T14:09:08Z

Codecov Report

Attention: Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.62%. Comparing base (fcf4399) to head (5d34566).


mattwthompson · 2024-12-10T14:56:54Z

Ultimately I wasn't able to find a major bottleneck (i.e. something $O(n^2)$ that accidentally snuck in) in the time I had to look. Still, I found a few things that were unnecessarily slow and could be fixed quickly.

I'm seeing performance gains across the board in calling ForceField.create_interchange on systems with large (> 100 heavy atoms) molecule(s).
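The commits in this PR are titled "Cache some parameter lookups". As a minimal sketch of that general technique (not the code in this PR), the example below memoizes a lookup that is repeated for every matched bond. The parameter table, its placeholder values, and the function names are hypothetical and do not come from openff-interchange.

```python
from functools import lru_cache

# Hypothetical stand-in for a force field parameter table keyed by SMIRKS.
# The numbers are placeholders, not real force field values.
PARAMETERS = {
    "[#6:1]-[#6:2]": {"k": 1.0, "length": 2.0},
    "[#6:1]-[#8:2]": {"k": 3.0, "length": 4.0},
}

# Counts how often the underlying (uncached) lookup actually runs.
CALLS = {"uncached": 0}


def lookup_uncached(smirks: str) -> tuple[float, float]:
    """Resolve a parameter; imagine this being expensive per call."""
    CALLS["uncached"] += 1
    parameter = PARAMETERS[smirks]
    return parameter["k"], parameter["length"]


@lru_cache(maxsize=None)
def lookup_cached(smirks: str) -> tuple[float, float]:
    """Same lookup, but each distinct SMIRKS is resolved only once."""
    return lookup_uncached(smirks)


# A long polymer has many bonds but only a handful of distinct parameters.
bond_smirks = ["[#6:1]-[#6:2]", "[#6:1]-[#8:2]"] * 1000
values = [lookup_cached(smirks) for smirks in bond_smirks]

print(CALLS["uncached"], lookup_cached.cache_info())
```

The payoff grows with chain length: the number of distinct parameters stays roughly constant while the number of matches grows with the molecule.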

For a polymer-ish compound of increasing length, seeing closer to single-digit differences at larger molecule sizes:

Source: https://gist.github.com/mattwthompson/f64b4ba936147492b1b43db5b28f3e55

And on a large protein (run this in Jupyter, with `ForceField` and `Topology` already imported from `openff.toolkit` in an earlier cell):

%%timeit
ForceField(
    "ff14sb_off_impropers_0.0.3.offxml",
).create_interchange(
    Topology.from_pdb(
        "../proteinbenchmark/proteinbenchmark/data/pdbs/hewl-1E8L-model-1.pdb",
    ),
)

# fcf439975b2a5283228b4d10c55d63c360820d90: 25.6 s ± 2.23 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 5d34566af0bdd34392915a4bb39a7f0d00cb3c4e: 22.6 s ± 1.21 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
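Outside Jupyter, a similar mean and spread can be collected with the standard-library `timeit` module. This is only a sketch: `time_call` and `example_workload` are hypothetical names, and the workload is a cheap stand-in for the `create_interchange` call above.

```python
import statistics
import timeit


def time_call(func, repeat: int = 7, number: int = 1) -> tuple[float, float]:
    """Return (mean, standard deviation) of the per-loop time in seconds."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_loop = [total / number for total in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


def example_workload() -> int:
    """Cheap stand-in for the expensive call being benchmarked."""
    return sum(i * i for i in range(100_000))


mean, std = time_call(example_workload)
print(f"{mean:.3g} s ± {std:.3g} s per loop (7 runs, 1 loop each)")
```

Running the same script once per installed commit gives numbers that can be compared side by side.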

mattwthompson · 2024-12-10T22:50:24Z

This drops the runtime of `sage_ff14sb.create_interchange(top)` in the toolkit showcase from ~75 to ~65 seconds.

timbernat · 2024-12-13T04:06:04Z

@mattwthompson Did my own quick benchmark with a profiler to get some more details on what's causing a bottleneck, code and results linked. For reproducibility, I ran the linked script with interchange v0.4.0 and toolkit v0.16.7. I ran with 1,200 repeat units but changed the repeat unit identity from "CO" to "CCO" (i.e. to PEG, as [CO]n doesn't correspond to any real polymer to the best of my knowledge).
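For anyone wanting to reproduce this kind of profile, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. It is not the linked script: `workload` is a hypothetical stand-in for the Interchange call, and `profile_top` is an illustrative helper name.

```python
import cProfile
import io
import pstats


def workload(n: int = 20_000) -> list[str]:
    """Stand-in for the call being profiled (e.g. building an Interchange)."""
    return sorted(str(i) for i in range(n))


def profile_top(func, limit: int = 10) -> str:
    """Profile func() and return a report of the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return stream.getvalue()


report = profile_top(workload)
print(report)
```

Sorting by cumulative time shows which high-level calls dominate; sorting by `pstats.SortKey.TIME` instead highlights the primitive functions where time is actually spent.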

Moving down the call stack, looks like the highest-cost primitive functions are:

In Interchange:
- `smirnoff._valence.store_potentials()`
- `smirnoff._create._propers()`

From the Python stdlib:
- builtin `list.index()` (individually quick, but called >180k times altogether!)
- `selectors.EpollSelector.select()` (have literally no clue what this does/why it's called)

Make of that what you will, I haven't dug thru the source code deeply enough to know why those calls in particular dominate runtime, but I suspect this'll at least help direct effort towards the key 20% of bugs.
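To illustrate why many `list.index()` calls add up, here is a small sketch, independent of the Interchange source. Each `list.index()` call is a linear scan, so repeating it for every element of a large list costs quadratic time overall; a dictionary built once gives constant-time lookups. All names below are hypothetical.

```python
import timeit

# Hypothetical list of unique labels standing in for atoms in a topology.
atoms = [f"atom_{i}" for i in range(5_000)]


def indices_via_list(names: list[str]) -> list[int]:
    """Each call to list.index() scans the list from the start."""
    return [atoms.index(name) for name in names]


# Built once. This matches list.index() only because the labels are unique:
# with duplicates, the dict would keep the last position, not the first.
index_of = {name: i for i, name in enumerate(atoms)}


def indices_via_dict(names: list[str]) -> list[int]:
    """Each lookup is a constant-time hash-table access."""
    return [index_of[name] for name in names]


queries = atoms[::5]
assert indices_via_list(queries) == indices_via_dict(queries)

slow = timeit.timeit(lambda: indices_via_list(queries), number=3)
fast = timeit.timeit(lambda: indices_via_dict(queries), number=3)
print(f"list.index: {slow:.4f} s, dict lookup: {fast:.4f} s")
```

Whether this pattern is what actually drives the call count in the profile above would need to be confirmed against the source.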

I also have access to ~1,400 diverse polymer chemistries which I've run thru Interchange as part of an unrelated collaboration; the individual chains are only a few hundred atoms, but are packed into 10k atom melts before porting thru Interchange. If it would be of use, I can get back with Interchange output runtimes for these chemistries sometime next week after making some tweaks to my pipeline (viz incorporating profilers into a Signac-based workflow). Hope I could be of some help!

mattwthompson · 2024-12-13T16:23:14Z

@timbernat this PR aims to streamline the bottleneck which I believe is the cause of this poor performance. Could you install against this branch (pip install git+https://github.com/openforcefield/openff-interchange.git@cache-parameter-lookups) and compare timings for a few polymers?

timbernat · 2024-12-20T00:41:41Z

@mattwthompson Apologies for the long turnaround, see here for updated profile times (same code as prior but with PR-version of Interchange). Runtimes are pretty similar, with components from _nonbonded now dominating much of the runtime.

Sorry I couldn't be of more help on this front yet, I'll be consolidating some concurrent projects following the holidays which should hopefully give me some more time on the side. Let me know if I can contribute anything else here!

mattwthompson added 2 commits December 9, 2024 16:55

PERF: Cache some parameter lookups

0899d13

FIX: Fix caching idivf

5d34566

mattwthompson marked this pull request as ready for review December 10, 2024 15:01

mattwthompson requested a review from Yoshanuikabundi December 10, 2024 15:01
