-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of Interchange.from_smirnoff
on polymers
#1122
base: main
Are you sure you want to change the base?
Conversation
Ultimately I wasn't able to find a major bottleneck (i.e. something I'm seeing performance gains across the board in calling For a polymer-ish compound of increasing length, seeing closer to single-digit differences at larger molecule sizes: Source: https://gist.github.com/mattwthompson/f64b4ba936147492b1b43db5b28f3e55 And on a large protein (run this in Jupyter): %%timeit
ForceField(
"ff14sb_off_impropers_0.0.3.offxml",
).create_interchange(
Topology.from_pdb(
"../proteinbenchmark/proteinbenchmark/data/pdbs/hewl-1E8L-model-1.pdb",
),
)
# fcf439975b2a5283228b4d10c55d63c360820d90: 25.6 s ± 2.23 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 5d34566af0bdd34392915a4bb39a7f0d00cb3c4e: 22.6 s ± 1.21 s per loop (mean ± std. dev. of 7 runs, 1 loop each) |
This drops the runtime of |
@mattwthompson Did my own quick benchmark with a profiler to get some more details on what's causing a bottleneck, code and results linked. For reproducibility, I ran the linked script with interchange v0.4.0 and toolkit v0.16.7. I ran with 1,200 repeat units but changed the repeat unit identity from "CO" to "CCO" (i.e. to PEG, as [CO]n doesn't correspond to any real polymer to the best of my knowledge). Moving down the call stack, looks like the highest-cost primitive functions are:
Make of that what you will, I haven't dug thru the source code deeply enough to know why those calls in particular dominate runtime, but I suspect this'll at least help direct effort towards the key 20% of bugs. I also have access to ~1,400 diverse polymer chemistries which I've run thru Interchange as part of an unrelated collaboration; the individual chains are only a few hundred atoms, but are packed into 10k atom melts before porting thru Interchange. If it would be of use, I can get back with Interchange output runtimes for these chemistries sometime next week after making some tweaks to my pipeline (viz incorporating profilers into a Signac-based workflow). Hope I could be of some help! |
@timbernat this PR aims to streamline the bottleneck which I believe is the cause of this poor performance. Could you install against this branch ( |
@mattwthompson Apologies for the long turnaround, see here for updated profile times (same code as prior but with PR-version of Interchange). Runtimes are pretty similar, with components from _nonbonded now dominating much of the runtime. Sorry I couldn't be of more help on this front yet, I'll be consolidating some concurrent projects following the holidays which should hopefully give me some more time on the side. Let me know if I can contribute anything else here! |
Description
Provide a brief description of the PR's purpose here.
Checklist