Remove general_stat cacheing #1937
Conversation
Codecov Report: all modified and coverable lines are covered by tests ✅

```diff
@@            Coverage Diff             @@
##             main    #1937       +/-   ##
===========================================
- Coverage   93.15%   79.57%   -13.59%
===========================================
  Files          27       27
  Lines       25085    24959     -126
  Branches     1109     1107       -2
===========================================
- Hits        23369    19862    -3507
- Misses       1682     5036    +3354
- Partials       34       61      +27
```

Flags with carried-forward coverage won't be shown.
Any idea how much this drops RAM use? For me, the case that motivated opening the issue on RAM use was computing the pairwise distance matrix. It blew up to 7+ GB for 1000s of nodes.
We had a discussion about this earlier and @brieuclehmann is going to run it through memory-profiler. Assuming we're on the right track memory-reduction-wise, the plan is to add an option to `general_stat` to disable caching like this.
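While waiting on memory-profiler numbers, a lighter stdlib option for a quick first look is `tracemalloc`, with the big caveat that it only sees Python-level allocations, so buffers allocated inside tskit's C code (where the caching lives) won't show up. A minimal sketch; `allocate_something` is a hypothetical stand-in for the `general_stat` call being profiled:

```python
import tracemalloc

def allocate_something():
    # Hypothetical stand-in for the general_stat call under test.
    return [0.0] * 1_000_000  # roughly 8 MB of list pointers

tracemalloc.start()
result = allocate_something()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak Python-level allocation: {peak / 1e6:.1f} MB")
```

For the C-level allocations that matter here, an external RSS measurement (like `/usr/bin/time`) is the more trustworthy number.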
Have used
There's `/usr/bin/time -f "%e %M" python script.py`; the second number output will be the peak RAM use in KB.
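The same peak figure can also be read from inside the script itself via the stdlib `resource` module (Unix only), which avoids wrapping the whole interpreter. A minimal sketch; note the unit difference between platforms:

```python
import resource

# Peak resident set size of this process so far (Unix only).
# Linux reports ru_maxrss in KB; macOS reports it in bytes.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS so far: {peak}")
```

Calling this immediately after the statistic computation gives the high-water mark including C-level allocations, which `tracemalloc`-style Python profilers miss.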
Yes, it takes periodic snapshots using OS routines, so it's the "real" process footprint. |
Pasting in the profiles for ease:
There isn't a huge difference either way here - maybe run this on a larger example? Also do just one thing on the line where we call
I simplified the code slightly (see edits above) so that it only profiles the
Without caching:
Hm - notes:
Thanks @petrelharp! Agreed with your point about combining. For expected change in memory usage, I guess it should be a
And here are the memory-profiler results for
With caching:
Without caching:
Ah, I see what's happening here - your tree sequence is quite short, so there's not that many nodes. Try increasing the sequence length or recombination rate so that you have at least 100K trees. Then you should see some difference. |
Ah OK, now with
With caching:
Without caching:
Still surprisingly little...
ping @jeromekelleher |
Ah - I don't think this is computing the right value @brieuclehmann. Tests are failing because we're getting different numbers. I'll think about how to do this again. |
Closing this for inactivity - reopen if needed! |
Nope - this is superseded by #2980.
Here's a first pass at removing caching to reduce memory consumption in the `general_stat` framework. I've only tweaked the branch C function, and the Python tests are passing.
Without caching, however, `ts.genetic_relatedness` is still slower. For the script below, with caching takes about 10s; without caching, about 20s. When I increased n_samples to 1000: with caching, 58s; without caching, 110s. I may have introduced some unnecessary computation, so would appreciate a sense-check!
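For intuition on why dropping the cache roughly doubles runtime while saving memory, here is a toy sketch (not tskit's actual C implementation): the cached version stores each node's summary `f(state)` and refreshes only the entries touched between adjacent trees, while the uncached version recomputes `f` for every node at every tree. The function names and parameters are illustrative only.

```python
calls = 0

def f(x):
    # Stand-in for the per-node summary function; count how often it runs.
    global calls
    calls += 1
    return 2.0 * x

def run(num_trees, num_nodes, changes_per_tree, use_cache):
    global calls
    calls = 0
    states = [float(i) for i in range(num_nodes)]
    # With caching, store f(state) per node up front: extra O(num_nodes) memory.
    cache = [f(s) for s in states] if use_cache else None
    total = 0.0
    for _ in range(num_trees):
        # Between adjacent trees only a few node states change.
        for i in range(changes_per_tree):
            states[i] += 1.0
            if use_cache:
                cache[i] = f(states[i])  # refresh only the changed entries
        if use_cache:
            total += sum(cache)  # reuse cached values: no f calls here
        else:
            total += sum(f(s) for s in states)  # recompute f for every node
    return total, calls

cached_total, cached_calls = run(200, 500, 2, use_cache=True)
uncached_total, uncached_calls = run(200, 500, 2, use_cache=False)
print(cached_calls, uncached_calls)  # 900 vs 100000
```

The trade-off shows up directly: the cache cuts `f` evaluations from `num_trees * num_nodes` down to `num_nodes + num_trees * changes_per_tree`, at the cost of one extra per-node array, which is consistent with the ~2x slowdown reported above once that array is removed.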