Memory consumption of tree sequence statistics #647
Well, we could add an option to either store the intermediate results or recompute them. I don't think this would add much complexity, assuming there is a significant difference in performance between the two. If there isn't, we should get rid of the stored results. It should be easy enough to do a quick test?
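A quick way to see the trade-off is a toy branch statistic that sums `branch_length[u] * summary_func(state[u])` over nodes. This is a hypothetical sketch (the function and parameter names are invented for illustration and are not tskit's API); `store_summaries` switches between keeping every node's length-K summary and recomputing each one on the fly.

```python
import numpy as np


def total_branch_stat(branch_length, state, summary_func, store_summaries=True):
    """Toy version of sum_u branch_length[u] * summary_func(state[u]).

    Hypothetical illustration, not tskit's implementation. With
    store_summaries=True every node's length-K summary is kept, so memory
    grows like num_nodes * K; otherwise each summary is recomputed and
    discarded, and only one length-K accumulator is held.
    """
    num_nodes = len(branch_length)
    if store_summaries:
        # Memoized: a (num_nodes, K) array of per-node summaries.
        summaries = np.array([summary_func(state[u]) for u in range(num_nodes)])
        return np.asarray(branch_length) @ summaries
    # Recomputed: accumulate one node at a time.
    total = 0.0
    for u in range(num_nodes):
        total = total + branch_length[u] * np.asarray(summary_func(state[u]))
    return total


rng = np.random.default_rng(1)
bl = rng.random(7)
st = rng.random((7, 3))
f = lambda x: np.outer(x, x).ravel()  # K = 9 outputs per node
print(np.allclose(total_branch_stat(bl, st, f, True),
                  total_branch_stat(bl, st, f, False)))
```

Both modes return the same result. In an incremental algorithm that moves along the genome, stored summaries would let only the nodes whose state changed be updated; the recompute mode gives up that saving in exchange for memory that no longer scales with `num_nodes * K`.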
Is this still an open issue? I think we should probably close it unless someone is intending to follow it up.
It does need addressing by someone who knows the code. There are certain situations where one ends up with unnecessary runtime crashes. Unfortunately, I find the stats code hard to follow.
(Reply by email to Jerome Kelleher's comment of Thu, Aug 27, 2020.)
OK, let's keep it open.
I'm going to close this because we're now addressing the general problem with pairwise statistics using a different framework (starting from the divergence matrix in #2736). The short version, I think, is that the stats API assumes a relatively small number of statistics; if we have a large number of related statistics to compute, other approaches should be used.
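To illustrate the general idea of computing a pairwise matrix directly (this is not the algorithm from #2736, which isn't shown here), the hypothetical sketch below fills the n x n branch-distance matrix for one tree by adding each edge's length to every pair of samples it separates. The tree representation (a `parent` array with -1 for the root and a per-node `branch_length` for the edge above it) is an assumption. Memory is the n x n output plus per-node lists of samples, rather than one length-n^2 vector per node.

```python
import numpy as np


def branch_divergence_matrix(parent, branch_length, samples):
    """Pairwise branch-length distances between samples on one tree.

    Hypothetical sketch. parent[u] is u's parent (-1 for the root) and
    branch_length[u] is the length of the edge above u.
    """
    n = len(samples)
    index = {s: i for i, s in enumerate(samples)}
    # below[u]: indexes of the samples in the subtree of u.
    below = [[] for _ in parent]
    for s in samples:
        u = s
        while u != -1:
            below[u].append(index[s])
            u = parent[u]
    D = np.zeros((n, n))
    for u, p in enumerate(parent):
        if p == -1:
            continue  # the root has no edge above it
        inside = np.zeros(n, dtype=bool)
        inside[below[u]] = True
        # Every pair split by the edge above u is separated by its length.
        D[np.ix_(inside, ~inside)] += branch_length[u]
        D[np.ix_(~inside, inside)] += branch_length[u]
    return D


# Tiny tree: samples 0 and 1 join at node 3; node 4 is the root above 3 and 2.
parent = [3, 3, 4, 4, -1]
branch_length = [1.0, 2.0, 3.0, 0.5, 0.0]
print(branch_divergence_matrix(parent, branch_length, [0, 1, 2]))
```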
I actually think it's still worth running those tests. What you say is true for pairwise stats, but there are also classes of stats whose output dimension equals the number of samples that would be nice to compute this way, e.g. "relatedness matrix times a vector".
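As a stand-in for "relatedness matrix times a vector", here is a hypothetical sketch that computes `D @ w` for the branch-distance matrix `D` of a single tree without ever forming `D`, so the output (and working memory) has length n rather than n^2. The tree representation and the requirement that children have smaller node ids than their parents are assumptions made for the sketch.

```python
import numpy as np


def distance_times_vector(parent, branch_length, samples, w):
    """Return D @ w for the branch-distance matrix D of one tree, without forming D.

    Hypothetical sketch. parent[u] is u's parent (-1 for the root) and
    branch_length[u] is the length of the edge above u. Assumes every child
    has a smaller node id than its parent, so one ascending pass visits
    children before parents.
    """
    num_nodes = len(parent)
    w_below = np.zeros(num_nodes)  # total weight of samples under each node
    for i, s in enumerate(samples):
        w_below[s] += w[i]
    for u in range(num_nodes):
        if parent[u] != -1:
            w_below[parent[u]] += w_below[u]
    w_total = float(np.sum(w))
    # If sample s is NOT under the edge above u, that edge adds
    # branch_length[u] * w_below[u] to (D @ w)[s].
    base = sum(branch_length[u] * w_below[u]
               for u in range(num_nodes) if parent[u] != -1)
    out = np.empty(len(samples))
    for i, s in enumerate(samples):
        acc = base
        u = s
        while parent[u] != -1:
            # s is under this edge, so the weight on the far side is
            # w_total - w_below[u]; correct the base term accordingly.
            acc += branch_length[u] * (w_total - 2 * w_below[u])
            u = parent[u]
        out[i] = acc
    return out


# Tiny tree: samples 0 and 1 join at node 3; node 4 is the root above 3 and 2.
parent = [3, 3, 4, 4, -1]
branch_length = [1.0, 2.0, 3.0, 0.5, 0.0]
print(distance_times_vector(parent, branch_length, [0, 1, 2], [1.0, 2.0, 3.0]))
# → [19.5 19.5 15.5]
```

Each sample walks only its own path to the root, so the extra memory is O(num_nodes + n) no matter how many samples there are.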
Closed in #2980.
When the output dimension of a statistic is large, so is the memory consumption.
The following example calculates the pairwise distance matrix for all samples from a single tree and requires a bit over 7GB of RAM for a small number of samples (1000).
The versions are:
0.7.4
0.2.3
From talking to @petrelharp about this, it appears that some or most of the RAM use may come from memoization during the calculation, which he feels may not be necessary.
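A back-of-envelope check of that idea, under loudly stated assumptions: suppose the calculation keeps one float64 value per node per output dimension, the single tree is binary (2n - 1 nodes for n samples), and there is one output per unordered pair of samples. None of this is confirmed from the code; the helper below is hypothetical.

```python
def estimated_summary_bytes(num_samples, bytes_per_value=8):
    """Size of a hypothetical per-node, per-output float64 array.

    Assumes a binary tree and one output per unordered pair of samples.
    """
    num_nodes = 2 * num_samples - 1  # n leaves plus n - 1 internal nodes
    num_outputs = num_samples * (num_samples - 1) // 2
    return num_nodes * num_outputs * bytes_per_value


size = estimated_summary_bytes(1000)
print(size, size / 2**30)
```

For 1000 samples this gives 1999 x 499500 x 8 = 7,988,004,000 bytes (about 7.44 GiB), consistent with "a bit over 7GB". By contrast, a per-node state of one value per sample would need only 1999 x 1000 x 8 bytes, about 16 MB.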