Bayes #23
…me now passed as an argument
PR approved?
all errors should be fixed now.
…tive paths, created modified_io, added 1e-10 to return value of likelihood function
… on _tree_utils_new.py
…ackend_geno, created _treemhn.py file which contains the TreeMHNLoglikelihood class
Hi Laurenz, great work! I'll take a closer look tomorrow or on Saturday – just one question: did you try running with e.g., only 2 samples or just the values in the Snakemake file? (Which are reasonable in terms of the collected sample size, but perhaps too computationally intense...)
tests/trees/test_likelihood.py
Outdated
What's the reason for removing this file?
This file contained the tests for the oldest version of the likelihood implementation. I will need to create a new file.
In #22 there's an updated version of it now 🙂
src/pmhn/_ppl/_treemhn.py
Outdated
loglikelihoods = self._backend.loglikelihood_tree_list(
    trees=self._data,
    theta=theta_np,
    sampling_rate=self._mean_sampling_time,
We'll have to systematize the naming here.
Hi Pawel, I found the problem. The reason was that the generated trees had sizes up to 35k! At first, I thought that running out of memory was due to the sampling process. However, it is because the generated trees were too large. Initially, I thought that the tree-generating process was buggy; however, the R version also produces such trees (with the MHN from the sample_spike_and_slab function). With small sampling rates and MHNs that have a lot of negative entries, smaller trees are generated, and the sampling works fine. Setting a maximum tree size wouldn't make sense, right? Because the likelihood calculation of a tree would change (as we discussed in our last meeting).
Great that you've found it! Indeed, the trees can be very large. It's always good to look at some summary statistics of the data (e.g., the distribution of the number of nodes or the number of subtrees).
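A minimal sketch of such a check, in case it is useful. The nested-dict tree layout (`{"children": [...]}`) and the function names `tree_size` and `summarize_sizes` are hypothetical and not taken from the pmhn code base; only the idea of inspecting the distribution of node counts comes from the comment above.

```python
import statistics


def tree_size(tree):
    """Count the nodes of a tree stored as nested dicts (hypothetical layout).

    Each node is assumed to look like {"children": [child, child, ...]}.
    The traversal is iterative, so very large trees do not hit
    Python's recursion limit.
    """
    count = 0
    stack = [tree]
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(node.get("children", []))
    return count


def summarize_sizes(sizes):
    """Return basic summary statistics for a non-empty list of tree sizes."""
    ordered = sorted(sizes)
    return {
        "n_trees": len(ordered),
        "min": ordered[0],
        "median": statistics.median(ordered),
        "mean": statistics.fmean(ordered),
        "max": ordered[-1],
    }
```

Running `summarize_sizes([tree_size(t) for t in trees])` on the simulated data before starting the sampler would show at a glance whether a few very large trees dominate the data set.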
Indeed, with a maximum tree size the model would be misspecified and the inferences would likely be biased. Understanding this misspecification will be useful (and it's partially addressed by experiments (b) and (c)).
MAP is generally different from the sample with the highest posterior (see e.g., https://discourse.mc-stan.org/t/the-typical-set-and-its-relevance-to-bayesian-computation/17174). A summary such as the mean may be meaningful when there's no multimodality. Generally, it'd be good to look at the posteriors of different entries (as I did in one of the presentation slides) and see whether this is an issue here (and if not, we can look at the means and standard errors).
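To illustrate the point about summaries, here is a small sketch, assuming the posterior draws are available as a NumPy array of shape `(n_samples, n, n)` together with one log-posterior value per draw. The function names are hypothetical, and the standard deviation computed here is the plain posterior standard deviation per entry, not a Monte Carlo standard error.

```python
import numpy as np


def summarize_entries(samples):
    """Per-entry posterior mean and standard deviation.

    `samples` is assumed to have shape (n_samples, n, n).
    """
    samples = np.asarray(samples, dtype=float)
    return samples.mean(axis=0), samples.std(axis=0)


def highest_density_draw(samples, log_posterior):
    """Return the draw with the largest log-posterior value.

    This is only the best draw among the samples, not the MAP estimate.
    """
    idx = int(np.argmax(log_posterior))
    return np.asarray(samples, dtype=float)[idx]


# Toy example with a bimodal entry: half of the draws sit at -2, half at +2.
toy_samples = np.array([[[-2.0]], [[-2.0]], [[2.0]], [[2.0]]])
toy_mean, toy_std = summarize_entries(toy_samples)
# The mean of this entry is 0, a value that none of the draws is close to.
```

In the toy example the mean falls between the two modes, which is why it helps to plot the marginal posteriors of individual entries before reporting means.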
Indeed! You already have the permissions for submitting jobs on several cores, but I hope you'll also get access to more cores 🙂
Hi Pawel,
Hi Laurenz, as there are seven conflicting files now and
Hi Pawel,
I created the file treemhn-bayes.smk, which is very similar to the mhn-bayes.smk file. Running mhn-bayes.smk works perfectly; however, when I run treemhn-bayes.smk, my system runs out of memory and the process is killed. I haven't found the error. It doesn't work even with very small data sets. I've spent quite some time debugging without success.
I also created the file _treemhn.py, where the TreeMHNLoglikelihood operation is defined. Additionally, you can find the loglikelihood_tree_list function in the _backend_geno.py file, which calculates the likelihood of a list of trees. Could you please have a look at the code?