Bayes #23

Closed
wants to merge 68 commits into from
Conversation

laurenzkeller (Collaborator)

Hi Pawel,

I created the file treemhn-bayes.smk, which is very similar to the mhn-bayes.smk file. Running mhn-bayes.smk works perfectly, but when I run treemhn-bayes.smk, my system runs out of memory and the process is killed. I haven't found the error; even with very small data sets it doesn't work. I've spent quite some time debugging without success.
I also created the file _treemhn.py, where the TreeMHNLoglikelihood operation is defined. Additionally, you can find the loglikelihood_tree_list function in the _backend_geno.py file, which calculates the likelihood of a list of trees. Could you please have a look at the code?

laukeller and others added 30 commits September 22, 2023 14:55
all errors should be fixed now.
…tive paths, created modified_io, added 1e-10 to return value of likelihood function
@pawel-czyz (Member) left a comment

Hi Laurenz, great work! I'll take a closer look tomorrow or on Saturday – just one question: did you try running with e.g., only 2 samples or just the values in the Snakemake file? (Which are reasonable in terms of the collected sample size, but perhaps too computationally intense...)

pawel-czyz (Member)

What's the reason for removing this file?

laurenzkeller (Collaborator, Author)

This file contained the tests for the oldest version of the likelihood implementation. I will need to create a new file.

pawel-czyz (Member)

In #22 there's an updated version of it now 🙂

```python
loglikelihoods = self._backend.loglikelihood_tree_list(
    trees=self._data,
    theta=theta_np,
    sampling_rate=self._mean_sampling_time,
```
pawel-czyz (Member)

We'll have to systematize the naming here.

@laurenzkeller (Collaborator, Author)

Hi Pawel,

I found the problem: the generated trees had sizes up to 35k nodes! At first I thought that running out of memory was due to the sampling process; however, it is because the generated trees were too large. Initially I suspected the tree-generating process was buggy, but the R version also produces such trees (with the MHN from the sample_spike_and_slab function). With small sampling rates and MHNs that have many negative entries, smaller trees are generated and the sampling works fine. Setting a maximum tree size wouldn't make sense, right? Because the likelihood calculation of a tree would change (as we discussed in our last meeting).
How should I proceed? I can now plot the posterior thetas (which still differ from the ground truth). Is there a smart way to obtain the MAP estimate from the samples, or should I just take the theta that was sampled most often?
It would be great if I could run the simulations on Euler.

@pawel-czyz (Member) commented Oct 27, 2023

> I found the problem. The reason was that the generated trees had sizes up to 35k! At first, I thought that running out of memory was due to the sampling process. However, it is because the trees generated were too large.

Great that you've found it! Indeed, the trees can be very large. It's always good to look at some summary statistics of the data (e.g., the distribution of node counts or the number of subtrees).
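For instance, the node-count distribution can be summarized in a few lines. A minimal sketch, where `tree_sizes` is a hypothetical list of node counts (one per simulated tree; the values below are illustrative only):

```python
import statistics

# Hypothetical node counts, one per simulated tree (illustrative values only).
tree_sizes = [3, 5, 8, 12, 40, 35_000]

print("number of trees:", len(tree_sizes))
print("median size:", statistics.median(tree_sizes))  # robust to outliers
print("mean size:", round(statistics.mean(tree_sizes), 1))
print("max size:", max(tree_sizes))  # immediately reveals the huge outlier
```

The gap between the median and the mean/max is exactly what flags a few enormous trees dominating memory use.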

> Setting a maximum tree size wouldn't make sense, right? Because the likelihood calculation of a tree would change (as we discussed in our last meeting).

Indeed, the model would then be misspecified and inference could yield biased results. Understanding this misspecification will be useful (and it's partially addressed by experiments (b) and (c)).

> How should I proceed? I can now plot the posterior thetas (which still differ from the ground truth). Is there a smart way to obtain the MAP estimate from the samples, or should I just consider the theta that was sampled most?

The MAP is generally different from the sample with the highest posterior density (see e.g., https://discourse.mc-stan.org/t/the-typical-set-and-its-relevance-to-bayesian-computation/17174). A summary like the mean may be meaningful when there's no multimodality. Generally, it'd be good to look at the posteriors of the individual entries (as I did in one of the presentation slides) and check whether multimodality is an issue here (and if not, we can look at the means and standard errors).
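Per-entry means and (naive) standard errors can be read off the pooled chains directly. A sketch assuming the draws are stored as a NumPy array of shape (chains, draws, n, n); random noise stands in for real MCMC output, and note that a proper Monte Carlo standard error should account for autocorrelation (e.g., via the effective sample size):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for real MCMC draws of theta: (n_chains, n_draws, n, n).
samples = rng.normal(size=(8, 500, 4, 4))

flat = samples.reshape(-1, *samples.shape[2:])  # pool chains and draws
post_mean = flat.mean(axis=0)                   # posterior mean per entry
post_sd = flat.std(axis=0, ddof=1)              # posterior sd per entry
naive_se = post_sd / np.sqrt(flat.shape[0])     # ignores autocorrelation

print(post_mean.shape, float(naive_se.max()))
```

Plotting per-entry histograms from `flat` is then the quickest multimodality check before trusting the means.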

> It would be great if I could run the simulations on Euler.

Indeed! You already have permission to submit jobs on several cores, and I hope you'll also get access to more cores 🙂
Submitting jobs can be done by following the SLURM instructions listed here.

@laurenzkeller (Collaborator, Author)

Hi Pawel,
I only noticed yesterday that the posterior theta plots looked so strange because I passed an empty list of trees to the likelihood_tree_list function (I was really stupid). Now the issue is that the sampling process is really time-consuming. As a guest user on Euler, I have access to just 48 cores, which limits me to running simulations for a very small dataset (e.g., 10 patients across 8 chains). Already for 100 patients it takes forever.
Thanks for fixing the last PR.
See you in 3 hours 🙂

@pawel-czyz (Member)

Hi Laurenz, since there are now seven conflicting files and *.png and *.csv files shouldn't really be stored in Git, it may be worth closing this PR and opening a new one (branching from the current main branch) in which the changes are introduced again (mostly based on the code here).
