Optimized implementation of the likelihood #22

laurenzkeller · 2023-10-13T17:55:50Z

I attempted to implement Xiang's indexing idea.

The runtime for calculating the likelihood of 623 trees is now at approximately 0.8 seconds (the first implementation of the likelihood had a runtime of slightly more than 10 seconds).

The _backend.py and _tree_utils.py files contain the implementation of the likelihood from the last commit, while the _backend_geno.py and _tree_utils_geno.py files contain the updated version.


pawel-czyz

Hi Laurenz,

Very nice improvements! To provide some constructive criticism, I think it'd be good to unify the code: there are several backends, and although they are very nice, I think we won't be able to maintain them all in the long term.

Probably the best way forward would be to just select one of them (e.g., the one that has good speed and the cleanest code) and remove the others (and the warmup scripts used to test them).

Perhaps the cleanest option (in terms of Git-jitsu) to do this is to open a new branch from this one, do the changes, and then merge that branch into this one, which will in turn later be merged into main. Some other general tips on maintaining Git history can be found there.

src/pmhn/_trees/_backend.py

src/pmhn/_trees/_tree_utils.py

src/pmhn/_trees/_tree_utils_new.py

warmup/likelihood/R_py_loglikelihood_comparison.py

laurenzkeller · 2023-10-22T20:10:47Z

Hi Pawel,

I attempted to implement Xiang's idea. However, I'm not sure if I did it exactly the way you wanted. Regardless, the runtime for calculating the likelihood of 623 trees is now approximately 0.8 seconds. There seems to be a consistent overhead of 1.7 seconds when running the "R_py_loglikelihood_comparison" files. This overhead stems from the Unix time command, which measures the whole process rather than only the likelihood computation. The CSV-to-NumPy conversions, tree parsing, etc. also contribute to it. Therefore, my first implementation of the likelihood had a runtime of slightly more than 10 seconds (not 12).
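
One way to separate the likelihood runtime from process startup and I/O is to time only the call itself, inside the script. The following is a minimal sketch, not the code used in this PR: `toy_loglikelihood` is a hypothetical stand-in for the real likelihood function, and `time_call` is an illustrative helper built on the standard-library `time.perf_counter`.

```python
import math
import time


def toy_loglikelihood(n_terms):
    """Hypothetical stand-in for the real likelihood computation."""
    return sum(math.log1p(1.0 / (k + 1)) for k in range(n_terms))


def time_call(func, *args, repeats=5):
    """Return (best wall-clock seconds, last result) over several runs."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


# Only the function call is timed; imports, file parsing and
# interpreter startup are excluded from the measurement.
elapsed, value = time_call(toy_loglikelihood, 10_000)
print(f"likelihood only: {elapsed:.6f} s")
```

Taking the best of several repeats reduces noise from other processes. The difference between this number and the one reported by the Unix time command then gives a rough estimate of the fixed overhead.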

The _backend.py and _tree_utils.py files contain the implementation of the likelihood from the last commit, while the _backend_geno.py and _tree_utils_geno.py files contain the updated version.

I also started working on the report and I played around with the examples you recommended on Bayesian statistics. However, I haven't quite finished Michael Betancourt’s "Inferring gravity from data".

See you on Tuesday and sorry for the late reply!

pawel-czyz

Great work! Looks very nice to me 🙂

src/pmhn/_trees/_backend.py

src/pmhn/_trees/_backend_geno.py

pawel-czyz · 2023-10-23T11:55:31Z

Hi Laurenz! Great work! To respond to the individual items:

> I attempted to implement Xiang's idea. However, I'm not sure if I did it exactly the way you wanted. Regardless, the runtime for calculating the likelihood of 623 trees is now at approximately 0.8 seconds.

0.8 seconds is truly amazing!

> The _backend.py and _tree_utils.py files contain the implementation of the likelihood from the last commit, while the _backend_geno.py and _tree_utils_geno.py files contain the updated version.

Very nice 🙂

> I also started working on the report and I played around with the examples you recommended on Bayesian statistics.

Wonderful! Looking forward to hearing what you think 🙂

> However, I haven't quite finished Michael Betancourt’s "Inferring gravity from data".

Don't worry! It's a long exercise and I didn't expect you'd finish it in just a week!

> See you on Tuesday and sorry for the late reply!

Please, please, remember to get some rest and not work on weekends. The thesis is a marathon, not a sprint, as my older PhD friends tell me 😉

Great work and see you tomorrow!

pawel-czyz · 2023-10-30T11:29:15Z

Hi Laurenz,

I think it'd be good to merge this branch soon (otherwise the merge conflicts with #23 may get even harder to resolve). I thought I could be helpful here, so I:

- Unified the code (removing redundant classes and plugging both of your great implementations into the unit tests framework).
- Resolved the status checks which didn't pass.
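
As an illustration of running one set of unit tests against two implementations, here is a minimal sketch. It is not the test code from this PR: the two functions are toy stand-ins for the backends, the `BACKENDS` registry is a hypothetical name, and the standard-library `unittest` module is used only to keep the example self-contained.

```python
import math
import unittest


# Toy stand-ins for two backends (hypothetical; not the pMHN classes).
def loglik_reference(probs):
    """Sum the log-probabilities one by one."""
    return sum(math.log(p) for p in probs)


def loglik_optimized(probs):
    """Same quantity, computed as the log of the product."""
    return math.log(math.prod(probs))


BACKENDS = {"reference": loglik_reference, "optimized": loglik_optimized}


class TestBackendsAgree(unittest.TestCase):
    def test_known_value(self):
        probs = [0.5, 0.25, 0.125]
        expected = math.log(0.5 * 0.25 * 0.125)
        # Each backend is checked in its own subTest, so a failure
        # report names the implementation that disagreed.
        for name, backend in BACKENDS.items():
            with self.subTest(backend=name):
                self.assertAlmostEqual(backend(probs), expected, places=12)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestBackendsAgree)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With this layout, a new backend only has to be registered in `BACKENDS` to be covered by the existing tests.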

I wonder if you could take a look at these changes. If they look fine, please merge 🙂

laukeller and others added 30 commits September 22, 2023 14:55

Add files for listing children and subtrees

afa0462

added simulation of trees and comparison plots (not correct yet)

cc40f5e

removed plots and warmup directory, added plotting/csv related files & _simulate.py

6f51d22

Merge branch 'main' into warmup

9d00b71

changed _simulate.py, write_csv.py and created a few unit tests

e2c3abd

Merge branch 'warmup' of github.com:cbg-ethz/pMHN into warmup

40c8a76

changed comment in _simulate.py

d154117

minor changes

39907ab

moved files to warmup dir

c4f96d3

remove not needed files

cb57917

change: draw new sampling time if tree is discarded, mean_sampling_time now passed as an argument

e6ea6be

reformatted files with black

3131c24

changed unit tests

ea7cd53

reformat with black

a206230

Merge branch 'main' of github.com:cbg-ethz/pMHN

4bb2929

PR approved?

Merge branch 'warmup'

8773dec

PR approved?

Remove poetry.lock

34b0dc9

fixed ruff errors

9d6fc97

fixed ruff errors

41cd4c1

Merge branch 'warmup' of github.com:cbg-ethz/pMHN into warmup

bbe8ffc

Merge branch 'warmup'

c7596a0

all errors should be fixed now.

pyright fixed

05bb425

Merge branch 'warmup'

1c3bf80

modified _backend and added _tree_utils, created unit tests for both

ca76812

added files for likelihood comparison, changed absolute paths to relative paths, created modified_io, added 1e-10 to return value of likelihood function

d6cd286

small change

2da22a5

added likelihood tests, modified test_tree_utils.py

6fedbdc

minor changes

01beb2f

minor change

f1cdb6b

implemented 2 new versions of the likelihood calculation which relies on _tree_utils_new.py

52eea92

laukeller added 7 commits October 16, 2023 16:42

minor changes

42ccc90

small change

3479452

small change

6f36518

small change

64bdc6a

small change

6dc78fb

small change

f490594

changed io file and unit tests for it

6d492a6

pawel-czyz reviewed Oct 18, 2023

src/pmhn/_trees/_backend.py

src/pmhn/_trees/_tree_utils.py (outdated)

src/pmhn/_trees/_tree_utils_new.py (outdated)

warmup/likelihood/R_py_loglikelihood_comparison.py

laukeller added 8 commits October 20, 2023 17:20

genotype

a6c4836

small change

20a2f1e

small change

b6b97ee

small change in _tree_utils_geno.py

f96265a

small change in diag_entry

76aeba2

small change in diag_entry function

18769d4

small change in _tree_utils.py (memoization not needed)

502f07f

small change

614719c

pawel-czyz approved these changes Oct 23, 2023

src/pmhn/_trees/_backend.py (outdated)

src/pmhn/_trees/_backend.py (outdated)

src/pmhn/_trees/_backend_geno.py (outdated)

src/pmhn/_trees/_backend_geno.py (outdated)

laukeller and others added 7 commits October 23, 2023 21:05

implemented pawel's suggestions

8377756

Merge branch 'main' into likelihood_optimized

1974f3a

Apply Black formatter.

25a6f75

Remove redundant code from original backend

955de94

Fix unit tests

71ca142

Update the warmup files.

263235a

Ignore Pyright false positive

04a27ef

pawel-czyz mentioned this pull request Oct 30, 2023

Bayes #23

Closed

laurenzkeller merged commit 9b0b2e2 into main Oct 30, 2023
1 check passed

laurenzkeller deleted the likelihood_optimized branch October 30, 2023 11:39
