Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimized implementation of the likelihood #22

Merged
merged 53 commits into from
Oct 30, 2023
Merged

Conversation

laurenzkeller
Copy link
Collaborator

@laurenzkeller laurenzkeller commented Oct 13, 2023

I attempted to implement Xiang's indexing idea.

The runtime for calculating the likelihood of 623 trees is now at approximately 0.8 seconds (the first implementation of the likelihood had a runtime of slightly more than 10 seconds).

The _backend.py and _tree_utils.py files contain the implementation of the likelihood from the last commit, while the _backend_geno.py and _tree_utils_geno.py files contain the updated version.

laukeller and others added 30 commits September 22, 2023 14:55
all errors should be fixed now.
…tive paths, created modified_io, added 1e-10 to return value of likelihood function
Copy link
Member

@pawel-czyz pawel-czyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Laurenz,

Very nice improvements! To provide some constructive criticism, I think it'd be good to unify the code – there are several backends, and although they are very nice, I think in the long-term we won't be able to maintain them.

Probably the best way forward would be to just select one of them (e.g., the one that has good speed and the cleanest code) and remove the others (and the warmup scripts used to test them).

Perhaps the cleanest option (in terms of Git-jitsu) to do this is to open a new branch from this one, do the changes, and then merge that branch into this one, which will in turn later be merged into main. Some other general tips on maintaining Git history can be found there.

src/pmhn/_trees/_backend.py Show resolved Hide resolved
src/pmhn/_trees/_tree_utils.py Outdated Show resolved Hide resolved
src/pmhn/_trees/_tree_utils_new.py Outdated Show resolved Hide resolved
@laurenzkeller
Copy link
Collaborator Author

Hi Pawel,

I attempted to implement Xiang's idea. However, I'm not sure if I did it exactly the way you wanted. Regardless, the runtime for calculating the likelihood of 623 trees is now at approximately 0.8 seconds. There seems to be a consistent overhead of 1.7 seconds when running the "R_py_loglikelihood_comparison" files. This overhead stems from the Unix time command, which measures not only the direct runtime of the program but also other operations. Additionally, the time measurement for CSV-Numpy conversions, tree-parsing etc. contribute to this overhead. Therefore, my first implementation of the likelihood had a runtime of slightly more than 10 seconds (not 12).

The _backend.py and _tree_utils.py files contain the implementation of the likelihood from the last commit, while the _backend_geno.py and _tree_utils_geno.py files contain the updated version.

I also started working on the report and I played around with the examples you recommended on Bayesian statistics. However, I haven't quite finished Michael Betancourt’s "Inferring gravity from data".

See you on Tuesday and sorry for the late reply!

Copy link
Member

@pawel-czyz pawel-czyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Looks very nice to me 🙂

src/pmhn/_trees/_backend.py Outdated Show resolved Hide resolved
src/pmhn/_trees/_backend.py Outdated Show resolved Hide resolved
src/pmhn/_trees/_backend_geno.py Outdated Show resolved Hide resolved
src/pmhn/_trees/_backend_geno.py Outdated Show resolved Hide resolved
@pawel-czyz
Copy link
Member

Hi Laurenz! Great work! To respond to the individual items:

I attempted to implement Xiang's idea. However, I'm not sure if I did it exactly the way you wanted. Regardless, the runtime for calculating the likelihood of 623 trees is now at approximately 0.8 seconds.

0.8 seconds is truly amazing!

The _backend.py and _tree_utils.py files contain the implementation of the likelihood from the last commit, while the _backend_geno.py and _tree_utils_geno.py files contain the updated version.

Very nice 🙂

I also started working on the report and I played around with the examples you recommended on Bayesian statistics.

Wonderful! Looking forward to hearing what you think 🙂

However, I haven't quite finished Michael Betancourt’s "Inferring gravity from data".

Don't worry! It's a long exercises and I didn't expect you'd finish it in just a week!

See you on Tuesday and sorry for the late reply!

Please, please, remember to have nice rest and not work on weekends. The thesis is a marathon not a sprint, as my older PhD friends tell me 😉

Great work and see you tomorrow!

@pawel-czyz
Copy link
Member

Hi Laurenz,

I think it'd be good to merge this branch soon (otherwise the merge conflicts with #23 may get even harder to resolve). I thought I could be helpful here, so I:

  • Unified the code (removing redundant classes and plugging both of your great implementations into the unit tests framework).
  • Resolved the status checks which didn't pass.

I wonder if you could take a look at these changes. If they look fine, please merge 🙂

@pawel-czyz pawel-czyz mentioned this pull request Oct 30, 2023
@laurenzkeller laurenzkeller merged commit 9b0b2e2 into main Oct 30, 2023
1 check passed
@laurenzkeller laurenzkeller deleted the likelihood_optimized branch October 30, 2023 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants