-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimized implementation of the likelihood #22
Conversation
…me now passed as an argument
PR approved?
PR approved?
all errors should be fixed now.
…tive paths, created modified_io, added 1e-10 to return value of likelihood function
… on _tree_utils_new.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Laurenz,
Very nice improvements! To provide some constructive criticism, I think it'd be good to unify the code – there are several backends, and although they are very nice, I think in the long-term we won't be able to maintain them.
Probably the best way forward would be to just select one of them (e.g., the one that has good speed and the cleanest code) and remove the others (and the warmup
scripts used to test them).
Perhaps the cleanest option (in terms of Git-jitsu) to do this is to open a new branch from this one, do the changes, and then merge that branch into this one, which will in turn later be merged into main
. Some other general tips on maintaining Git history can be found there.
Hi Pawel, I attempted to implement Xiang's idea. However, I'm not sure if I did it exactly the way you wanted. Regardless, the runtime for calculating the likelihood of 623 trees is now at approximately 0.8 seconds. There seems to be a consistent overhead of 1.7 seconds when running the "R_py_loglikelihood_comparison" files. This overhead stems from the Unix time command, which measures not only the direct runtime of the program but also other operations. Additionally, the time measurement for CSV-Numpy conversions, tree-parsing etc. contribute to this overhead. Therefore, my first implementation of the likelihood had a runtime of slightly more than 10 seconds (not 12). The _backend.py and _tree_utils.py files contain the implementation of the likelihood from the last commit, while the _backend_geno.py and _tree_utils_geno.py files contain the updated version. I also started working on the report and I played around with the examples you recommended on Bayesian statistics. However, I haven't quite finished Michael Betancourt’s "Inferring gravity from data". See you on Tuesday and sorry for the late reply! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Looks very nice to me 🙂
Hi Laurenz! Great work! To respond to the individual items:
0.8 seconds is truly amazing!
Very nice 🙂
Wonderful! Looking forward to hearing what you think 🙂
Don't worry! It's a long exercises and I didn't expect you'd finish it in just a week!
Please, please, remember to have nice rest and not work on weekends. The thesis is a marathon not a sprint, as my older PhD friends tell me 😉 Great work and see you tomorrow! |
Hi Laurenz, I think it'd be good to merge this branch soon (otherwise the merge conflicts with #23 may get even harder to resolve). I thought I could be helpful here, so I:
I wonder if you could take a look at these changes. If they look fine, please merge 🙂 |
I attempted to implement Xiang's indexing idea.
The runtime for calculating the likelihood of 623 trees is now at approximately 0.8 seconds (the first implementation of the likelihood had a runtime of slightly more than 10 seconds).
The
_backend.py
and_tree_utils.py
files contain the implementation of the likelihood from the last commit, while the_backend_geno.py
and_tree_utils_geno.py
files contain the updated version.