-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Likelihood #21
Likelihood #21
Conversation
…me now passed as an argument
PR approved?
PR approved?
all errors should be fixed now.
…tive paths, created modified_io, added 1e-10 to return value of likelihood function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Laurenz,
Truly exceptional work! I like it very much. I left some minor comments and a single important comment (regarding _io_modified.py
).
I think this implementation is very nice and easy to follow. I think it'll be the model implementation against the future implementations can be compared with.
For the next (faster) implementation I'd consider these improvements:
- Currently the complexity is
$O(n_T^2)$ , where$n_T$ is the number of subtrees of tree$T$ and it's executed usingfor
loops, which are known to be slow in Python. Using a BFS approach you don't really need to use these nested loops to construct theQ
matrix. This change alone is likely to speed up the implementation by a large factor. - Currently the list of subtrees of a given tree has to be calculated at every execution of the loglikelihood calculation, as the signature of the function is
loglikelihood(tree, theta, sampling_time)
.
I think that in the future implementation it may be good to do caching:
class LoglikelihoodSingleTree:
def __init__(self, tree):
self._subtrees_list = calculate_subtrees(tree)
def loglikelihood(self, theta, sampling_time):
... # Note that you have the list of subtrees already calculated
- Using BFS + dynamic programming + forward substitution you won't need to construct the
$V$ matrix explicitly, if I recall. The trick is that you can calculate the non-zero entries explicitly and solve the linear equation for a single new term in thex
vector.
Hi Pawel, |
The trick here is that we will calculate the likelihood of the whole data set (i.e., for each tree) many times. Think about it this way: if you have the data set However, to understand which Another suggestion, which I forgot to mention, is that keeping subtrees as a list may be suboptimal if you implement the calculation via BFS (whether you create the |
Thanks for the clarification |
Hi Pawel, I first tried to keep the matrix construction approach and I tried to loop only once over the subtrees list and construct a single row Furthermore, I'm not sure, but I think that using the BFS approach you suggested, which constructs an entire row of the V matrix, doesn't allow for forward substitution because that would require each column of the V matrix to be constructed in one go (so exit nodes of different trees have to be considered). You can view the two versions (4.7 and 4.9 seconds) on Github. I did not push the version that had a runtime of more than 2 minutes. |
Great work, thanks a lot for the update!
Great! I think it'll be around 1 second then for 120 trees. Doing 10,000 steps will be therefore around 3 hours on Euler, I think it's performant enough for our purposes 🙂 (Plus, we could resort to multiprocessing and probably reduce the runtime by a factor of 4x, if needed).
This is interesting and quite a surprising thing to me. Let's discuss it on Tuesday, I'd love to understand that better!
I may have overlooked it – another point to discuss 🙂 |
I think the best way to proceed will be to separate the 30th commit (which introduces the new backends) into a new PR, so only the first 29 commits are in this one. (And after the minor comments are resolved, they are merged. For the 30th commit I'd like to provide some additional feedback). Do you know how to split a PR into two, or should I do it for you? |
52eea92
to
f1cdb6b
Compare
Hi Pawel, |
I implemented your suggestions. Should I resolve the comments and merge this branch into the main branch? |
Sounds great! Note that there's a minor conflict in one file. It looks that it can be fixed using the online editor (it's like 3-line modification), but if you need any help, let me know! |
Hi Pawel,
I implemented the log-likelihood function and created some unit tests for it. However, I was struggling with comparing R with Python because there doesn't seem to be a corresponding R function that accepts the same inputs (tree, theta matrix and sampling rate) and returns the log-likelihood. I might need to ask Xiang.